29-Oct-2024
Goal: Create workflow to trim and compress FASTQ files
Using a bash script:
for input in *.fastq
do
    # Strip the .fastq suffix to get the sample name
    sample="${input%.fastq}"
    # 1. Trim fastq file (trim 5 bp from left, 10 bp from right)
    seqtk trimfq -b 5 -e 10 "$input" > "${sample}.trimmed.fastq"
    # 2. Compress fastq file
    gzip -c "${sample}.trimmed.fastq" > "${sample}.trimmed.fastq.gz"
    # 3. Remove intermediate files
    rm "${sample}.trimmed.fastq"
done
Using Snakemake rules:
$ snakemake -c 1 a.trimmed.fastq.gz b.trimmed.fastq.gz
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    2       gzip
    2       trim_fastq
    4
rule trim_fastq:
    input: a.fastq
    output: a.trimmed.fastq
    wildcards: sample=a
1 of 4 steps (25%) done
rule gzip:
    input: a.trimmed.fastq
    output: a.trimmed.fastq.gz
    wildcards: sample=a
Removing temporary output file a.trimmed.fastq.
2 of 4 steps (50%) done
rule trim_fastq:
    input: b.fastq
    output: b.trimmed.fastq
    wildcards: sample=b
3 of 4 steps (75%) done
rule gzip:
    input: b.trimmed.fastq
    output: b.trimmed.fastq.gz
    wildcards: sample=b
Removing temporary output file b.trimmed.fastq.
4 of 4 steps (100%) done
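The log line "Rules claiming more threads will be scaled down" can be illustrated with a small Python sketch. It is a simplified model, not Snakemake's scheduler: the function names are hypothetical, the value of 8 declared threads is only an example, and other scheduling constraints such as resources are ignored.

```python
def effective_threads(declared_threads, provided_cores):
    """Threads a job actually gets: never more than the cores provided."""
    return min(declared_threads, provided_cores)


def max_parallel_jobs(declared_threads, provided_cores):
    """How many jobs of one rule fit side by side within the core budget."""
    return provided_cores // effective_threads(declared_threads, provided_cores)


if __name__ == "__main__":
    # A rule declaring 8 threads (example value), run with different -c settings
    for cores in (1, 8, 16):
        print(cores, effective_threads(8, cores), max_parallel_jobs(8, cores))
```

With `-c 1`, as in the run above, a rule asking for 8 threads would be scaled down to 1 thread in this model, and only one such job would run at a time.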
From the Snakemake documentation:
“A Snakemake workflow is defined by specifying rules in a Snakefile.”
“Rules decompose the workflow into small steps.”
“Snakemake automatically determines the dependencies between the rules by matching file names.”
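To make the idea of "matching file names" concrete, here is a toy Python sketch. It is not how Snakemake is implemented: the `RULES` table, `pattern_to_regex`, and `plan` are hypothetical names, and the two patterns are taken from the inputs and outputs shown in the log above.

```python
import re

# Toy rule table: each rule maps an input pattern to an output pattern.
# The patterns mirror the file names in the log; the table itself is an assumption.
RULES = {
    "trim_fastq": {"input": "{sample}.fastq", "output": "{sample}.trimmed.fastq"},
    "gzip": {"input": "{sample}.trimmed.fastq", "output": "{sample}.trimmed.fastq.gz"},
}


def pattern_to_regex(pattern):
    """Turn '{sample}.fastq' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Odd positions are wildcard names, even positions are literal text
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    return re.compile(f"^{regex}$")


def plan(target, existing_files):
    """Return the ordered (rule, wildcards) jobs needed to build target."""
    if target in existing_files:
        return []  # file is already there, nothing to run
    for name, rule in RULES.items():
        match = pattern_to_regex(rule["output"]).match(target)
        if match:
            wildcards = match.groupdict()
            # Fill the wildcards into the input pattern and resolve it first
            upstream = plan(rule["input"].format(**wildcards), existing_files)
            return upstream + [(name, wildcards)]
    raise ValueError(f"No rule produces {target}")


if __name__ == "__main__":
    for job in plan("a.trimmed.fastq.gz", {"a.fastq", "b.fastq"}):
        print(job)
```

Requesting `a.trimmed.fastq.gz` matches the output of `gzip` with `sample=a`, whose input in turn matches the output of `trim_fastq`, so the plan is `trim_fastq` followed by `gzip`, the same order as in the log.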
$ snakemake -c 1 a.trimmed.fastq.gz b.trimmed.fastq.gz
$ snakemake -c 1 a.trimmed.fastq.gz
Example from the practical tutorial
make_supplementary:$ snakemake -c 1 results/supplementary.html
make_supplementary:$ touch results/bowtie2/NCTC8325.1.bt2
make_supplementary:$ snakemake -c 1 results/supplementary.html
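The effect of `touch` between the two runs can be sketched with a modification-time check in Python. This is a minimal model under stated assumptions: `needs_rerun` and `demo` are hypothetical helpers, the two files are treated as directly dependent for illustration, and Snakemake itself may also consider other rerun triggers besides timestamps.

```python
import os
import tempfile


def needs_rerun(output, inputs):
    """True if output is missing or older than any of its inputs."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def demo():
    """Return (rerun needed before touch, rerun needed after touch)."""
    with tempfile.TemporaryDirectory() as workdir:
        index = os.path.join(workdir, "NCTC8325.1.bt2")
        report = os.path.join(workdir, "supplementary.html")
        for path in (index, report):
            open(path, "w").close()
        # Set explicit timestamps so the demo is deterministic
        os.utime(index, (1000, 1000))
        os.utime(report, (2000, 2000))
        before = needs_rerun(report, [index])
        # Simulate `touch` on the input: it is now newer than the output
        os.utime(index, (3000, 3000))
        after = needs_rerun(report, [index])
        return before, after


if __name__ == "__main__":
    print(demo())
```

Before the `touch`, the output is newer than its input and nothing has to be redone; afterwards the input is newer, so the output counts as outdated and the steps that depend on it run again.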
threads directive: specifies the maximum number of threads for a rule
resources directive: declares requirements such as disk, memory, and runtime
rule trim_fastq:
    output: temp("{sample}.trimmed.fastq")
    input: "{sample}.fastq"
    log: "logs/{sample}.trim_fastq.log"
    params:
        leftTrim=5,
        rightTrim=10
    threads: 8
    resources:
        mem_mb=64,
        runtime=120
    # -t {threads} only illustrates the placeholder; the usage text of
    # `seqtk trimfq` does not list a -t option, so check before running
    shell:
        """
        seqtk trimfq -t {threads} -b {params.leftTrim} -e {params.rightTrim} {input} > {output} 2> {log}
        """
conda or container directive: defines the software environment for a rule
rule trim_fastq:
    output: temp("{sample}.trimmed.fastq")
    input: "{sample}.fastq"
    log: "logs/{sample}.trim_fastq.log"
    params:
        leftTrim=5,
        rightTrim=10
    threads: 8
    resources:
        mem_mb=64,
        runtime=120
    conda: "envs/seqtk.yaml"
    container: "docker://quay.io/biocontainers/seqtk"
    # -t {threads} only illustrates the placeholder; the usage text of
    # `seqtk trimfq` does not list a -t option, so check before running
    shell:
        """
        seqtk trimfq -t {threads} -b {params.leftTrim} -e {params.rightTrim} {input} > {output} 2> {log}
        """
envs/seqtk.yaml: the conda environment file referenced by the conda directive
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html