Making your work reproducible
This quickly becomes complex - not to mention tedious and boring. With larger sample sizes, it becomes difficult to keep track of which commands need updating should the input data change.
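The bookkeeping problem described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming that "needs updating" means "the output is missing or older than one of its inputs"; the function names `is_stale` and `stale_outputs` are hypothetical, and real workflow managers track more than file timestamps.

```python
from pathlib import Path


def is_stale(output: Path, inputs: list[Path]) -> bool:
    """Return True if the output is missing or older than any of its inputs."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(inp.stat().st_mtime > out_mtime for inp in inputs)


def stale_outputs(steps: dict[Path, list[Path]]) -> list[Path]:
    """Given a mapping of output file -> input files, list outputs to regenerate."""
    return [out for out, inputs in steps.items() if is_stale(out, inputs)]
```

Doing this by hand for every sample and every command is exactly the chore that a workflow manager automates.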
Workflow managers!
process GATK4_MARKDUPLICATES {
    input:
    tuple val(meta), path(bam)
    path fasta
    path fasta_fai

    output:
    tuple val(meta), path("*bam"), emit: bam
    tuple val(meta), path("*.bai"), emit: bai
    tuple val(meta), path("*.metrics"), emit: metrics

    script:
    -- snip --
    gatk MarkDuplicates $input_list ...
}
rule mark_duplicates:
    input:
        bam = "map/{prefix}.bam",
        bai = "map/{prefix}.bai",
        fasta = "ref/M_aurantiacus_v1.fasta"
    output:
        bam = "markdup/{prefix}.bam",
    shell:
        "gatk MarkDuplicates..."
nf-core: a global community effort to collect a curated set of open-source analysis pipelines built using Nextflow.
nf-core/sarek: an analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS or targeted sequencing data.
snpArcher is a reproducible workflow optimized for nonmodel organisms and comparisons across datasets, built on the Snakemake workflow management system. It provides a streamlined approach to dataset acquisition, variant calling, quality control, and downstream analysis.
:::
:::
::::
Variant calling workflows