Managing software environments with

29-Oct-2024

The problem

Full reproducibility requires the possibility to recreate the system that was originally used to generate the results.

Conda is a package, dependency, and environment manager

  • package: any type of program (e.g. multiqc, snakemake etc.)

flowchart LR
    multiqc(multiqc)

Conda is a package, dependency, and environment manager

  • package: any type of program (e.g. multiqc, snakemake etc.)
  • dependency: other software required by a package

flowchart LR
    multiqc(multiqc) -.-> numpy(numpy)
    multiqc -.-> matplotlib(matplotlib)
    multiqc -.-> python(python)

Conda is a package, dependency, and environment manager

  • package: any type of program (e.g. multiqc, snakemake etc.)
  • dependency: other software required by a package
    • dependencies in turn have their own dependencies

flowchart LR
    multiqc(multiqc) -.-> numpy(numpy)
    multiqc -.-> matplotlib(matplotlib)
    multiqc -.-> python(python)
    matplotlib -.-> python
    matplotlib -.-> numpy
    matplotlib -.-> fonttools(fonttools)
    numpy -.-> python
    numpy -.-> libcxx(libcxx)

Conda is a package, dependency, and environment manager

  • package: any type of program (e.g. multiqc, snakemake etc.)
  • dependency: other software required by a package
    • dependencies in turn have their own dependencies
  • environment: a distinct collection of packages

flowchart LR
    subgraph environment
    style environment fill:#00000000, stroke-width:1px
    direction LR
    multiqc(multiqc) -.-> numpy(numpy)
    multiqc -.-> matplotlib(matplotlib)
    multiqc -.-> python(python)
    matplotlib -.-> python
    matplotlib -.-> numpy
    matplotlib -.-> fonttools(fonttools)
    numpy-.->python
    numpy -.-> libcxx(libcxx)
    end

Conda channels

Channels are remote directories containing packages

flowchart TD
    ch1[(channel1)] --- p1[package1]
    ch1[(channel1)] --- p2[package2]
    ch1[(channel1)] --- p3[package3]

    ch2[(channel2)] --- p4[package4]
    ch2[(channel2)] --- p5[package5]
    ch2[(channel2)] --- p6[package6]

Conda channels

Two common examples are:

  • bioconda (a channel specializing in bioinformatics software)
  • conda-forge (a community-led channel made up of thousands of contributors)

flowchart TD
    ch1[(bioconda)] --- p1[bowtie2]
    ch1[(bioconda)] --- p2[fastqc]
    ch1[(bioconda)] --- p3[snakemake]

    ch2[(conda-forge)] --- p4[numpy]
    ch2[(conda-forge)] --- p5[jupyter]
    ch2[(conda-forge)] --- p6[wget]

Conda channels

Two common examples are:

  • bioconda (a channel specializing in bioinformatics software)
  • conda-forge (a community-led channel made up of thousands of contributors)

flowchart TD
    ch1[(bioconda)] --- p1[bowtie2]
    ch1[(bioconda)] --- p2[fastqc]
    ch1[(bioconda)] --- p3[snakemake]

    ch2[(conda-forge)] --- p4[numpy]
    ch2[(conda-forge)] --- p5[jupyter]
    ch2[(conda-forge)] --- p6[wget]

    p5 -.-> l1([conda install -c conda-forge -c bioconda snakemake jupyter])
    p3 -.-> l1

Defining and sharing environments

Define a Conda environment in an environment.yml file:

channels:
  - conda-forge
  - bioconda
dependencies:
  - fastqc=0.11
  - sra-tools=2.8
  - snakemake=4.3.0
  - multiqc=1.3
  - bowtie2=2.3
  - samtools=1.6
  - htseq=0.9
  - graphviz=2.38.0

Defining and sharing environments

  • Update an existing environment:
conda env update -f environment.yml

Defining and sharing environments

  • Update an existing environment:
conda env update -f environment.yml
  • Export environment (including all dependencies) to a file:
conda env export > environment.yml

Defining and sharing environments

  • Update an existing environment:
conda env update -f environment.yml
  • Export environment (including all dependencies) to a file:
conda env export > environment.yml
  • Export historical environment (only packages explicitly installed):
conda env export --from-history > environment.yml

Conda, Anaconda, Miniconda, Miniforge…

  • Conda: The package manager itself, written in python
  • Anaconda:
    • The company behind Conda itself
    • An installer for Conda containing over 7,500 open-source packages
    • A cloud service where conda packages are hosted (anaconda.com)
  • Miniconda: A minimal installer for Conda, pre-configured to use the default channel
  • Miniforge: A minimal installer for Conda, pre-configured to use the conda-forge channel (use this!)

Questions?