All of the tutorials and the material in them depend on the GitHub repository for the course. The first step of the setup is thus to download all the files that you will need, which is done differently depending on which operating system you have.
On the last day, in the Putting the pieces together session we will give examples of how we use these tools in our day-to-day work. During the course, spend some time thinking about how these tools could be useful for you in your own project(s). After you’ve gone through the tutorial you may feel that some of the tools could make your life easier from the get-go, while others may take some time to implement efficiently (and some you may never use again after the course). Our idea with the Putting the pieces together session is to have an open discussion about where to go from here.
1 Setup for Mac / Linux users
First, cd into a directory on your computer (or create one) where it makes sense to download the course directory.
cd /path/to/your/directory
git clone https://github.com/NBISweden/workshop-reproducible-research.git
cd workshop-reproducible-research
If you want to revisit the material from an older instance of this course, you can do that using git switch -d tags/<tag-name>, e.g. git switch -d tags/course_1905. To list all available tags, use git tag. Run these commands after you have changed directory (cd) into workshop-reproducible-research as described above. If you do that, you probably also want to view the same older version of this website. Until spring 2021, the website was hosted at ReadTheDocs. Locate the version box in the bottom right corner of the website and select the corresponding version.
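The tag commands above can be wrapped in a small helper that refuses to switch when the tag does not exist. This is only a sketch: switch_to_course_tag is a hypothetical function name (it is not part of the course repository), and it assumes you run it from inside workshop-reproducible-research.

```bash
# Hypothetical helper (not part of the course repository): switch to an older
# course instance only if the requested tag actually exists.
switch_to_course_tag() {
    local tag=$1
    # `git tag --list <pattern>` prints matching tags, or nothing if none match
    if [ -z "$(git tag --list "$tag")" ]; then
        echo "Tag '$tag' not found; run 'git tag' to list available tags" >&2
        return 1
    fi
    # -d (--detach) checks out the tagged commit without creating a branch
    git switch -d "tags/$tag"
}

# Example (run inside workshop-reproducible-research):
#   switch_to_course_tag course_1905
```

When you are done looking at the older material, git switch - should take you back to the branch you were on before.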
2 Setup for Windows users
Using a Windows computer for bioinformatic work has sadly not been ideal most of the time, but large advances in recent years have made this quite feasible through the Windows Subsystem for Linux (WSL). This is the only setup for Windows users that we allow for participants of this course, as all the material has been created and tested to work on Unix-based systems.
There are two substantially different versions of the Linux subsystem, WSL1 and WSL2. We strongly recommend using WSL2, which offers an essentially complete Linux experience and better performance.
Using the Linux subsystem will give you access to a full command-line bash shell and a Linux implementation on your Windows 10 or 11 PC. For the difference between the Linux Bash Shell and the Windows PowerShell, see e.g. this article.
To install WSL2 on Windows 10 or 11, follow the instructions at e.g. one of these resources:
- Installing the Windows Subsystem and the Linux Bash
- Installing and using Linux Bash on Windows
- Installing Linux Bash on Windows
If you run into error messages when trying to download files through the Linux shell (e.g. curl:(6) Could not resolve host) then try adding the Google name server to the internet configuration by running sudo nano /etc/resolv.conf, then add nameserver 8.8.8.8 to the bottom of the file and save it.
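If you prefer not to edit the file by hand, the same change can be sketched as a small function that appends the entry only when it is missing. The name add_nameserver is hypothetical, and the function takes the file path as an argument so that you can try it on a copy first.

```bash
# Hypothetical helper: append a nameserver entry to a resolv.conf-style file,
# but only if that exact entry is not already there.
add_nameserver() {
    local file=$1 server=${2:-8.8.8.8}
    local entry="nameserver $server"
    # -x matches whole lines, -F treats the entry as a fixed string, -q is quiet
    if grep -qxF "$entry" "$file" 2>/dev/null; then
        echo "$entry already present in $file"
    else
        echo "$entry" >> "$file"
        echo "Added $entry to $file"
    fi
}

# Example (writing to /etc/resolv.conf needs root, so the function would have
# to be defined and run in a root shell):
#   add_nameserver /etc/resolv.conf
```

Running the function twice leaves a single entry in the file, which makes it safe to repeat.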
Whenever a setup instruction specifies Mac or Linux (i.e. only those two, with no alternative for Windows), please follow the Linux instructions.
Open a bash shell Linux terminal and clone the GitHub repository containing all files you will need for completing the tutorials as follows. First, cd into a directory on your computer (or create one) where it makes sense to download the course directory.
You can find the directory where the Linux distribution is storing all its files by running explorer.exe . (note the trailing dot, which stands for the current directory). This will launch the Windows File Explorer showing the current Linux directory. Alternatively, you can find the Windows C drive from within the bash shell Linux terminal by navigating to /mnt/c/.
cd /path/to/your/directory
git clone https://github.com/NBISweden/workshop-reproducible-research.git
cd workshop-reproducible-research
3 Installing Git
Chances are that you already have git installed on your computer. You can check by running e.g. git --version. If you don’t have git, install it following the instructions here. If you have a very old version of git you might want to update to a later version. If you’re on a Mac you can also install it using Homebrew by simply running brew install git.
3.1 Configure git
If it is the first time you use git on your computer, you may want to configure it so that it is aware of your username and email. These should match those that you have registered on GitHub. This will make it easier when you want to sync local changes with your remote GitHub repository.
git config --global user.name "Mona Lisa"
git config --global user.email "mona_lisa@gmail.com"
If you have several accounts (e.g. both a GitHub and Bitbucket account), and thereby several different usernames, you can configure git on a per-repository level. Change directory into the relevant local git repository and run git config user.name "Mona Lisa". This will set the default username for that repository only.
You will also need to configure the default branch name to be main instead of master:
git config --global init.defaultBranch "main"
The short version of why you need to do this is that GitHub uses main as the default branch while Git itself is still using master; please read the box below for more information.
The default branch name for Git and many of the online resources for hosting Git repositories has traditionally been master, which historically comes from the “master/slave” repositories of BitKeeper. This has been heavily discussed and in 2020 the decision was made by many (including GitHub) to start using main instead. Any repository created with GitHub has used this new naming scheme since October of 2020, and Git itself is currently discussing implementing a similar change. Git did, however, introduce the ability to set the default branch name when using git init in version 2.28, instead of using a hard-coded master. We at NBIS want to be a part of this change, so we have chosen to use main for this course.
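Since the init.defaultBranch setting only exists from Git 2.28 onwards, it can be worth checking your installed version before relying on it. The following is a minimal sketch; version_at_least is a hypothetical helper written for this example, and it assumes that git --version prints the version number as its third word.

```bash
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Only the first three numeric fields (major.minor.patch) are compared.
version_at_least() {
    local IFS=.
    local -a have=($1) need=($2)   # split both versions on the dots
    local i h n
    for i in 0 1 2; do
        h=${have[i]:-0}
        n=${need[i]:-0}
        # 10# forces base 10 so that fields such as "08" are not read as octal
        if (( 10#$h > 10#$n )); then return 0; fi
        if (( 10#$h < 10#$n )); then return 1; fi
    done
    return 0
}

# Only runs the check if git is actually installed
if command -v git >/dev/null 2>&1; then
    git_version=$(git --version | awk '{print $3}')
    if version_at_least "$git_version" 2.28; then
        echo "Git $git_version supports init.defaultBranch"
    else
        echo "Git $git_version is older than 2.28; consider updating"
    fi
fi
```

The comparison is numeric per field, so that e.g. 2.9 is correctly treated as older than 2.28.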
3.2 GitHub setup
GitHub is one of several online hosting platforms for Git repositories. We’ll go through the details regarding how Git and GitHub are connected in the course itself, so for now we’ll stick to setting up your account and credentials.
If you have not done so already, go to github.com and create an account. You can also create an account on another online hosting service for version control, e.g. Bitbucket or GitLab. The exercises in this course are written with examples from GitHub (as that is the most popular platform with the most extensive features), but the same thing can be done on alternative services, although the exact menu structure and link placements differ.
Any upload to and from GitHub requires you to authenticate yourself. GitHub used to allow authentication with your account name and password, but this is no longer the case: you now need to use either SSH keys or a personal access token, and in this course we use SSH keys. Knowing exactly what these are is not necessary to get them working, but we encourage you to read the box below to learn more about them! GitHub has excellent, platform-specific instructions both on how to generate and add SSH keys to your account, so please follow those instructions.
Using SSH (Secure Shell) for authentication basically entails setting up a pair of keys: one private and one public. You keep the private key on your local computer and give the public key to anywhere you want to be able to connect to, e.g. GitHub. The public key can be used to encrypt messages that only the corresponding private key can decrypt. A simplified description of how SSH authentication works goes like this:
- The client (i.e. the local computer) sends the ID of the SSH key pair it would like to use for authentication to the server (e.g. GitHub)
- If that ID is found, the server generates a random number and encrypts this with the public key and sends it back to the client
- The client decrypts the random number with the private key and sends it back to the server
Notice that the private key always remains on the client’s side and is never transferred over the connection; the ability to decrypt messages encrypted with the public key is enough to ascertain the client’s authenticity. This is in contrast with using passwords, which are themselves sent across a connection (albeit encrypted). It is also important to note that even though the keys come in pairs it is computationally infeasible to derive the private key from the public key. If you want to read more details about how SSH authentication works you can check out this website, which has more in-depth information than we provide here.
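To make the key pair described above a bit more concrete, here is a rough sketch of how one could be generated with ssh-keygen. It does not replace GitHub’s own instructions: make_keypair is a hypothetical helper, the file name id_ed25519_demo is made up for the example, and the empty passphrase is only there to keep the sketch non-interactive.

```bash
# Hypothetical helper: create an Ed25519 key pair at the given path.
# The label passed to -C is only a comment; an email address is a common choice.
make_keypair() {
    local key_path=$1 label=$2
    # -N "" sets an empty passphrase so that the sketch runs without prompts;
    # for a real key you would normally omit -N and choose a passphrase.
    ssh-keygen -q -t ed25519 -C "$label" -N "" -f "$key_path"
}

# Example, kept as comments so that nothing is created by accident:
#   make_keypair ~/.ssh/id_ed25519_demo "mona_lisa@gmail.com"
#   cat ~/.ssh/id_ed25519_demo.pub                  # public half: this goes to GitHub
#   ssh -i ~/.ssh/id_ed25519_demo -T git@github.com # test once the key is added
```

The private key is the file without the .pub suffix; it stays on your computer and is never uploaded anywhere.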
4 Installing Conda
Conda is installed with a Miniforge installer specific for your operating system:
# Install Miniforge for 64-bit Mac (Intel chip)
curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh -O
bash Miniforge3-MacOSX-x86_64.sh
rm Miniforge3-MacOSX-x86_64.sh
# Install Miniforge for 64-bit Mac (Apple chip)
curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh -O
bash Miniforge3-MacOSX-arm64.sh
rm Miniforge3-MacOSX-arm64.sh
# Install Miniforge for 64-bit Linux
curl -L https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O
bash Miniforge3-Linux-x86_64.sh
rm Miniforge3-Linux-x86_64.sh
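If you are unsure which of the three installers applies to your machine, the output of uname can help you decide. The sketch below maps it onto the installer names used above; miniforge_installer is a hypothetical helper, and it deliberately only knows about the three installers listed in this section.

```bash
# Hypothetical helper: print the Miniforge installer name for a given OS and
# CPU architecture (defaults: the values reported by uname on this machine).
miniforge_installer() {
    local os=${1:-$(uname -s)} arch=${2:-$(uname -m)}
    case "$os/$arch" in
        Darwin/x86_64) echo "Miniforge3-MacOSX-x86_64.sh" ;;  # Mac, Intel chip
        Darwin/arm64)  echo "Miniforge3-MacOSX-arm64.sh" ;;   # Mac, Apple chip
        Linux/x86_64)  echo "Miniforge3-Linux-x86_64.sh" ;;   # Linux, incl. WSL
        *)
            echo "No installer listed in this section for $os/$arch" >&2
            return 1
            ;;
    esac
}

# Example, kept as comments so that nothing is downloaded by accident:
#   installer=$(miniforge_installer) &&
#       curl -L "https://github.com/conda-forge/miniforge/releases/latest/download/$installer" -O
```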
The installer will ask you questions during the installation:
- Do you accept the license terms? (Yes)
- Do you accept the installation path or do you want to choose a different one? (Probably yes)
- Do you want the installer to initialize Miniforge? (Yes)
Restart your shell so that the settings in ~/.bashrc or ~/.bash_profile can take effect. You can verify that the installation worked by running:
conda --version
4.1 If you already have Conda installed
If you already have Conda installed you can make sure you’re using the latest version by running conda update -n base conda and skip the installation instructions above.
4.2 Configuring Conda
As a last step we will set up the default channels (from where packages will be searched for and downloaded if no channel is specified):
conda config --add channels bioconda
conda config --add channels conda-forge
We will also set so-called ‘strict’ channel priority, which ensures higher stability and better performance (you can see details about this setting by running conda config --describe channel_priority):
conda config --set channel_priority strict
The Conda docs specify a couple of things to keep in mind when using Conda. First of all, conda should be installed in the base environment and no other packages should be installed into base. Furthermore, mixing of the conda-forge and defaults channels should be avoided as the default Anaconda channels are incompatible with conda-forge. Since we are installing from miniforge we get the conda-forge defaults without having to do anything.
4.3 Conda on new Macs
If you have one of the newer Macs with Apple chips (the M-series) you may run into some problems with certain Conda packages that have not yet been built for the ARM64 architecture. The Rosetta software allows ARM64 Macs to use software built for the older Intel architecture (x86-64, also known as AMD64), which means you can always fall back on creating Intel-based environments and use them in conjunction with Rosetta. This is how you do it:
CONDA_SUBDIR=osx-64 <conda-command>
conda activate <env>
conda config --env --set subdir osx-64
The first command creates the Intel-based environment, while the last one makes sure that subsequent commands are also using the Intel architecture. If you don’t want to remember and do this manually each time you want to use AMD64/Rosetta you can check out this bash script.
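The three commands above can also be collected into a single shell function. This is a minimal sketch and not the bash script referred to in the text: create_intel_env is a hypothetical name, and the function assumes that it is defined in an interactive shell where Conda has been initialised, so that conda activate works.

```bash
# Hypothetical helper: create a Conda environment from an environment file
# using Intel (osx-64) packages, and make that choice stick for the environment.
create_intel_env() {
    local env_file=$1 env_name=$2
    # CONDA_SUBDIR only applies to this one command
    CONDA_SUBDIR=osx-64 conda env create -f "$env_file" -n "$env_name" || return 1
    conda activate "$env_name" || return 1
    # Store the setting inside the environment so later installs also use osx-64
    conda config --env --set subdir osx-64
}

# Example (from workshop-reproducible-research/tutorials):
#   create_intel_env snakemake/environment.yml snakemake-env
```

Note that the function leaves the new environment activated, just as the manual steps do.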
5 Installing Snakemake
We will use Conda environments for the setup of this tutorial, but don’t worry if you don’t understand exactly what everything does - you’ll learn all the details during the course. First make sure you’re currently situated inside the tutorials directory (workshop-reproducible-research/tutorials) and then create the Conda environment like so:
conda env create -f snakemake/environment.yml -n snakemake-env
conda activate snakemake-env
Some of the packages in this environment are not available for the ARM64 architecture, so if you have a Mac with an Apple chip you’ll have to follow the instructions in the Conda on new Macs section above.
Check that Snakemake is installed correctly, for example by executing snakemake --help. This should output a list of available Snakemake settings. If you get bash: snakemake: command not found then you need to go back and ensure that the Conda steps were successful. Once you’ve successfully completed the above steps you can deactivate the environment using conda deactivate and continue with the setup for the other tools.
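The same kind of check works for any of the tools in this setup. As a small sketch, the hypothetical helper check_tool below reports whether a command can be found in the current shell, which is what the command not found message is telling you.

```bash
# Hypothetical helper: report whether a command is available in this shell.
check_tool() {
    local tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found in the current shell" >&2
        return 1
    fi
}

# Example (with snakemake-env activated):
#   check_tool snakemake
```

Remember that tools installed with Conda are only found while their environment is activated.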
6 Installing Nextflow
The easiest way to install Nextflow is the official one, which is to just run the following code:
curl -s https://get.nextflow.io | bash
This will give you the nextflow file in your current directory - move this file to a directory in your PATH, e.g. /usr/bin/.
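Moving files into /usr/bin/ usually requires sudo, so a directory in your home folder can be a convenient alternative. The sketch below assumes $HOME/.local/bin as the target, which is a common but not universal choice; install_to_dir and dir_in_path are hypothetical helper names.

```bash
# Hypothetical helper: succeed if the given directory is listed in PATH.
dir_in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: make a file executable and move it into a directory,
# warning if that directory is not (yet) in PATH.
install_to_dir() {
    local file=$1 dir=$2
    mkdir -p "$dir" || return 1
    chmod +x "$file" || return 1
    mv "$file" "$dir/" || return 1
    if ! dir_in_path "$dir"; then
        echo "Note: $dir is not in your PATH; you may need to add it in ~/.bashrc" >&2
    fi
}

# Example (after downloading nextflow into the current directory):
#   install_to_dir ./nextflow "$HOME/.local/bin"
```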
If you’re getting Java-related errors, you can either try to update your Java installation (Nextflow requires Java 11 or later, and recent releases require Java 17 or later) or install Nextflow using Conda. If you want to use Conda, navigate to workshop-reproducible-research/tutorials and create the environment:
conda env create -f nextflow/environment.yml -n nextflow-env
conda activate nextflow-env
Some of the packages in this environment are not available for the ARM64 architecture, so if you have a Mac with an Apple chip you’ll have to follow the instructions in the Conda on new Macs section above.
Check that Nextflow was installed correctly by running nextflow -version. If you successfully installed Nextflow using Conda you can now deactivate the environment using conda deactivate and continue with the other setups, as needed.
7 Installing Quarto
The easiest way to install Quarto is to go to the official website, download the OS-appropriate package and follow the installation instructions. You also need to install a LaTeX distribution to be able to render Quarto documents to PDF, which can be done using Quarto itself:
quarto install tinytex
While we’re not installing Quarto itself using Conda, we will install some software packages that are used in the Quarto tutorial using Conda: make sure your working directory is the tutorials directory (workshop-reproducible-research/tutorials) and install the necessary packages defined in the environment.yml:
conda env create -f quarto/environment.yml -n quarto-env
8 Installing Jupyter
Let’s continue using Conda for installing software, since it’s so convenient to do so! Move into the tutorials directory (workshop-reproducible-research/tutorials), create a Conda environment from the jupyter/environment.yml file and test the installation of Jupyter, like so:
conda env create -f jupyter/environment.yml -n jupyter-env
conda activate jupyter-env
Once you’ve successfully completed the above steps you can deactivate the environment using conda deactivate and continue with the setup for the other tools.
9 Installing Docker
Installing Docker (specifically Docker Desktop) is quite straightforward on Mac, Windows and Linux distributions. Note that Docker runs as root, which means that you have to have sudo privileges on your computer in order to install or run Docker. When you have finished installing Docker, regardless of which OS you are on, please type docker --version to verify that the installation was successful.
The latest version of Docker may not work if you have an old version of either macOS or Windows. You can find older Docker versions that may be compatible for you if you go to https://docs.docker.com/desktop/ and click “Previous versions” in the left side menu.
9.1 macOS
Go to docker.com and select the download option that is suitable for your computer’s architecture (i.e. whether you have an Intel chip or a newer Apple silicon chip). This will download a dmg file - click on it when it’s done to start the installation. This will open up a window where you can drag the Docker.app to Applications. Close the window and click the Docker app from the Applications menu. Now you basically just need to click “next” a couple of times and you should be good to go. You can find the Docker icon in the menu bar in the upper right part of the screen.
9.2 Linux
Go to the linux-install section of the Docker documentation and make sure that your computer meets the system requirements. There you can also find instructions for different Linux distributions in the left sidebar under Installation per Linux distro.
9.3 Windows
In order to run Docker on Windows your computer must support Hardware Virtualisation Technology and virtualisation must be enabled. This is typically done in BIOS. Setting this is outside the scope of this tutorial, so we’ll simply go ahead as though it’s enabled and hope that it works.
On Windows 10/11 we will install Docker for Windows, which is available at docker.com. Click the link Download from Docker Hub, and select Get Docker. Once the download is complete, execute the file and follow the instructions. You can now start Docker from the Start menu. You can search for it if you cannot find it; the Docker whale icon should appear in the task bar.
You will probably need to enable integration with the Linux subsystem, if you haven’t done so during the installation of Docker Desktop. Right-click on the Docker whale icon in the task bar and select Settings. Choose Resources and select WSL integration. Enable integration with the Linux subsystem and click Apply & Restart; also restart the Linux subsystem.