Precourse

This workshop is aimed at biologists, researchers, computer scientists, and data analysts with limited experience in analysing NGS data.

Preparation

These are steps to be completed before the workshop.

Uppmax ID

A remote computing cluster (UPPMAX) will be used for data analyses. A SUPR/SNIC account is needed to use UPPMAX resources.

If you do not already have one, create an account at https://supr.snic.se/.

Log in to SUPR/SNIC and request membership in the project with ID g2019031.

Once you are accepted to the project, you should see that project listed under your active projects.

SSH & SFTP access

You need a program to connect to the remote cluster (UPPMAX). Linux and Mac users already have a terminal on their systems.

If you are on a Windows system, we recommend MobaXterm. It is recommended that you INSTALL the program and not use the portable version. MobaXterm also has an integrated SFTP file browser.

Mac users will need to download and install XQuartz for X11 forwarding, that is, to display windows opened on the remote machine on your local machine.

Type ssh -Y user@rackham.uppmax.uu.se in the terminal, replacing user with your UPPMAX username, and then enter your password. The password will not be visible as you type.
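Once you are logged in, a quick way to see whether X11 forwarding is working is to look at the DISPLAY variable, which ssh sets on the remote side when forwarding is active. The sketch below is only an illustration; the function name x11_status is made up for this example and is not part of UPPMAX. It is meant to be run on the cluster after logging in, since DISPLAY may also be set on your local machine.

```shell
# Hypothetical helper: report whether X11 forwarding appears to be active.
# ssh -Y sets DISPLAY in the remote session when forwarding works.
x11_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
  else
    echo "DISPLAY is not set; reconnect with ssh -Y"
  fi
}

x11_status
```

If the second message is printed, Mac users should also check that XQuartz is installed and running before reconnecting.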

If you need to transfer data between UPPMAX and your computer, you can use SCP or SFTP through the terminal. Windows users can use the SFTP browser available with MobaXterm.
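As a rough illustration of what an SCP transfer looks like, the sketch below prints (but does not run) one upload and one download command. The username jody, the file names notes.txt and results.txt, and the helper names upload_cmd and download_cmd are placeholders invented for this example; the host and project path are the ones used elsewhere on this page.

```shell
# Hypothetical values: replace "jody" with your own UPPMAX username.
uppmax_user="jody"
uppmax_host="rackham.uppmax.uu.se"
course_dir="/proj/g2019031/nobackup/$uppmax_user"

# Print the scp command that uploads a local file to the course directory.
upload_cmd() {
  echo "scp $1 $uppmax_user@$uppmax_host:$course_dir/"
}

# Print the scp command that downloads a remote file into the current directory.
download_cmd() {
  echo "scp $uppmax_user@$uppmax_host:$course_dir/$1 ."
}

upload_cmd notes.txt
download_cmd results.txt
```

Printing the command first makes it easy to check the path before copying anything; to perform the transfer, run the printed line in a terminal on your own computer.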

For all users, if you prefer a GUI to upload and download files from UPPMAX, we recommend installing FileZilla.

Course directory access

For this step, you will have to use the terminal a bit. You can get started by following Tutorial One of the Unix tutorial for beginners. You must use https://scilifelab.github.io/courses/ngsintro/common/emu/ (or this mirror) to try the commands in the tutorial, so that you do not mess up any real-world system. If you have any questions regarding this tutorial, contact: .

Make sure that you can read and write in the workshop folder. Go to /proj/g2019031/nobackup/ and create a directory named after your username, for example with mkdir jody. You will work inside this directory during the workshop, for example /proj/g2019031/nobackup/jody. If you cannot write to the folder, the most likely reason is that you have not requested access to the workshop project via SUPR. This is described in the first preparation step (Uppmax ID).

Note that it may take an hour or so after your request is approved before you can actually write to the folder. We will check before the workshop that all students have logged in and done this, so do not forget!
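The directory check can be sketched as a small shell function. This is only an illustration under assumptions: the function name check_workdir and the temporary file name .write_test are made up for this example, and jody stands in for your own username.

```shell
# Hypothetical helper: create <base>/<name> if needed and confirm it is writable.
check_workdir() {
  base="$1"
  name="$2"
  dir="$base/$name"
  # mkdir -p succeeds if the directory already exists.
  mkdir -p "$dir" 2>/dev/null || { echo "cannot create $dir"; return 1; }
  # Try to create and remove a small file to confirm write access.
  if touch "$dir/.write_test" 2>/dev/null; then
    rm -f "$dir/.write_test"
    echo "ok: $dir is writable"
  else
    echo "cannot write to $dir"
    return 1
  fi
}

# On UPPMAX you would call it like this (with your own username):
#   check_workdir /proj/g2019031/nobackup jody
```

If the function reports that it cannot create or write to the directory, check in SUPR that your membership request has been approved and try again later.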

Tools

We will use IGV and R on UPPMAX, so you do not need to install them on your own system. If you would like to try them out on your own system anyway, instructions are given below.

Optional

Download IGV (Integrative Genomics Viewer) from the Broad Institute to your own computer and make sure the mouse genome (mm10) is available.

If you plan to run the R steps locally, R and RStudio need to be installed on your computer.

Install the R statistical software from r-project.org. If you have an old version of R and you do not use it, uninstall it and then install a newer version. Make sure you have one of the recent versions of R (preferably one version older than the latest). For Windows users, it is recommended that you DO NOT install to C:\Program Files\R\. Instead, install to C:\R\.

Install RStudio. RStudio provides tools such as a code editor with syntax highlighting, project management, version control, package building, a debugger, a profiler and more.

Extra R packages used in the workshop exercises (if any) are listed below. It is recommended that you install these in advance.

Linux and Mac users may have to install some extra OS-specific libraries before installing the R packages. Pick the packages you need from the lists below, based on your system.

libcurl

  • deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
  • rpm: libcurl-devel (Fedora, CentOS, RHEL)
  • csw: libcurl_dev (Solaris)

libssl

  • deb: libssl-dev (Debian, Ubuntu, etc)
  • rpm: openssl-devel (Fedora, CentOS, RHEL)
  • csw: libssl_dev (Solaris)
  • brew: (Mac OSX)

libxml2

  • deb: libxml2-dev (Debian, Ubuntu, etc)
  • rpm: libxml2-devel (Fedora, CentOS, RHEL)
  • csw: libxml2_dev (Solaris)

libudunits2-dev
libopenblas-base
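For orientation, the sketch below prints (but does not run) a single install command covering the libraries listed above. It rests on several assumptions: that you have sudo rights, that apt-get or yum is your package manager, and that libudunits2-dev and libopenblas-base are Debian/Ubuntu package names, which is why the rpm line leaves them out. The function name syslib_install_cmd is made up for this example.

```shell
# Hypothetical helper: print the install command for a package family.
# Package names are copied from the lists above; the package managers
# (apt-get, yum) and the use of sudo are assumptions about your system.
syslib_install_cmd() {
  case "$1" in
    deb) echo "sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev libudunits2-dev libopenblas-base" ;;
    rpm) echo "sudo yum install libcurl-devel openssl-devel libxml2-devel" ;;
    *)   echo "unknown package family: $1" >&2; return 1 ;;
  esac
}

syslib_install_cmd deb
```

Review the printed line and drop any package you already have before running it yourself.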

Install the R packages. Simply copy and paste the code into R.

# install from cran
install.packages(c('BiocManager','devtools','dplyr','ggplot2','ggpubr','pheatmap','stringr','tidyr'))
# install from bioconductor
# ('methods' ships with base R, so it does not need to be installed)
BiocManager::install(c('biomaRt','DESeq2','edgeR','goseq','GO.db','org.Mm.eg.db','reactome.db'))

Syllabus

The syllabus for this workshop is as follows.

  • Working on the unix/linux command line
    • Command line navigation and related commands: cd, mkdir, rm, rmdir
    • Commonly used linux tools: cp, mv, tar, less, more, head, tail, nano, grep, top, man
    • Wildcards
    • Ownership and permissions
    • Symbolic links
    • Piping commands
  • Working on remote computing cluster
    • Logging on to UPPMAX
    • Booking resources
    • Job templates, submission and queues
    • Modules
  • Commonly used bioinformatic tools and pipelines
  • Working with the Integrative Genomics Viewer (IGV)
  • Variant-calling workflow
    • Best-practice workflow for germline variant calling
    • VCF file format
  • RNA-Seq workflow
    • RNA-Seq experimental design and considerations
    • QC, mapping and gene expression counts
    • Differential gene expression analyses
  • Current advances in NGS technologies

Learning outcomes

  • Awareness of the current state of NGS technologies
  • Familiarity with unix/linux command line interface to perform basic tasks
  • Connecting to and working on a remote computing cluster
  • Familiarity with the general workflow for variant calling and RNA-Seq
  • Ability to follow basic NGS jargon in scientific papers