This page contains links to different lectures (slides) and practical exercises (labs) that are part of this workshop. The links below are similar to that under Schedule but organised here by topic.

Input code blocks are displayed like shown below. The code language is displayed above the block. Shell scripts (SH) are to be executed in the linux terminal. R scripts are to be run in R either through the terminal, RGui or RStudio.

sh
command

Note   Tip   Discuss   Task


Introduction

A brief introduction to the world of RNA and RNA-Seq technology.

A primer to working on the unix/linux command line and working on a remote computing cluster Uppmax.

RNA-Seq analyses is usually carried out in R programming language and it will be useful to learn some basic R.

This topic covers retrieving supporting data needed for RNA-seq analyses. These include gene annotation IDs such as mapping between Ensembl IDs and Gene IDs, GO terms and transcript IDs. We also cover retrieving genomic data from Ensembl.

Main lab

Data

In most of the exercises, we will use RNA-seq data (Illumina short reads) from the dataset GSE131032 Czarnewski et al (2019) Nat Comm.

To elucidated through an unbiased manner which genes and pathways are differentially regulated during mouse colonic inflammation followed by a tissue regeneration phase. In particular, we took advantage of the widely used dextran sodium sulfate (DSS)-induced model of colitis. This model is one of the few characterized by a phase of damage followed by a phase of regeneration. Therefore, this model gave the possibility to identify also sets of genes essential in the regeneration phase, a key step towards the resolution of the inflammation. In short, mice were exposed to DSS in the drinking water for 7 days, then allowed to recover for the following 7 days. During this period, we collected colonic tissue samples every second day to then be analyzed by RNA sequencing (RNA-seq). Next, we performed a RNA-seq analysis from colonic samples throughout the experiment and computed differentially expressed genes (DEGs) taking the complete kinetics of expression into consideration for p-value estimation using EdgeR.

For this course, downsampled FASTQ files from this dataset from 2 experimental groups (day00 and day07, 3 samples each) will be used in the labs on read mapping, transcript assembly, visualization, quality control and differential expression. There are many relevant questions that could be asked based on these measurements, from several quality checks to understanding biological insignts.

Quality control

Before doing any other analysis on mapped RNA-seq reads it is always important to do quality control of your mapped reads and that you do not have any obvious errors in your RNA-seq data.

Mapping

This section contains information on how to map reads to a reference genome using splice-aware aligner STAR and HISAT2.

Post-alignment QC

After alignment, the BAM files are inspected for various alignment metrics. Some of these include the number of reads mapped, number of unmapped reads, regions in the reference that reads map to, gene body coverage, signs of DNA contamination etc.

BAM files are optionally visualised using integrated genome viewer.

Quantification

Gene counts are quantified from BAM files using featureCounts.

Exploratory data analyses

Before commencing any quantitative analyses, it is important to run some exploratory analyses to access similarity between samples. This is a vital step to identify mislabelled samples, poor-quality samples and/or replicates that differ considerably. This section dives deeper into exploratory analyses PCA and hierarchical clustering.

Differential gene expression

We find genes that are differentially expressed between our time points.

Functional analysis

We will perform functional analysis on the differentially expressed genes to place them into a function context and possibly explain the biological consequences of DE. Methods covered are GSA (Gene set analysis) and GSEA (Gene set enrichment analysis).

Bonus labs

Pseudoaligners

This is an alternative step INSTEAD of mapping, PA-QC and quantification. Kallisto uses FastQ reads and a reference transcriptome (cDNA+ncRNA) to quantify transcripts using rapid pseudo-alignment along with bootstrap replicates to assess quantification inaccuracy. Kallisto is significantly faster than STAR or HISAT2 and has a small memory footprint. Kallisto generates transcript counts. Differential transcript expression is carried out using Sleuth which utilises bootstrap replicates.

small RNA analyses

RNA-seq differential analyses workflow on microRNAs from Fruit fly.

Assembly & annotation

Raw sequencing short reads are assembled into transcripts using two approaches. Genome-guided assembly using HiSat2 and StringTie. De-novo transcriptome assembly using Trinity. Assembled transcriptomes are functionally annotated to identify genes.