This page contains links to different tutorials that are used in this course. The tutorials are well documented and should be easy to follow.

Input code blocks are displayed as shown below, with the code language displayed above the block. Shell scripts (SH) are to be executed in a Linux terminal shell such as bash. R scripts are to be run in R, either through the terminal, RGui or RStudio.

command

Callout boxes are labelled Note, Tip, Discuss or Task.

1 Introduction

Most of the analyses are carried out in R, so it will be useful to learn some basic R.

Introduction to R

This topic covers retrieving the supporting data needed for RNA-seq analyses. These include gene annotations, such as mappings between Ensembl IDs and gene IDs, GO terms and transcript IDs. We also cover retrieving genomic data from Ensembl.

Downloading data

2 Main lab

2.1 Data

In most of the exercises, we will use RNA-seq data (Illumina short reads) from the human A431 cell line. It is an epidermoid carcinoma cell line which is often used to study cancer and the cell cycle, and as a sort of positive control of epidermal growth factor receptor (EGFR) expression. A431 cells express very high levels of EGFR, in contrast to normal human fibroblasts.

The A431 cells were treated with gefitinib, an EGFR inhibitor that is used (under the trade name Iressa) as a drug to treat cancers with mutated and overactive EGFR. In the experiment, RNA was extracted at four time points: before the gefitinib treatment (t=0), and two, six and twenty-four hours after treatment (t=2, t=6, t=24, respectively), and sequenced in triplicate using an Illumina HiSeq instrument (thus there are 3x4=12 samples).
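The design above (four time points, three replicates each) can be written out as a small sample sheet. The following is a minimal sketch only: the sample naming scheme (t<hours>_rep<n>) and the function name make_sample_sheet are hypothetical and do not correspond to the file names used in the labs.

```sh
# Enumerate the 4 time points x 3 replicates design as a tab-separated table.
# NOTE: the naming scheme t<hours>_rep<n> is an assumption for illustration,
# not the naming used in the course data.
make_sample_sheet() {
  # Header row
  printf 'sample\ttime_h\treplicate\n'
  # Time points in hours after gefitinib treatment (0 = before treatment)
  for t in 0 2 6 24; do
    # Three replicates per time point
    for r in 1 2 3; do
      printf 't%s_rep%s\t%s\t%s\n' "$t" "$r" "$t" "$r"
    done
  done
}

make_sample_sheet
```

The output has one header line followed by one row per sample, which is a convenient starting point for the metadata table that differential expression tools typically expect.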

This data set or parts of it will be used in the labs on read mapping, transcript assembly, visualization, quality control and differential expression. There are many relevant questions that could be asked based on these measurements. In the QC exercise, we are going to examine if the RNA libraries that we work with are what we think they are or if there is any mislabelling. In the isoform exercise, we are going to look at some specific regions where the mass-spectrometry data indicated that novel exons or splice variants could be present at the protein level. We will use (part of) the RNA-seq data to examine if there is corresponding evidence on the mRNA level, and how different software tools could be used to detect novel gene variants.

2.2 Quality control

Before doing any other analysis on mapped RNA-seq reads, it is important to run quality control on them and to check that there are no obvious errors in your RNA-seq data.

Quality control

2.3 Mapping

This section contains information on how to map reads to a reference genome using the splice-aware aligners STAR and HISAT2.

Mapping reads using STAR

2.4 IGV

Mapped reads in BAM files are visualised using the Integrative Genomics Viewer (IGV).

Using IGV

2.5 Quantification

Gene counts are quantified from BAM files using featureCounts.

Quantification

2.6 Differential gene expression

We find genes that are differentially expressed between our time points.

DGE using DESeq2

2.7 Functional analysis

We will perform functional analysis on the differentially expressed genes to place them in a functional context and possibly explain the biological consequences of differential expression (DE). Methods covered are GSA (Gene set analysis) and GSEA (Gene set enrichment analysis).

Functional analysis

3 Bonus labs

3.1 Exploratory data analyses

This section dives deeper into exploratory analyses using PCA and hierarchical clustering.

PCA & Hierarchical clustering

3.2 Pseudoaligners

Kallisto uses FASTQ reads and a reference transcriptome (cDNA+ncRNA) to quantify transcripts by rapid pseudo-alignment, with bootstrap replicates used to assess the uncertainty of the quantification. Kallisto is significantly faster than STAR or HISAT2 and has a small memory footprint. Differential gene expression is carried out using Sleuth, which makes use of the bootstrap replicates.

Mapping and quantification using Kallisto, DGE using Sleuth

3.3 Small RNA analyses

An RNA-seq differential analysis workflow on microRNAs from the fruit fly.

Small RNA-seq analyses

3.4 Assembly & annotation

Raw short sequencing reads are assembled into transcripts using two approaches: genome-guided assembly using HISAT2 and StringTie, and de novo transcriptome assembly using Trinity. The assembled transcriptomes are then functionally annotated to identify genes.

Reference-guided assembly using StringTie
De novo assembly using Trinity
Transcriptome annotation
