Main lab
Data
In most of the exercises, we will use RNA-seq data (Illumina short reads) from the dataset GSE131032 (Czarnewski et al., 2019, Nat Comm).
The aim was to elucidate, in an unbiased manner, which genes and pathways are differentially regulated during mouse colonic inflammation followed by a tissue regeneration phase. In particular, we took advantage of the widely used dextran sodium sulfate (DSS)-induced model of colitis. This model is one of the few characterized by a phase of damage followed by a phase of regeneration. It therefore also made it possible to identify sets of genes essential in the regeneration phase, a key step towards the resolution of the inflammation. In short, mice were exposed to DSS in the drinking water for 7 days, then allowed to recover for the following 7 days. During this period, we collected colonic tissue samples every second day for analysis by RNA sequencing (RNA-seq). Next, we performed an RNA-seq analysis of the colonic samples from throughout the experiment and computed differentially expressed genes (DEGs), taking the complete kinetics of expression into consideration for p-value estimation using edgeR.
For this course, downsampled FASTQ files from 2 experimental groups of this dataset (day00 and day07, 3 samples each) will be used in the labs on read mapping, transcript assembly, visualization, quality control and differential expression. Many relevant questions can be asked of these measurements, ranging from quality checks to biological insights.
Quality control
Before doing any further analysis, it is always important to run quality control on your reads and check that the RNA-seq data contain no obvious errors.
Mapping
This section contains information on how to map reads to a reference genome using the splice-aware aligners STAR and HISAT2.
Post-alignment QC
After alignment, the BAM files are inspected for various alignment metrics. Some of these include the number of reads mapped, number of unmapped reads, regions in the reference that reads map to, gene body coverage, signs of DNA contamination etc.
BAM files are optionally visualised using the Integrative Genomics Viewer (IGV).
Quantification
Gene counts are quantified from BAM files using featureCounts.
Filtering & Normalisation
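As a minimal sketch of what this step can involve, the example below converts raw gene counts to counts per million (CPM) and drops genes with low expression. This is one common approach and not necessarily the method used in the lab. The gene names, count values, and the `min_cpm` and `min_samples` thresholds are all hypothetical.

```python
# Toy raw counts (genes x samples). Names and values are made up for illustration.
samples = ["day00_1", "day00_2", "day07_1", "day07_2"]
counts = {
    "GeneA": [100, 120, 300, 280],
    "GeneB": [0, 1, 0, 0],
    "GeneC": [999900, 999879, 999700, 999720],
}


def library_sizes(counts):
    """Total number of counted reads per sample (column sums)."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def cpm(counts):
    """Scale each sample's counts to counts per million."""
    sizes = library_sizes(counts)
    return {
        gene: [c * 1e6 / s for c, s in zip(row, sizes)]
        for gene, row in counts.items()
    }


def filter_low(cpm_table, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    return {
        gene: row
        for gene, row in cpm_table.items()
        if sum(v >= min_cpm for v in row) >= min_samples
    }


cpm_table = cpm(counts)
kept = filter_low(cpm_table)
print("Genes kept after filtering:", sorted(kept))
```

CPM only corrects for sequencing depth. It does not account for gene length or for differences in library composition between samples.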
Exploratory data analyses
Before commencing any quantitative analyses, it is important to run some exploratory analyses to assess similarity between samples. This is a vital step to identify mislabelled samples, poor-quality samples and/or replicates that differ considerably. This section dives deeper into exploratory analyses such as PCA and hierarchical clustering.
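Before running PCA or clustering, a simple first look at sample similarity is a sample-to-sample correlation matrix. The sketch below computes pairwise Pearson correlations between samples. It assumes the input is already normalised and on a log2 scale, and the values and sample names are hypothetical. It does not perform PCA or hierarchical clustering itself.

```python
import math

# Hypothetical log2-scale expression values (rows = genes, columns = samples).
samples = ["day00_1", "day00_2", "day07_1", "day07_2"]
log_expr = [
    [5.0, 5.2, 8.1, 7.9],
    [3.1, 2.9, 1.0, 1.2],
    [6.0, 6.1, 6.0, 5.9],
    [2.0, 2.2, 4.9, 5.1],
]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_correlations(matrix):
    """Correlate every sample (column) against every other sample."""
    columns = list(zip(*matrix))
    return [[pearson(a, b) for b in columns] for a in columns]


corr = sample_correlations(log_expr)
for name, row in zip(samples, corr):
    print(name, " ".join(f"{value:6.3f}" for value in row))
```

Replicates from the same group are expected to correlate more strongly with each other than with samples from the other group. A replicate that breaks this pattern is a candidate for a mislabelled or poor-quality sample.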
Differential gene expression
We find genes that are differentially expressed between our time points.
Functional analysis
We will perform functional analysis on the differentially expressed genes to place them into a functional context and possibly explain the biological consequences of differential expression. Methods covered are GSA (gene set analysis) and GSEA (gene set enrichment analysis).
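One simple form of gene set analysis is an over-representation test, which asks whether a gene set contains more differentially expressed genes than expected by chance. The sketch below uses a one-sided hypergeometric test. It is an illustration under assumed inputs, not the specific method used in the lab, and the gene identifiers are made up. GSEA is a different, rank-based method and is not shown here.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, n_de, overlap):
    """P(X >= overlap), where X is the number of gene-set members among
    n_de genes drawn without replacement from the gene universe."""
    total = comb(universe_size, n_de)
    upper = min(set_size, n_de)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_de - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical inputs: all tested genes, the DE genes, and one gene set.
universe = {f"gene{i}" for i in range(1, 21)}
de_genes = {"gene1", "gene2", "gene3", "gene4", "gene15"}
gene_set = {"gene1", "gene2", "gene3", "gene4", "gene5", "gene6"}

overlap = len(de_genes & gene_set)
p_value = hypergeom_enrichment_p(
    len(universe), len(gene_set & universe), len(de_genes), overlap
)
print(f"overlap = {overlap}, p = {p_value:.4g}")
```

When many gene sets are tested, the resulting p-values need correction for multiple testing before interpretation.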