In this tutorial we will go through some of the key steps in performing a quality control on your samples. We will start with the read based quality control, using FastQC. This will analyse your RNA fragments and see if there are any biases
All the data you need for this lab is available in the subfolder fastq
of the data folder that you have downloaded to this course .
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to get a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
The main functions of FastQC are:
You can read more about the program and have a look at example reports at the FastQC website.
This program can be used for any type of NGS data, not only RNA-seq.
# To see help information on the FastQC package:
fastqc --help
# Run for one FASTQ file:
fastqc -o outdir fastqfile
# Run on multiple FASTQ files:
fastqc -o outdir fastqfile1 fastqfile2 etc.
# You can use wildcards to run on all FASTQ files in a directory:
fastqc -o outdir /the/folder/where/you/have/your/data/*.fastq(.gz)
Take a look at some other file and see if it looks similar in quality.
MultiQC is a program that creates summaries over all samples for several different types of QC-measures. You can read more about the program here. It will automatically look for output files from the supported tools and make a summary of them. You can either go to the folder where you have the Fastqc output or run it with the path to the folder.
multiqc /folder/with/FastQC_results/
This should create a folder named multiqc_data
with some general stats, links to all files etc. And one file named multiqc_report.html
.
Have a look at the report you created.
###############################################################################
### PRE-MAPPING QC ANALYSIS #####
###############################################################################
################
### FASTQC ###
################
cd ~/RNAseq
for i in $(ls rawData/fastqFiles/*.fq.gz); do
fastqc \
-o results/01-preAlignmentQC/ \
$i &
done
wait
################
### MultiQC ###
################
cd ~/RNAseq/results/01-preAlignmentQC/
multiqc .