Roy Francis / Dag Ahrén
31-May-2024
Conesa et al. (2016)
Corley et al. (2017)
Per base sequence quality
Per sequence quality scores
When to avoid trimming?
Baruzzo et al. (2017)
Program | Time (min) | Memory (GB) |
---|---|---|
HISATx1 | 22.7 | 4.3 |
HISATx2 | 47.7 | 4.3 |
HISAT | 26.7 | 4.3 |
STAR | 25 | 28 |
STARx2 | 50.5 | 28 |
GSNAP | 291.9 | 20.2 |
TopHat2 | 1170 | 4.3 |
@ST-E00274:179:HHYMLALXX:8:1101:1641:1309 1:N:0:NGATGT
NCATCGTGGTATTTGCACATCTTTTCTTATCAAATAAAAAGTTTAACCTACTCAGTTATGCGCATACGTTTTTTGATGGCATTTCCATAAACCGATTTTTTTTTTATGCACGTACCCAAAACGTGCAGAAAAATACGCTGCTAGAAATGTA
+
#AAAFAFA<-AFFJJJAFA-FFJJJJFFFAJJJJ-<FFJJJ-A-F-7--FA7F7-----FFFJFA<FFFFJ<AJ--FF-A<A-<JJ-7-7-<FF-FFFJAFFAA--A--7FJ-7----77-A--7F7)---7F-A----7)7-----7<<-
@instrument:runid:flowcellid:lane:tile:xpos:ypos read:isfiltered:controlnumber:sampleid
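The header layout above can be applied to the example read with a short parser. This is a minimal sketch: the function names `parse_fastq_header` and `phred_scores` are hypothetical, and the quality decoding assumes the Phred+33 encoding used by recent Illumina instruments.

```python
# Field names taken from the header template shown above.
FIELDS = (
    "instrument", "runid", "flowcellid", "lane", "tile", "xpos", "ypos",
    "read", "isfiltered", "controlnumber", "sampleid",
)


def parse_fastq_header(header):
    """Split an Illumina-style FASTQ header line into named fields."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    left, right = header[1:].split(" ", 1)
    values = left.split(":") + right.split(":")
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of header fields")
    return dict(zip(FIELDS, values))


def phred_scores(quality, offset=33):
    """Convert a quality string to Phred scores (assumes Phred+33 by default)."""
    return [ord(char) - offset for char in quality]


header = "@ST-E00274:179:HHYMLALXX:8:1101:1641:1309 1:N:0:NGATGT"
fields = parse_fastq_header(header)
print(fields["flowcellid"], fields["lane"], fields["sampleid"])

# First five quality characters of the example read.
print(phred_scores("#AAAF"))
```

Under the Phred+33 assumption, the leading `#` corresponds to a score of 2, which fits the uncalled base `N` at the start of the read.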
>1 dna:chromosome chromosome:GRCz10:1:1:58871917:1 REF
GATCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCATTACATTGTCATTTCCCCTC
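A FASTA record like the one above can be read with a few lines of code. This is a minimal sketch: the function name `read_fasta` is hypothetical, and everything after the identifier is kept as an unparsed description string.

```python
def read_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record in text."""
    identifier, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            header = line[1:].split(None, 1) or [""]
            identifier = header[0]
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif identifier is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


example = """>1 dna:chromosome chromosome:GRCz10:1:1:58871917:1 REF
GATCTTAAACATTTATTCCCCCTGCAAACATTTTCAATCATTACATTGTCATTTCCCCTC
"""
records = list(read_fasta(example))
for identifier, description, sequence in records:
    print(identifier, len(sequence), description)
```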
#!genome-build GRCz10
4 ensembl_havana gene 6732 52059 . - . gene_id "ENSDARG00000104632"; gene_version "2"; gene_name "rerg"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; havana_gene "OTTDARG00000044080"; havana_gene_version "1";
seq source feature start end score strand frame attribute
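The nine columns listed above map directly onto a small parser. This is a minimal sketch: `parse_gtf_line` is a hypothetical helper, the example line is rebuilt with tab separators as GTF files use, and the attribute split on `;` assumes no attribute value itself contains a semicolon.

```python
# Column names as listed above.
GTF_COLUMNS = (
    "seq", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
)


def parse_gtf_line(line):
    """Parse one tab-separated GTF line into a dict with parsed attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError("a GTF line has exactly nine tab-separated columns")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in record["attribute"].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    record["attribute"] = attributes
    return record


line = "\t".join([
    "4", "ensembl_havana", "gene", "6732", "52059", ".", "-", ".",
    'gene_id "ENSDARG00000104632"; gene_version "2"; gene_name "rerg"; '
    'gene_source "ensembl_havana"; gene_biotype "protein_coding"; '
    'havana_gene "OTTDARG00000044080"; havana_gene_version "1";',
])
gene = parse_gtf_line(line)
# GTF coordinates are 1-based and inclusive, hence the +1.
print(gene["attribute"]["gene_name"], gene["end"] - gene["start"] + 1)
```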
ST-E00274:188:H3JWNCCXY:4:1102:32431:49900 163 1 1 60 8S139M4S = 385 535 TATTTAGAGATCTTAAACATCCATTCCCCCTGCAAACATTTTCAATCATTACATTGTCATTTTCCCTCCAAATTAAATTTAGCCAGAGGCGCACAACATACGACCTCTAAAAAAGGTGCTGGAACATGTACCTATATGCAGCACCACCATC AAAFAFFAFFFFJ7FFFFJ<JAFA7F-<AJ7JJ<FFFJ--<FAJF<7<7FAFJ-<AFA<-JJJ-AF-AJ-FF<F--A<FF<-7777-7JA-77A---F-7AAFF-FJA--77FJ<--77)))7<JJA<J77<-------<7--))7)))7- NM:i:4 MD:Z:12T0T40C58T25 AS:i:119 XS:i:102 XA:Z:17,-53287490,4S33M4D114M,11; MQ:i:60 MC:Z:151M RG:Z:ST-E00274_188_H3JWNCCXY_4
query flag ref pos mapq cigar mrnm mpos tlen seq qual opt
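Two of the columns above, `flag` and `cigar`, are compact encodings that are easier to read once decoded. This is a minimal sketch using the flag bits and CIGAR operators defined in the SAM specification. The helper names and the short bit labels are hypothetical.

```python
import re

# Bit values from the SAM specification; the labels are shorthand.
FLAG_BITS = [
    (0x1, "paired"), (0x2, "proper_pair"), (0x4, "unmapped"),
    (0x8, "mate_unmapped"), (0x10, "reverse"), (0x20, "mate_reverse"),
    (0x40, "read1"), (0x80, "read2"), (0x100, "secondary"),
    (0x200, "qc_fail"), (0x400, "duplicate"), (0x800, "supplementary"),
]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operator) pairs."""
    return [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]


def query_length(cigar):
    """Bases of the read covered by the CIGAR (M, I, S, =, X consume the query)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def reference_length(cigar):
    """Reference bases spanned (M, D, N, =, X consume the reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


# Values from the example alignment above.
print(decode_flag(163))
print(parse_cigar("8S139M4S"), query_length("8S139M4S"), reference_length("8S139M4S"))
```

For the example record, flag 163 decodes to a properly paired second-in-pair read whose mate maps to the reverse strand, and the CIGAR describes 139 aligned bases flanked by 8 and 4 soft-clipped bases.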
Never store alignment files in raw SAM format. Always compress them, for example to BAM or CRAM.
Format | Size (GB) |
---|---|
SAM | 7.4 |
BAM | 1.9 |
CRAM lossless Q | 1.4 |
CRAM 8 bins Q | 0.8 |
CRAM no Q | 0.26 |
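The sizes in the table can be turned into compression ratios relative to raw SAM. This is simple arithmetic on the numbers above. The variable names are illustrative only.

```python
# Sizes in GB, copied from the table above.
sizes_gb = {
    "SAM": 7.4,
    "BAM": 1.9,
    "CRAM lossless Q": 1.4,
    "CRAM 8 bins Q": 0.8,
    "CRAM no Q": 0.26,
}

sam_size = sizes_gb["SAM"]
# How many times smaller each format is than SAM.
ratios = {name: sam_size / size for name, size in sizes_gb.items()}
# Percentage of disk space saved compared with SAM.
percent_saved = {name: 100 * (1 - size / sam_size) for name, size in sizes_gb.items()}

for name in sizes_gb:
    print(f"{name:16s} {ratios[name]:5.1f}x smaller, {percent_saved[name]:5.1f}% saved")
```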
tview
samtools tview alignment.bam genome.fasta
STAR (final log file), samtools stats, bamtools stats, QoRTs, RSeQC, Qualimap
MultiQC can be used to summarise and plot STAR log files.
QoRTs was run on all samples and summarised using MultiQC.
Read mapping profile
Gene body coverage
Sigurgeirsson et al. (2014)
Insert size
Saturation curve
Francis et al. (2013)
PCR duplicates
Multi-mapping
Kallisto, Salmon
ENSG00000000003 140 242 188 143 287 344 438 280 253
ENSG00000000005 0 0 0 0 0 0 0 0 0
ENSG00000000419 69 98 77 55 52 94 116 79 69
ENSG00000000457 56 75 104 79 157 205 183 178 153
ENSG00000000460 33 27 23 19 27 42 69 44 40
Pairwise correlation between samples must be high (>0.9)
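The correlation check can be sketched on the five-gene excerpt of the count table above. This is a toy illustration under stated assumptions: the slide does not say which correlation measure or scale is meant, so the code assumes Pearson correlation on log2(count + 1) values, and samples are referred to by column index because the excerpt has no sample names. A real check would use all genes.

```python
import math

# Counts copied from the excerpt above: rows are genes, columns are samples.
counts = {
    "ENSG00000000003": [140, 242, 188, 143, 287, 344, 438, 280, 253],
    "ENSG00000000005": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "ENSG00000000419": [69, 98, 77, 55, 52, 94, 116, 79, 69],
    "ENSG00000000457": [56, 75, 104, 79, 157, 205, 183, 178, 153],
    "ENSG00000000460": [33, 27, 23, 19, 27, 42, 69, 44, 40],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sample_correlations(count_table):
    """Return the sample-by-sample correlation matrix of log2(count + 1)."""
    rows = [[math.log2(c + 1) for c in row] for row in count_table.values()]
    columns = list(zip(*rows))  # transpose: one tuple per sample
    return [[pearson(a, b) for b in columns] for a in columns]


correlations = sample_correlations(counts)
# Sample pairs that fall at or below the 0.9 threshold would need a closer look.
low_pairs = [
    (i, j)
    for i in range(len(correlations))
    for j in range(i + 1, len(correlations))
    if correlations[i][j] <= 0.9
]
print(low_pairs)
```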
Dillies et al. (2013), Evans et al. (2018), Wagner et al. (2012)
~age+condition
estimateSizeFactors()
estimateDispersions()
nbinomWaldTest()
results()
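The first of the steps listed above, estimateSizeFactors(), uses the median-of-ratios method by default. The idea can be sketched in a few lines. This is an illustrative re-implementation, not DESeq2's code: the function name `size_factors` is hypothetical, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(count_rows):
    """Median-of-ratios size factors for a gene-by-sample count matrix."""
    n_samples = len(count_rows[0])
    kept_rows, log_geo_means = [], []
    for row in count_rows:
        # Keep only genes expressed in every sample.
        if all(c > 0 for c in row):
            kept_rows.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / len(row))
    if not kept_rows:
        raise ValueError("every gene has a zero count in at least one sample")
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to its geometric mean across samples.
        log_ratios = [
            math.log(row[j]) - log_gm
            for row, log_gm in zip(kept_rows, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Five-gene excerpt of the count table shown earlier (rows: genes, columns: samples).
count_rows = [
    [140, 242, 188, 143, 287, 344, 438, 280, 253],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [69, 98, 77, 55, 52, 94, 116, 79, 69],
    [56, 75, 104, 79, 157, 205, 183, 178, 153],
    [33, 27, 23, 19, 27, 42, 69, 44, 40],
]
factors = size_factors(count_rows)
print([round(f, 2) for f in factors])
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale. With only four usable genes the values here are illustrative at best.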
log2 fold change (MLE): type type2 vs control
Wald test p-value: type type2 vs control
DataFrame with 1 row and 6 columns
baseMean log2FoldChange lfcSE
<numeric> <numeric> <numeric>
ENSG00000000003 242.307796723287 -0.932926089608546 0.114285150312285
stat pvalue padj
<numeric> <numeric> <numeric>
ENSG00000000003 -8.16314356729037 3.26416150242775e-16 1.36240609998527e-14
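The `stat` and `pvalue` columns in the output above follow from `log2FoldChange` and `lfcSE`. This is a minimal sketch of that arithmetic, assuming the default Wald test with a two-sided p-value from the standard normal distribution. The function name `wald_test` is hypothetical.

```python
import math

# Values copied from the results() output above.
log2_fold_change = -0.932926089608546
lfc_se = 0.114285150312285


def wald_test(estimate, standard_error):
    """Return the Wald statistic and its two-sided normal p-value."""
    stat = estimate / standard_error
    # 2 * P(Z > |stat|) for a standard normal Z equals erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


stat, p_value = wald_test(log2_fold_change, lfc_se)
print(stat, p_value)
```

The `padj` column is not reproduced here because it depends on the p-values of all tested genes.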
summary()
out of 17889 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 4526, 25%
LFC < 0 (down) : 5062, 28%
outliers [1] : 25, 0.14%
low counts [2] : 0, 0%
(mean count < 3)
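The summary above counts genes by adjusted p-value. This is a minimal sketch of the Benjamini-Hochberg adjustment, assuming `padj` was computed with that method, which is DESeq2's default. The function name `benjamini_hochberg` and the input p-values are made up for illustration. DESeq2 additionally applies independent filtering, which is not shown here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values, not taken from the analysis above.
pvalues = [0.001, 0.5, 0.02, 0.04, 0.9]
padj = benjamini_hochberg(pvalues)
significant = [i for i, p in enumerate(padj) if p < 0.1]
print(padj, significant)
```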
plotMA()
plotCounts()
Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., … & Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome Biology, 17(1), 1-19.
Main exercise
Bonus exercises
Data: /sw/courses/ngsintro/rnaseq/
Work: /proj/naiss2024-22-212/nobackup/user/rnaseq/
/sw/courses/ngsintro/rnaseq/
rnaseq/
+-- bonus/
|   +-- assembly/
|   +-- exon/
|   +-- funannot/
|   +-- plots/
+-- documents/
+-- main/
    +-- 1_raw/
    +-- 2_fastqc/
    +-- 3_mapping/
    +-- 4_qualimap/
    +-- 5_dge/
    +-- 6_multiqc/
    +-- reference/
    |   +-- mouse_chr19_hisat2/
    +-- scripts/
/proj/naiss2024-22-212/nobackup/user/rnaseq/
[user]/
rnaseq/
+-- 1_raw/
+-- 2_fastqc/
+-- 3_mapping/
+-- 4_qualimap/
+-- 5_dge/
+-- 6_multiqc/
+-- reference/
|   +-- mouse_chr19_hisat2/
+-- scripts/
+-- funannot/
+-- plots/