SeqMonk Vistory Report

Validation of RNA-Seq knockout

In this example we're going to look at a publicly available dataset in which two genes (Yap1 and Taz) have been knocked out. The data comes from GEO study GSE79115.

The data was processed through the standard clusterflow fastq_hisat2 pipeline which did trimming with TrimGalore and mapping with hisat2. The quality of the data all looked fine and the species screen showed that it was indeed mouse, as expected.

I've imported the data into SeqMonk and cleaned up the sample names a bit. I've also created replicate sets for the two biological conditions (WT and cKO), so it's all ready to go.

Parameter	Value
Project Name	poitelon_rnaseq.smk
Vistory Name	validating_knockout.smv
Genome	Mus musculus GRCm38_v95
SeqMonk Version	1.45.2.devel

Name	File Name	Reads	Import Options
cKO-1	/bi/pubcache/TIDIED/Poitelon_2016/SRR3222409_GSM2085998_YAP_TAZ-DBL-cKO-1_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam	44595229	Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000
cKO-2	/bi/pubcache/TIDIED/Poitelon_2016/SRR3222410_GSM2085999_YAP_TAZ-DBL-cKO-2_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam	35385632	Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000
cKO-3	/bi/pubcache/TIDIED/Poitelon_2016/SRR3222411_GSM2086000_YAP_TAZ-DBL-cKO-3_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam	43511442	Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000
WT-1	/bi/pubcache/TIDIED/Poitelon_2016/SRR3222412_GSM2086001_YAP_TAZ-WT-1_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam	66444258	Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000
WT-2	/bi/pubcache/TIDIED/Poitelon_2016/SRR3222413_GSM2086002_YAP_TAZ-WT-2_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam	56949469	Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000
WT-3	/bi/pubcache/TIDIED/Poitelon_2016/SRR3222414_GSM2086003_YAP_TAZ-WT-3_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam	58610561	Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000

cKO	WT
cKO-1	WT-1
cKO-2	WT-2
cKO-3	WT-3

The data looks pretty good on a basic inspection. The reads are nicely restricted to exons and cover the whole length of the transcripts. Oddly, the library is non-directional. The paper says that the library prep used the Illumina TruSeq kit, which should give a directional library, so something a bit odd is going on.

There is no obvious DNA contamination, but there is maybe a suggestion of a small amount of PCR duplication since isolated reads often appear as duplets.

This all looks good. The library is very clean, with almost all reads being in exons, but it was polyA selected so this is expected. We see no rRNA, and only a small amount of MT. There is a bit of variation in the library sizes, and this does seem to link to the condition, with all of the WT samples having more reads, but it's still less than 2-fold different overall so I'm not too concerned. The directionality is still lacking, which matches what we saw in the raw data.
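To put a number on the library size difference, here is a minimal Python sketch using the read counts from the Data Sets table above. Those are the imported read counts rather than whatever the QC plot itself measures, so treat the ratios as a rough cross-check only. The helper name `group_mean` is just for this example.

```python
# Imported read counts copied from the Data Sets table in this report.
reads = {
    "cKO-1": 44595229, "cKO-2": 35385632, "cKO-3": 43511442,
    "WT-1": 66444258, "WT-2": 56949469, "WT-3": 58610561,
}

def group_mean(prefix):
    """Mean read count for all samples whose name starts with the prefix."""
    values = [count for name, count in reads.items() if name.startswith(prefix)]
    return sum(values) / len(values)

# Ratio between the two conditions, and between the two most extreme samples.
group_ratio = group_mean("WT") / group_mean("cKO")
extreme_ratio = max(reads.values()) / min(reads.values())

print(f"WT/cKO mean library size ratio: {group_ratio:.2f}")
print(f"Largest/smallest library ratio: {extreme_ratio:.2f}")
```

Both ratios come out below 2, which fits the "less than 2-fold" reading above.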

New Probe Set (411632 probes)

Feature generator using mRNA split into exons duplicates removed Over feature from 0-0

Whilst there is duplication in the samples it is in line with the local data density, so it seems that this is biological in nature rather than technical. There is nothing to worry about overall in terms of the duplication rates in the samples.

We'll do a standard gene level quantitation. We'll use the default log2FPM (correcting for library size but not for transcript length) since we're only interested in comparing the same transcript between samples - we don't care about comparing different transcripts in the same sample.
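As an illustration of what a log2FPM value means, here is a minimal Python sketch. It is not SeqMonk's code: the function name `log2_fpm`, the way zero counts are floored, and the example counts are all assumptions made for this example.

```python
import math

def log2_fpm(fragment_count, library_size, floor=1.0):
    """log2 of fragments per million fragments in the library.

    Transcript length is deliberately not used. The floor on the count only
    avoids log2(0) and is an assumption of this sketch.
    """
    fpm = max(fragment_count, floor) * 1_000_000 / library_size
    return math.log2(fpm)

# Hypothetical gene with the same relative abundance in a deeper and a
# shallower library: the raw counts differ but the log2FPM values agree.
deep = log2_fpm(500, 50_000_000)
shallow = log2_fpm(400, 40_000_000)
print(deep, shallow)
```

Because the same transcript has the same length in every sample, leaving length out of the calculation does not affect a between-sample comparison of that transcript.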

New Probe Set (35266 probes)

Transcript features over mRNA

Probes Quantitated

RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library

The default normalisation looks pretty good, but it might improve slightly if we add in size factor normalisation, so let's try that. We'll remove all of the obviously unexpressed genes first.

New Probe List: Expressed somewhere (19140 probes)

Filter on probes in All Probes where at least 1 of cKO-1 , cKO-2 , cKO-3 , WT-1 , WT-2 , WT-3 had a value above 50.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library

Probes Quantitated

RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list

It's slightly nicer with the additional normalisation so we'll keep that.
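For anyone wondering what an additive size factor correction on log-transformed values could look like, here is a minimal Python sketch. It uses a median-of-differences approach (similar in spirit to median-of-ratios methods), which is my own choice for illustration and not necessarily what SeqMonk does internally. The function names and toy values are hypothetical.

```python
from statistics import mean, median

def additive_size_factors(log_values, reference_genes):
    """Per-sample offsets on the log2 scale.

    log_values: {sample: {gene: log2 value}}
    reference_genes: genes used to estimate the offsets (e.g. expressed genes)
    """
    samples = list(log_values)
    gene_means = {g: mean(log_values[s][g] for s in samples) for g in reference_genes}
    return {
        s: median(log_values[s][g] - gene_means[g] for g in reference_genes)
        for s in samples
    }

def normalise(log_values, offsets):
    """Subtract each sample's offset from all of its values."""
    return {
        s: {g: v - offsets[s] for g, v in genes.items()}
        for s, genes in log_values.items()
    }

# Toy data: sample B is sample A shifted up by exactly 1 log2 unit.
toy = {
    "A": {"g1": 2.0, "g2": 5.0, "g3": 8.0},
    "B": {"g1": 3.0, "g2": 6.0, "g3": 9.0},
}
offsets = additive_size_factors(toy, ["g1", "g2", "g3"])
normalised = normalise(toy, offsets)
print(offsets)
print(normalised)
```

Using the median makes the offset robust to a minority of genes which change a lot in one direction, which is the situation where library-size correction alone can leave a residual shift.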

We can start by using PCA to see how the samples cluster.

The samples form two obvious clusters and these split by the knockout status on PC1 so this all looks good. The cKO-1 sample seems a little divergent on PC2 so that might need looking at.
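As a rough illustration of what the PCA step is doing, here is a minimal pure-Python sketch which finds the sample scores on PC1 by power iteration. It is not SeqMonk's implementation; the function name `pc1_scores` and the toy values are made up for the example.

```python
import math

def pc1_scores(matrix, iterations=200):
    """Sample scores on the first principal component.

    matrix: list of per-gene rows, each a list of per-sample log2 values.
    Each gene is centred across samples, then the leading eigenvector of the
    sample-by-sample Gram matrix is found by power iteration.
    """
    n = len(matrix[0])
    centred = [[v - sum(row) / n for v in row] for row in matrix]
    gram = [[sum(r[i] * r[j] for r in centred) for j in range(n)] for i in range(n)]
    # Arbitrary non-constant start (a constant vector lies in the null space).
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    # Score of each sample = eigenvector entry scaled by the singular value.
    return [x * math.sqrt(eigenvalue) for x in vec]

# Hypothetical values: two samples per group, two genes differing by group.
toy = [
    [1.0, 1.2, 5.0, 5.1],
    [3.0, 3.1, 0.5, 0.4],
    [2.0, 2.0, 2.1, 2.0],
]
scores = pc1_scores(toy)
print(scores)
```

In the toy data the first two samples get scores of one sign and the last two the opposite sign, which is the same kind of split by condition on PC1 described above. The overall sign of a principal component is arbitrary.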

There are some very obvious differences between the samples, but the structure of the differences looks odd. There is a pretty obvious group of genes which appear to be very consistently upregulated in the knockout sample, all changing by a log2 diff of around 2.5 over a wide range of expression levels. This doesn't look normal and suggests that there may be an additional effect on these samples beyond the knockout, possibly a copy number change? This asymmetry in the changes probably explains why the size factor normalisation produced some benefit.

There are some genes which show quite large expression decreases in the cKO samples, but nothing looks like it really moves from being highly expressed to being absent, so we should check on the knockouts.

Since the cKO samples are supposed to be a double knockout of Yap1 and Taz we should look at those genes specifically.

New Probe List: Yap1 (1 probes)

Probes from Expressed somewhere whose name matches any of Yap1 after stripping probe generator suffixes

New Probe List: Taz (1 probes)

Probes from Expressed somewhere whose name matches any of Taz after stripping probe generator suffixes

That doesn't look good. Neither of the genes which are supposed to be knocked out actually appears to show any real change in the cKO group. Depending on how the knockout was done this might not be an issue. If it's very tightly targeted then it could be that the transcripts are still being produced but in a non-functional form.

I went and looked up the original article in which the Yap1/Taz knockout was created (PMC3605093), and in both cases they used a Cre/Lox system to completely excise the first complete coding exon (exon 2). It's possible that after deletion the transcript splices around the missing exon and goes on, but we should check.

The second exon of Yap1 certainly appears to still be there, and at a similar level to the first exon. The total number of reads is fewer, but that's in line with the lower coverage overall in the cKO samples (which we saw in the QC report).

Likewise the second Taz exon also appears to be present at levels which are consistent with the exons around it. Given these observations I would be very cautious of proceeding with the analysis without going back to validate the exact knockouts in more detail, because on the face of it it looks like the expected effects are absent from these samples.
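The exon-level reasoning above (fewer reads overall, but exon 2 in line with its neighbours) can be sketched as a simple relative coverage calculation. This is only an illustration: the function name, exon lengths and read counts below are all hypothetical and are not taken from these samples.

```python
def relative_exon_coverage(exon_counts, exon_lengths, target):
    """Read density of the target exon divided by the mean density of the
    other exons in the same transcript and sample."""
    density = {e: exon_counts[e] / exon_lengths[e] for e in exon_counts}
    others = [d for e, d in density.items() if e != target]
    return density[target] / (sum(others) / len(others))

# Hypothetical exon lengths and read counts, for illustration only.
lengths = {"exon1": 100, "exon2": 150, "exon3": 125}
wt_counts = {"exon1": 200, "exon2": 300, "exon3": 250}
cko_counts = {"exon1": 120, "exon2": 180, "exon3": 150}      # lower depth, exon 2 intact
excised_counts = {"exon1": 120, "exon2": 0, "exon3": 150}    # what a clean excision might look like

wt_ratio = relative_exon_coverage(wt_counts, lengths, "exon2")
cko_ratio = relative_exon_coverage(cko_counts, lengths, "exon2")
excised_ratio = relative_exon_coverage(excised_counts, lengths, "exon2")
print(wt_ratio, cko_ratio, excised_ratio)
```

Normalising to the neighbouring exons within each sample removes the effect of overall coverage, so a ratio near the WT value suggests the exon is still present, whereas a ratio near zero would be what we'd hope to see after excision.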

Differences examination

Despite the lack of expected knockouts in the samples we did definitely see some consistent expression differences between the two groups, but we also saw the oddly consistently regulated set of genes. To try to understand what's going on with these we can see if we can find any obvious connection between the genes in this set.

To isolate these genes we can do a crude extraction of them based on their fold change. I can just manually select these in a scatterplot.

New Probe List: Oddly similar changes (511 probes)

Filter on probes in Expressed somewhere where average difference when comparing WT to cKO had a value between 1.8 and 3.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list
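The filter above amounts to keeping genes whose difference in mean log2 expression between the groups falls inside a band. Here is a minimal Python sketch, assuming the difference is taken as cKO minus WT and that the bounds are inclusive (both are my assumptions). The Acta1 and Ube2c values are the group means reported in the gene table in this report; the third entry is hypothetical.

```python
def in_difference_band(cko_mean, wt_mean, low=1.8, high=3.0):
    """True if the log2 difference (cKO - WT) lies within [low, high]."""
    return low <= (cko_mean - wt_mean) <= high

# (cKO mean, WT mean) on the log2 scale.
genes = {
    "Acta1": (10.6300745, 8.097512),
    "Ube2c": (9.095146, 7.261156),
    "Unchanged (hypothetical)": (5.0, 4.9),
}

selected = [name for name, (cko, wt) in genes.items() if in_difference_band(cko, wt)]
print(selected)
```

A log2 difference of 1.8 to 3.0 corresponds to roughly a 3.5-fold to 8-fold increase on the linear scale.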

The most obvious thing to connect these points would be physical position in the genome, so let's see where they are:

Meh - they're all over the place so bang goes that theory. Let's have a look at what the actual genes are instead. We'll just take the ones which are more highly expressed to limit how many we have to deal with.

New Probe List: Highly expressed (108 probes)

Filter on probes in Oddly similar changes where exactly 1 of cKO had a value above 3.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list

Probe	Chromosome	Start	End	Probe Strand	Feature	ID	Description	Feature Strand	Type	Feature Orientation	cKO	WT
Acta1	8	123891769	123894751	-	Acta1	ENSMUSG00000031972	actin, alpha 1, skeletal muscle [Source:MGI Symbol;Acc:MGI:87902]	-	gene	Name match	10.6300745	8.097512
Mylpf	7	127208890	127214298	+	Mylpf	ENSMUSG00000030672	myosin light chain, phosphorylatable, fast skeletal muscle [Source:MGI Symbol;Acc:MGI:97273]	+	gene	Name match	10.00174	7.4619308
Myl1	1	66924295	66945404	-	Myl1	ENSMUSG00000061816	myosin, light polypeptide 1 [Source:MGI Symbol;Acc:MGI:97269]	-	gene	Name match	9.555613	7.085666
Myh8	11	67277124	67308634	+	Myh8	ENSMUSG00000055775	myosin, heavy polypeptide 8, skeletal muscle, perinatal [Source:MGI Symbol;Acc:MGI:1339712]	+	gene	Name match	9.314048	6.725017
Tnnt3	7	142498836	142516009	+	Tnnt3	ENSMUSG00000061723	troponin T3, skeletal, fast [Source:MGI Symbol;Acc:MGI:109550]	+	gene	Name match	9.128785	6.6186385
Ube2c	2	164769898	164778822	+	Ube2c	ENSMUSG00000001403	ubiquitin-conjugating enzyme E2C [Source:MGI Symbol;Acc:MGI:1915862]	+	gene	Name match	9.095146	7.261156
Tnnc2	2	164777161	164779967	-	Tnnc2	ENSMUSG00000017300	troponin C2, fast [Source:MGI Symbol;Acc:MGI:98780]	-	gene	Name match	8.9517355	6.5184937
Ckm	7	19404776	19422841	+	Ckm	ENSMUSG00000030399	creatine kinase, muscle [Source:MGI Symbol;Acc:MGI:88413]	+	gene	Name match	8.946425	6.266685
Tpm2	4	43514711	43523765	-	Tpm2	ENSMUSG00000028464	tropomyosin 2, beta [Source:MGI Symbol;Acc:MGI:98810]	-	gene	Name match	8.791783	6.863115
Tnni2	7	142441808	142444410	+	Tnni2	ENSMUSG00000031097	troponin I, skeletal, fast 2 [Source:MGI Symbol;Acc:MGI:105070]	+	gene	Name match	8.441571	5.8975697
Atp2a1	7	126445858	126463108	-	Atp2a1	ENSMUSG00000030730	ATPase, Ca++ transporting, cardiac muscle, fast twitch 1 [Source:MGI Symbol;Acc:MGI:105058]	-	gene	Name match	7.9781203	5.1985235
A930016O22Rik	7	19417466	19421417	-	A930016O22Rik	ENSMUSG00000040705	RIKEN cDNA A930016O22 gene [Source:MGI Symbol;Acc:MGI:3605623]	-	gene	Name match	7.7730346	5.131559
Eno3	11	70657202	70662513	+	Eno3	ENSMUSG00000060600	enolase 3, beta muscle [Source:MGI Symbol;Acc:MGI:95395]	+	gene	Name match	7.504761	5.2885814
Des	1	75360329	75368579	+	Des	ENSMUSG00000026208	desmin [Source:MGI Symbol;Acc:MGI:94885]	+	gene	Name match	7.4028907	5.2280498
Col2a1	15	97975602	98004695	-	Col2a1	ENSMUSG00000022483	collagen, type II, alpha 1 [Source:MGI Symbol;Acc:MGI:88452]	-	gene	Name match	7.156376	5.3229785
Ttn	2	76703980	76982547	-	Ttn	ENSMUSG00000051747	titin [Source:MGI Symbol;Acc:MGI:98864]	-	gene	Name match	7.0931764	4.4328256
Actc1	2	114047282	114053548	-	Actc1	ENSMUSG00000068614	actin, alpha, cardiac muscle 1 [Source:MGI Symbol;Acc:MGI:87905]	-	gene	Name match	6.89143	3.9702747
Casq1	1	172209894	172219868	-	Casq1	ENSMUSG00000007122	calsequestrin 1 [Source:MGI Symbol;Acc:MGI:1309468]	-	gene	Name match	6.8523135	4.718498
Pvalb	15	78191114	78206400	-	Pvalb	ENSMUSG00000005716	parvalbumin [Source:MGI Symbol;Acc:MGI:97821]	-	gene	Name match	6.7798343	4.195829
Tceal7	X	136214779	136226100	+	Tceal7	ENSMUSG00000079428	transcription elongation factor A (SII)-like 7 [Source:MGI Symbol;Acc:MGI:1915746]	+	gene	Name match	6.7450833	4.525802
Actn3	19	4861223	4877909	-	Actn3	ENSMUSG00000006457	actinin alpha 3 [Source:MGI Symbol;Acc:MGI:99678]	-	gene	Name match	6.699188	4.280667
Pygm	19	6384399	6398459	+	Pygm	ENSMUSG00000032648	muscle glycogen phosphorylase [Source:MGI Symbol;Acc:MGI:97830]	+	gene	Name match	6.44903	4.334618
Mybpc1	10	88518279	88605152	-	Mybpc1	ENSMUSG00000020061	myosin binding protein C, slow-type [Source:MGI Symbol;Acc:MGI:1336213]	-	gene	Name match	6.3010755	3.8659647
Pgam2	11	5801640	5803733	-	Pgam2	ENSMUSG00000020475	phosphoglycerate mutase 2 [Source:MGI Symbol;Acc:MGI:1933118]	-	gene	Name match	6.2571893	3.6856563
Neb	2	52136647	52378474	-	Neb	ENSMUSG00000026950	nebulin [Source:MGI Symbol;Acc:MGI:97292]	-	gene	Name match	6.1607213	4.0120506
Myoz1	14	20649107	20656540	-	Myoz1	ENSMUSG00000068697	myozenin 1 [Source:MGI Symbol;Acc:MGI:1929471]	-	gene	Name match	6.1156273	3.6680744
Fibin	2	110360917	110363183	-	Fibin	ENSMUSG00000074971	fin bud initiation factor homolog (zebrafish) [Source:MGI Symbol;Acc:MGI:1914856]	-	gene	Name match	6.0845985	3.5646398
Ldb3	14	34526603	34588682	-	Ldb3	ENSMUSG00000021798	LIM domain binding 3 [Source:MGI Symbol;Acc:MGI:1344412]	-	gene	Name match	6.032437	3.6439762
Actn2	13	12269426	12340760	-	Actn2	ENSMUSG00000052374	actinin alpha 2 [Source:MGI Symbol;Acc:MGI:109192]	-	gene	Name match	6.0224013	3.2486563
Srl	16	4480216	4541816	-	Srl	ENSMUSG00000022519	sarcalumenin [Source:MGI Symbol;Acc:MGI:2146620]	-	gene	Name match	6.016025	4.061566
Sln	9	53850164	53854560	+	Sln	ENSMUSG00000042045	sarcolipin [Source:MGI Symbol;Acc:MGI:1913652]	+	gene	Name match	6.0015106	3.0297756
Sypl2	3	108211472	108226648	-	Sypl2	ENSMUSG00000027887	synaptophysin-like 2 [Source:MGI Symbol;Acc:MGI:1328311]	-	gene	Name match	5.7019043	3.2455757
Myl4	11	104550663	104595753	+	Myl4	ENSMUSG00000061086	myosin, light polypeptide 4 [Source:MGI Symbol;Acc:MGI:97267]	+	gene	Name match	5.6841083	3.326821
Trdn	10	33080554	33476709	+	Trdn	ENSMUSG00000019787	triadin [Source:MGI Symbol;Acc:MGI:1924007]	+	gene	Name match	5.629187	3.2385871
Mybpc2	7	44501699	44524656	-	Mybpc2	ENSMUSG00000038670	myosin binding protein C, fast-type [Source:MGI Symbol;Acc:MGI:1336170]	-	gene	Name match	5.580316	2.735296
Itgb1bp2	X	101449088	101453541	+	Itgb1bp2	ENSMUSG00000031312	integrin beta 1 binding protein 2 [Source:MGI Symbol;Acc:MGI:1353420]	+	gene	Name match	5.5433006	2.9828875
Atp1b4	X	38316184	38336769	+	Atp1b4	ENSMUSG00000016327	ATPase, (Na+)/K+ transporting, beta 4 polypeptide [Source:MGI Symbol;Acc:MGI:1915071]	+	gene	Name match	5.3833213	3.1316712
Rtn2	7	19282624	19296160	+	Rtn2	ENSMUSG00000030401	reticulon 2 (Z-band associated protein) [Source:MGI Symbol;Acc:MGI:107612]	+	gene	Name match	5.3780785	3.4236689
Myom1	17	71002633	71126856	+	Myom1	ENSMUSG00000024049	myomesin 1 [Source:MGI Symbol;Acc:MGI:1341430]	+	gene	Name match	5.3354306	3.0266078
Cox6a2	7	128205435	128206387	-	Cox6a2	ENSMUSG00000030785	cytochrome c oxidase subunit 6A2 [Source:MGI Symbol;Acc:MGI:104649]	-	gene	Name match	5.310141	2.7472584
Apobec2	17	48419231	48432930	-	Apobec2	ENSMUSG00000040694	apolipoprotein B mRNA editing enzyme, catalytic polypeptide 2 [Source:MGI Symbol;Acc:MGI:1343178]	-	gene	Name match	5.287872	2.7192059
Pdlim3	8	45885461	45919548	+	Pdlim3	ENSMUSG00000031636	PDZ and LIM domain 3 [Source:MGI Symbol;Acc:MGI:1859274]	+	gene	Name match	5.273842	3.42713
Smpx	X	157698910	157752591	+	Smpx	ENSMUSG00000041476	small muscle protein, X-linked [Source:MGI Symbol;Acc:MGI:1913356]	+	gene	Name match	5.2584686	2.851238
Myom2	8	15057653	15133541	+	Myom2	ENSMUSG00000031461	myomesin 2 [Source:MGI Symbol;Acc:MGI:1328358]	+	gene	Name match	4.9835496	2.4547803
Eef1a2	2	181147653	181157014	-	Eef1a2	ENSMUSG00000016349	eukaryotic translation elongation factor 1 alpha 2 [Source:MGI Symbol;Acc:MGI:1096317]	-	gene	Name match	4.9614844	2.9759362
Iigp1	18	60376029	60392627	+	Iigp1	ENSMUSG00000054072	interferon inducible GTPase 1 [Source:MGI Symbol;Acc:MGI:1926259]	+	gene	Name match	4.952799	2.288293
Klhl41	2	69670120	69684230	+	Klhl41	ENSMUSG00000075307	kelch-like 41 [Source:MGI Symbol;Acc:MGI:2683854]	+	gene	Name match	4.9376125	2.5512183
Myot	18	44334074	44355724	+	Myot	ENSMUSG00000024471	myotilin [Source:MGI Symbol;Acc:MGI:1889800]	+	gene	Name match	4.866352	2.3614779
Hrc	7	45335290	45338974	+	Hrc	ENSMUSG00000038239	histidine rich calcium binding protein [Source:MGI Symbol;Acc:MGI:96226]	+	gene	Name match	4.8531127	2.388653
Ank1	8	22974844	23150497	+	Ank1	ENSMUSG00000031543	ankyrin 1, erythroid [Source:MGI Symbol;Acc:MGI:88024]	+	gene	Name match	4.8427024	2.5067291
Tcap	11	98383811	98384953	+	Tcap	ENSMUSG00000007877	titin-cap [Source:MGI Symbol;Acc:MGI:1330233]	+	gene	Name match	4.8094006	2.4194448
Ryr1	7	29003344	29125179	-	Ryr1	ENSMUSG00000030592	ryanodine receptor 1, skeletal muscle [Source:MGI Symbol;Acc:MGI:99659]	-	gene	Name match	4.8011594	2.4239838
Hfe2	3	96525172	96529210	+	Hfe2	ENSMUSG00000038403	hemochromatosis type 2 (juvenile) [Source:MGI Symbol;Acc:MGI:1916835]	+	gene	Name match	4.730496	2.3208005
Ampd1	3	103074014	103099720	+	Ampd1	ENSMUSG00000070385	adenosine monophosphate deaminase 1 [Source:MGI Symbol;Acc:MGI:88015]	+	gene	Name match	4.663221	2.0056393
Myoz2	3	123006206	123035015	-	Myoz2	ENSMUSG00000028116	myozenin 2 [Source:MGI Symbol;Acc:MGI:1913063]	-	gene	Name match	4.661734	2.3122559
Cacng1	11	107703218	107716522	-	Cacng1	ENSMUSG00000020722	calcium channel, voltage-dependent, gamma subunit 1 [Source:MGI Symbol;Acc:MGI:1206582]	-	gene	Name match	4.654035	2.1101506
Sh3bgr	16	96200450	96228935	+	Sh3bgr	ENSMUSG00000040666	SH3-binding domain glutamic acid-rich protein [Source:MGI Symbol;Acc:MGI:1354740]	+	gene	Name match	4.595351	1.9935116
Cox8b	7	140898945	140900446	-	Cox8b	ENSMUSG00000025488	cytochrome c oxidase subunit 8B [Source:MGI Symbol;Acc:MGI:105958]	-	gene	Name match	4.5683045	2.1308832
Fabp3	4	130308595	130315463	+	Fabp3	ENSMUSG00000028773	fatty acid binding protein 3, muscle and heart [Source:MGI Symbol;Acc:MGI:95476]	+	gene	Name match	4.5498366	2.3100097
Arhgap36	X	49463945	49500244	+	Arhgap36	ENSMUSG00000036198	Rho GTPase activating protein 36 [Source:MGI Symbol;Acc:MGI:1922654]	+	gene	Name match	4.499045	2.1075962
Cilp	9	65265180	65280605	+	Cilp	ENSMUSG00000042254	cartilage intermediate layer protein, nucleotide pyrophosphohydrolase [Source:MGI Symbol;Acc:MGI:2444507]	+	gene	Name match	4.4644227	2.4463303
Myh3	11	67078300	67102291	+	Myh3	ENSMUSG00000020908	myosin, heavy polypeptide 3, skeletal muscle, embryonic [Source:MGI Symbol;Acc:MGI:1339709]	+	gene	Name match	4.4403567	1.8177018
Nrap	19	56320035	56390037	-	Nrap	ENSMUSG00000049134	nebulin-related anchoring protein [Source:MGI Symbol;Acc:MGI:1098765]	-	gene	Name match	4.3482184	2.001753
Synpo2l	14	20658946	20668354	-	Synpo2l	ENSMUSG00000039376	synaptopodin 2-like [Source:MGI Symbol;Acc:MGI:1916010]	-	gene	Name match	4.3379483	2.1240387
Cacna1s	1	136052750	136119822	+	Cacna1s	ENSMUSG00000026407	calcium channel, voltage-dependent, L type, alpha 1S subunit [Source:MGI Symbol;Acc:MGI:88294]	+	gene	Name match	4.2639847	2.1601298
Cav3	6	112459505	112472872	+	Cav3	ENSMUSG00000062694	caveolin 3 [Source:MGI Symbol;Acc:MGI:107570]	+	gene	Name match	4.233438	1.5682999
Fitm1	14	55575617	55576954	+	Fitm1	ENSMUSG00000022215	fat storage-inducing transmembrane protein 1 [Source:MGI Symbol;Acc:MGI:1915930]	+	gene	Name match	4.1784434	1.5370213
Atcayos	10	81194609	81210877	+	Atcayos	ENSMUSG00000085779	ataxia, cerebellar, Cayman type, opposite strand [Source:MGI Symbol;Acc:MGI:1916928]	+	gene	Name match	4.1450295	1.5326638
Klhl31	9	77636500	77660127	+	Klhl31	ENSMUSG00000044938	kelch-like 31 [Source:MGI Symbol;Acc:MGI:3045305]	+	gene	Name match	4.1251917	1.7273712
Tmem182	1	40805601	40856887	+	Tmem182	ENSMUSG00000079588	transmembrane protein 182 [Source:MGI Symbol;Acc:MGI:1923725]	+	gene	Name match	4.1091084	1.6307322
Obscn	11	58994256	59136402	-	Obscn	ENSMUSG00000061462	obscurin, cytoskeletal calmodulin and titin- interacting RhoGEF [Source:MGI Symbol;Acc:MGI:2681862]	-	gene	Name match	4.100353	1.4627026
Jsrp1	10	80808496	80813498	-	Jsrp1	ENSMUSG00000020216	junctional sarcoplasmic reticulum protein 1 [Source:MGI Symbol;Acc:MGI:1916700]	-	gene	Name match	4.0772023	2.2387457
Cavin4	4	48663514	48673502	+	Cavin4	ENSMUSG00000028348	caveolae associated 4 [Source:MGI Symbol;Acc:MGI:1915266]	+	gene	Name match	4.0137854	1.560191
Hspb7	4	141420779	141425311	+	Hspb7	ENSMUSG00000006221	heat shock protein family, member 7 (cardiovascular) [Source:MGI Symbol;Acc:MGI:1352494]	+	gene	Name match	4.0113735	1.872797
Piwil4	9	14696230	14740733	-	Piwil4	ENSMUSG00000036912	piwi-like RNA-mediated gene silencing 4 [Source:MGI Symbol;Acc:MGI:3041167]	-	gene	Name match	3.9173393	1.2546765
Gbp4	5	105115767	105139586	-	Gbp4	ENSMUSG00000079363	guanylate binding protein 4 [Source:MGI Symbol;Acc:MGI:97072]	-	gene	Name match	3.9009335	1.6620811
Mypn	10	63115795	63203952	-	Mypn	ENSMUSG00000020067	myopalladin [Source:MGI Symbol;Acc:MGI:1916052]	-	gene	Name match	3.833132	1.1766785
Dhrs7c	11	67798187	67816002	+	Dhrs7c	ENSMUSG00000033044	dehydrogenase/reductase (SDR family) member 7C [Source:MGI Symbol;Acc:MGI:1915710]	+	gene	Name match	3.8212578	1.3127257
Tmod4	3	95124476	95129209	+	Tmod4	ENSMUSG00000005628	tropomodulin 4 [Source:MGI Symbol;Acc:MGI:1355285]	+	gene	Name match	3.7632558	1.3997828
Camk2a	18	60925618	60988152	+	Camk2a	ENSMUSG00000024617	calcium/calmodulin-dependent protein kinase II alpha [Source:MGI Symbol;Acc:MGI:88256]	+	gene	Name match	3.7615945	1.9082276
Fbp2	13	62836877	62858422	-	Fbp2	ENSMUSG00000021456	fructose bisphosphatase 2 [Source:MGI Symbol;Acc:MGI:95491]	-	gene	Name match	3.7539046	1.1742979
Art1	7	102101743	102113933	+	Art1	ENSMUSG00000030996	ADP-ribosyltransferase 1 [Source:MGI Symbol;Acc:MGI:107511]	+	gene	Name match	3.6770535	0.9062833
Hmcn2	2	31314415	31460738	+	Hmcn2	ENSMUSG00000055632	hemicentin 2 [Source:MGI Symbol;Acc:MGI:2677838]	+	gene	Name match	3.668998	1.7803224
Gm37759	1	136116577	136119822	-	Gm37759	ENSMUSG00000102717	predicted gene, 37759 [Source:MGI Symbol;Acc:MGI:5610987]	-	gene	Name match	3.658001	1.6372976
Sgcg	14	61219115	61258490	-	Sgcg	ENSMUSG00000035296	sarcoglycan, gamma (dystrophin-associated glycoprotein) [Source:MGI Symbol;Acc:MGI:1346524]	-	gene	Name match	3.6402366	1.0326926
Ckmt2	13	91853387	91876885	-	Ckmt2	ENSMUSG00000021622	creatine kinase, mitochondrial 2 [Source:MGI Symbol;Acc:MGI:1923972]	-	gene	Name match	3.630171	0.8835902
Smyd1	6	71213940	71322233	-	Smyd1	ENSMUSG00000055027	SET and MYND domain containing 1 [Source:MGI Symbol;Acc:MGI:104790]	-	gene	Name match	3.5943708	1.3996502
Tnnt1	7	4504570	4516382	-	Tnnt1	ENSMUSG00000064179	troponin T1, skeletal, slow [Source:MGI Symbol;Acc:MGI:1333868]	-	gene	Name match	3.5926695	1.4827223
Lmod3	6	97238534	97252759	-	Lmod3	ENSMUSG00000044086	leiomodin 3 (fetal) [Source:MGI Symbol;Acc:MGI:2444169]	-	gene	Name match	3.527568	0.9856329
1110002E22Rik	3	138065052	138081506	+	1110002E22Rik	ENSMUSG00000090066	RIKEN cDNA 1110002E22 gene [Source:MGI Symbol;Acc:MGI:1915066]	+	gene	Name match	3.5229347	1.227827
Rtl1	12	109589193	109600330	-	Rtl1	ENSMUSG00000085925	retrotransposon Gaglike 1 [Source:MGI Symbol;Acc:MGI:2656842]	-	gene	Name match	3.5128603	1.6970845
Rpl3l	17	24727820	24736143	+	Rpl3l	ENSMUSG00000002500	ribosomal protein L3-like [Source:MGI Symbol;Acc:MGI:1913461]	+	gene	Name match	3.5003052	0.68470806
Gbp2b	3	142594847	142619179	+	Gbp2b	ENSMUSG00000040264	guanylate binding protein 2b [Source:MGI Symbol;Acc:MGI:95666]	+	gene	Name match	3.4459944	1.5280095
Ppp1r3a	6	14713977	14755274	-	Ppp1r3a	ENSMUSG00000042717	protein phosphatase 1, regulatory subunit 3A [Source:MGI Symbol;Acc:MGI:2153588]	-	gene	Name match	3.432337	1.2207707
Klk8	7	43797577	43803826	+	Klk8	ENSMUSG00000064023	kallikrein related-peptidase 8 [Source:MGI Symbol;Acc:MGI:1343327]	+	gene	Name match	3.3492088	1.4467646
Tnnt2	1	135836354	135852260	+	Tnnt2	ENSMUSG00000026414	troponin T2, cardiac [Source:MGI Symbol;Acc:MGI:104597]	+	gene	Name match	3.305336	0.9699723
Xirp1	9	120013755	120023598	-	Xirp1	ENSMUSG00000079243	xin actin-binding repeat containing 1 [Source:MGI Symbol;Acc:MGI:1333878]	-	gene	Name match	3.2169416	0.81996137
Chrna1	2	73563215	73580338	-	Chrna1	ENSMUSG00000027107	cholinergic receptor, nicotinic, alpha polypeptide 1 (muscle) [Source:MGI Symbol;Acc:MGI:87885]	-	gene	Name match	3.1982536	0.5401571
Asb2	12	103321142	103356001	-	Asb2	ENSMUSG00000021200	ankyrin repeat and SOCS box-containing 2 [Source:MGI Symbol;Acc:MGI:1929743]	-	gene	Name match	3.1897268	1.2237269
Pck1	2	173153048	173159273	+	Pck1	ENSMUSG00000027513	phosphoenolpyruvate carboxykinase 1, cytosolic [Source:MGI Symbol;Acc:MGI:97501]	+	gene	Name match	3.1702936	1.100215
Hhatl	9	121784016	121792507	-	Hhatl	ENSMUSG00000032523	hedgehog acyltransferase-like [Source:MGI Symbol;Acc:MGI:1922020]	-	gene	Name match	3.1309252	1.2117059
Klhl40	9	121777607	121783818	+	Klhl40	ENSMUSG00000074001	kelch-like 40 [Source:MGI Symbol;Acc:MGI:1919580]	+	gene	Name match	3.1116276	0.9328232
Myo18b	5	112688876	112896362	-	Myo18b	ENSMUSG00000072720	myosin XVIIIb [Source:MGI Symbol;Acc:MGI:1921626]	-	gene	Name match	3.084533	0.8559819
Igsf1	X	49782536	49797749	-	Igsf1	ENSMUSG00000031111	immunoglobulin superfamily, member 1 [Source:MGI Symbol;Acc:MGI:2147913]	-	gene	Name match	3.0787194	1.1340787
Ephb1	9	101922128	102354693	-	Ephb1	ENSMUSG00000032537	Eph receptor B1 [Source:MGI Symbol;Acc:MGI:1096337]	-	gene	Name match	3.0727623	1.2279277
Mylk2	2	152911352	152923068	+	Mylk2	ENSMUSG00000027470	myosin, light polypeptide kinase 2, skeletal muscle [Source:MGI Symbol;Acc:MGI:2139434]	+	gene	Name match	3.0480812	1.0752155
Rbm24	13	46418434	46431095	+	Rbm24	ENSMUSG00000038132	RNA binding motif protein 24 [Source:MGI Symbol;Acc:MGI:3610364]	+	gene	Name match	3.0163505	0.8675073
Trim72	7	128003949	128011033	+	Trim72	ENSMUSG00000042828	tripartite motif-containing 72 [Source:MGI Symbol;Acc:MGI:3612190]	+	gene	Name match	3.001933	0.45001903

This looks pretty telling. The genes we're looking at here all appear to be strongly related to muscle: actin, myosin and troponin, as well as mitochondrial genes which would be increased in muscle cells. I put the list through g:Profiler and got enormously significant hits to muscle pathways (e.g. Reactome muscle contraction at p=3x10^-37).
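To give a feel for why a gene list like this produces such extreme p-values, here is a minimal Python sketch of a one-sided hypergeometric over-representation test, which is the general kind of statistic used for term enrichment. It does not reproduce the tool's exact procedure or its multiple testing correction. The background size (19140) and list size (108) come from the probe lists above; the term size and overlap are hypothetical numbers chosen purely for illustration.

```python
from math import comb

def hypergeom_tail(population, term_size, list_size, overlap):
    """P(X >= overlap) when drawing list_size genes without replacement from
    a population containing term_size annotated genes."""
    upper = min(term_size, list_size)
    numerator = sum(
        comb(term_size, i) * comb(population - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(population, list_size)

# 19140 expressed genes and a list of 108 are from this report; a term with
# 200 genes of which 30 are in the list is a made-up example.
p_value = hypergeom_tail(population=19140, term_size=200, list_size=108, overlap=30)
print(p_value)
```

With only about one gene expected to overlap by chance, finding tens of genes from a single pathway gives a vanishingly small probability.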

Given that these samples are supposed to be Schwann cells, it doesn't make sense for them to be expressing these kinds of muscle markers at this level. It would seem to be much more plausible that what has actually happened is that the samples have been contaminated to varying degrees with muscle cells from the surrounding tissue, and that the amount of contamination varies between the two groups.

If this supposition is correct then it's also likely that the amount of contamination will vary within each group since it will not relate to the genotype but will be a function of the dissection or sample preparation. We can try to assess this by looking at whether the muscle related genes are unusually variable within the WT or cKO groups. We can do this with a variation plot, plotting STDEV against expression level for each of the two groups.

It is clear that in both the WT and cKO groups these muscle markers are unusually variable, suggesting that an external contamination is a viable explanation. We can also clearly see the difference in expression level between the two groups which led us to look at these genes in the first place.
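The numbers behind a variation plot are just a mean and a standard deviation per gene within one group. Here is a minimal Python sketch with hypothetical replicate values. The flagging rule (stdev more than 3x the median stdev across all genes) is a crude assumption of mine; a real variation plot would compare each gene with others at a similar expression level.

```python
from statistics import mean, median, stdev

def variation_points(group):
    """group: {gene: [log2 values across replicates]} -> {gene: (mean, stdev)}"""
    return {gene: (mean(vals), stdev(vals)) for gene, vals in group.items()}

def unusually_variable(group, factor=3.0):
    """Genes whose stdev exceeds factor x the median stdev of all genes."""
    points = variation_points(group)
    typical = median(sd for _, sd in points.values())
    return sorted(g for g, (_, sd) in points.items() if sd > factor * typical)

# Hypothetical log2 values for three replicates of one condition.
wt = {
    "GeneA": [5.0, 5.1, 4.9],
    "GeneB": [7.2, 7.1, 7.3],
    "GeneC": [3.0, 3.1, 3.0],
    "MuscleMarker": [6.0, 8.5, 7.1],
}

flagged = unusually_variable(wt)
print(variation_points(wt))
print(flagged)
```

Running the same calculation separately on each group is what lets us say the variability is within-group, and therefore not explained by genotype.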

An examination of these samples would ideally have shown us that they contained good quality data and exhibited consistent effects of the type we expect given the experimental description. Unfortunately, an examination of the data instead raises serious concerns about whether these samples are indeed valid, and we should probably not proceed with this analysis until the issues raised are sorted out.

Specifically there are 3 major concerns:

1. The nature of the read data looks wrong, in that this was supposed to be an Illumina TruSeq library, but the reads coming from it are non-directional.

2. The cKO samples are supposed to have a knockout of the Taz and Yap1 genes and the data shows no evidence of this change. Since it's a conditional knockout, maybe there is more subtlety to the design than a simple reading of the conditions suggests, but this needs to be determined since a lack of the expected effect would mean that any further work on the data would be pointless.

3. It would appear that all of the samples may be contaminated with muscle cells, and unfortunately the amount of contamination is not consistent between the two groups, but is systematically higher in cKO than WT. Given the conflation of the contamination and the condition it is going to be very difficult to specifically attribute any changes seen to the knockouts rather than the contamination.
