SeqMonk Vistory

Validation of RNA-Seq knockout

In this example we're going to look at a publicly available dataset in which two genes (Yap1 and Taz) have been knocked out. The data comes from GEO study GSE79115.

The data was processed through the standard Cluster Flow fastq_hisat2 pipeline, which did trimming with Trim Galore and mapping with HISAT2. The quality of the data all looked fine and the species screen showed that it was indeed mouse, as expected.

I've imported the data into SeqMonk and cleaned up the sample names a bit. I've also created replicate sets for the two biological conditions (WT and cKO), so it's all ready to go.

Project Summary

Basic Project Info

| Parameter | Value |
| --- | --- |
| Project Name | poitelon_rnaseq.smk |
| Vistory Name | validating_knockout.smv |
| Genome | Mus musculus GRCm38_v95 |
| SeqMonk Version | 1.45.2.devel |

Data Sets

| Name | File Name | Reads | Import Options |
| --- | --- | --- | --- |
| cKO-1 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222409_GSM2085998_YAP_TAZ-DBL-cKO-1_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 44595229 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
| cKO-2 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222410_GSM2085999_YAP_TAZ-DBL-cKO-2_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 35385632 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
| cKO-3 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222411_GSM2086000_YAP_TAZ-DBL-cKO-3_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 43511442 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
| WT-1 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222412_GSM2086001_YAP_TAZ-WT-1_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 66444258 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
| WT-2 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222413_GSM2086002_YAP_TAZ-WT-2_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 56949469 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
| WT-3 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222414_GSM2086003_YAP_TAZ-WT-3_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 58610561 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |

Replicate Sets

| cKO | WT |
| --- | --- |
| cKO-1 | WT-1 |
| cKO-2 | WT-2 |
| cKO-3 | WT-3 |

Basic QC

Visual Inspection

The data looks pretty good on a basic inspection. The reads are nicely restricted to exons and cover the whole length of the transcripts. Oddly, the library is non-directional. The paper says that the library prep used the Illumina TruSeq kit, which should give a directional library, so something a bit odd is going on.

There is no obvious DNA contamination, but there is maybe a suggestion of a small amount of PCR duplication since isolated reads often appear as duplets.

RNA-Seq QC Plot

This all looks good. The library is very clean with almost all reads being in exons, but it was polyA selected so this is expected. We see no rRNA, and only a small amount of MT. There is a bit of variation in the library sizes, and this does seem to link to the condition, with all of the WT samples having more reads, but it's still less than a 2-fold difference overall so I'm not too concerned. The directionality is still lacking, which matches what we saw in the raw data.
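As a quick check on the library size comment, the ratios can be worked out directly from the read counts listed under Data Sets. This is just arithmetic on those numbers, not something SeqMonk produced.

```python
# Read counts copied from the Data Sets table in the Project Summary.
reads = {
    "cKO-1": 44595229,
    "cKO-2": 35385632,
    "cKO-3": 43511442,
    "WT-1": 66444258,
    "WT-2": 56949469,
    "WT-3": 58610561,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Largest individual library relative to the smallest one.
max_min_ratio = max(reads.values()) / min(reads.values())

# Average WT library relative to the average cKO library.
wt_mean = mean(v for name, v in reads.items() if name.startswith("WT"))
cko_mean = mean(v for name, v in reads.items() if name.startswith("cKO"))
group_ratio = wt_mean / cko_mean

print(f"largest / smallest library: {max_min_ratio:.2f}")  # 1.88
print(f"mean WT / mean cKO:         {group_ratio:.2f}")    # 1.47
```

Even the most extreme pair of samples stays under 2-fold, and every WT library is larger than every cKO library.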

Duplication

New Probe Set (411632 probes)

Feature generator using mRNA split into exons duplicates removed Over feature from 0-0

Whilst there is duplication in the samples it is in line with the local data density, so it seems that this is biological in nature rather than technical. There is nothing to worry about overall in terms of the duplication rates in the samples.

Quantitation

We'll do a standard gene level quantitation. We'll use the default log2FPM (correcting for library size but not for transcript length) since we're only interested in comparing the same transcript between samples. We don't care about comparing different transcripts in the same sample.

New Probe Set (35266 probes)

Transcript features over mRNA

Probes Quantitated

RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library
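To illustrate why the length correction doesn't matter here, the following is a minimal sketch of a log2 FPM calculation next to a length-corrected version. The function names, the counts and the transcript length are all made up for illustration, and this is not SeqMonk's own code (which will also need to handle zero counts somehow).

```python
import math


def log2_fpm(count, library_size):
    """log2 fragments per million mapped fragments (no length correction)."""
    return math.log2(count * 1_000_000 / library_size)


def log2_fpkm(count, library_size, length_bp):
    """As above, but additionally divided by the transcript length in kb."""
    return log2_fpm(count, library_size) - math.log2(length_bp / 1000)


# Hypothetical counts for ONE gene in two samples with different library sizes.
wt = {"count": 1200, "library_size": 60_000_000}
cko = {"count": 400, "library_size": 40_000_000}
length_bp = 2500  # hypothetical transcript length

diff_fpm = log2_fpm(**cko) - log2_fpm(**wt)
diff_fpkm = (log2_fpkm(**cko, length_bp=length_bp)
             - log2_fpkm(**wt, length_bp=length_bp))

# The length term is identical in both samples, so it cancels in the difference.
print(diff_fpm, diff_fpkm)
```

The between-sample difference for a given gene is the same with or without the length term, which is all we need for this comparison.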

Normalisation

The default normalisation looks pretty good, but it might improve slightly if we add in size factor normalisation, so let's try that. We'll remove all of the obviously unexpressed genes first.

New Probe List: Expressed somewhere (19140 probes)

Filter on probes in All Probes where at least 1 of cKO-1 , cKO-2 , cKO-3 , WT-1 , WT-2 , WT-3 had a value above 50.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library

Probes Quantitated

RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list

It's slightly nicer with the additional normalisation so we'll keep that.
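For anyone wondering what a size factor correction does to log transformed values, here is a minimal sketch of a median-of-ratios style normalisation applied additively in log space. I'm assuming this is in the same spirit as SeqMonk's size factor normalisation with the Add option, but the exact implementation there may differ, and the data below are invented.

```python
from statistics import median


def size_factor_offsets(log_values):
    """log_values maps sample name -> list of log2 values (same gene order).

    Returns one additive offset per sample: the median, over genes, of the
    difference between that sample and the per-gene mean across samples.
    """
    samples = list(log_values)
    n_genes = len(log_values[samples[0]])
    reference = [
        sum(log_values[s][g] for s in samples) / len(samples)
        for g in range(n_genes)
    ]
    return {
        s: median(log_values[s][g] - reference[g] for g in range(n_genes))
        for s in samples
    }


def normalise(log_values):
    """Subtract each sample's offset so the bulk of the genes line up."""
    offsets = size_factor_offsets(log_values)
    return {s: [v - offsets[s] for v in vals] for s, vals in log_values.items()}


# Invented data: sample B sits 0.5 above A for most genes, and the last
# gene is genuinely changed.
toy = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [1.5, 2.5, 3.5, 4.5, 9.0],
}
normalised = normalise(toy)
print(normalised["A"])
print(normalised["B"])
```

Because the offset comes from the median, a block of genes which all move in one direction has less influence on it than it would on a simple total count correction.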

Exploration

Clustering

We can start by using PCA to see how the samples cluster.

The samples form two obvious clusters and these split by knockout status on PC1, so this all looks good. The cKO-1 sample seems a little divergent on PC2, so that might need looking at.
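The PCA itself was done in SeqMonk, but the idea can be sketched in a few lines. The following is a minimal pure Python version which only extracts PC1 (by power iteration on the sample-by-sample matrix) and runs on invented values, so treat it as an illustration of the principle rather than a reproduction of the plot.

```python
import math


def pc1_scores(matrix, iterations=100):
    """matrix: one row per sample, each row a list of per-gene log2 values.

    Returns (PC1 score per sample, fraction of total variance on PC1).
    """
    n, m = len(matrix), len(matrix[0])
    # Centre every gene on its mean across samples.
    means = [sum(row[g] for row in matrix) / n for g in range(m)]
    centred = [[row[g] - means[g] for g in range(m)] for row in matrix]
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centred]
            for ri in centred]
    vec = [float(i + 1) for i in range(n)]  # arbitrary starting vector
    for _ in range(iterations):
        new = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    eigenvalue = sum(
        vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    total = sum(gram[i][i] for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in vec], eigenvalue / total


# Invented log2 values: three cKO rows then three WT rows, four genes each.
toy_matrix = [
    [5.0, 3.0, 7.1, 2.0],
    [5.1, 3.1, 7.0, 2.1],
    [4.9, 2.9, 6.9, 1.9],
    [2.0, 3.0, 7.0, 4.0],
    [2.1, 3.1, 7.1, 4.1],
    [1.9, 2.9, 6.9, 3.9],
]
scores, explained = pc1_scores(toy_matrix)
print([round(s, 2) for s in scores], round(explained, 3))
```

In the toy data the first three samples land on one side of PC1 and the last three on the other, which is the pattern we see for cKO versus WT in the real plot. The sign of a principal component is arbitrary, so only the separation matters.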

Comparison

There are some very obvious differences between the samples, but the structure of the differences looks odd. There is a pretty obvious group of genes which appear to be very consistently upregulated in the knockout sample, all changing by a log2 diff of around 2.5 over a wide range of expression levels. This doesn't look normal and suggests that there may be an additional effect on these samples beyond the knockout, possibly a copy number change? This asymmetry in the changes probably explains why the size factor normalisation produced some benefit.

There are some genes which show quite large expression decreases in the cKO samples, but nothing looks like it really moves from being highly expressed to being absent, so we should check on the knockouts.

Knockout validation

Since the cKO samples are supposed to be a double knockout of Yap1 and Taz we should look at those genes specifically.

New Probe List: Yap1 (1 probes)

Probes from Expressed somewhere whose name matches any of Yap1 after stripping probe generator suffixes

New Probe List: Taz (1 probes)

Probes from Expressed somewhere whose name matches any of Taz after stripping probe generator suffixes

That doesn't look good. Neither of the genes which are supposed to be knocked out actually appears to show any real change in the cKO group. Depending on how the knockout was done this might not be an issue. If it's very tightly targeted then it could be that the transcripts are still being produced but in a non-functional form.

I went and looked up the original paper in which the Yap1/Taz knockout was created (PMC3605093), and in both cases they used a Cre/Lox system to completely excise the first complete coding exon (exon 2). It's possible that after deletion the transcript splices around the missing exon and carries on, but we should check.

The second exon of Yap1 certainly appears to still be there, and at a similar level to the first exon. The total number of reads is fewer, but that's in line with the lower coverage overall in the cKO samples (which we saw in the QC report).

Likewise the second Taz exon also appears to be present at levels which are consistent with the exons around it. Given these observations I would be very cautious of proceeding with the analysis without going back to validate the exact knockouts in more detail, because on the face of it it looks like the expected effects are absent from these samples.
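The exon check above was done by eye in the chromosome view. If you wanted to put a number on it, one option is to express the reads in the targeted exon relative to a neighbouring exon of the same gene, which takes library size out of the picture. This is only a sketch: the function name is mine and the read counts below are MADE UP for illustration, they are not measurements from this dataset.

```python
def exon_ratio(target_reads, neighbour_reads, pseudocount=1.0):
    """Reads in the targeted exon relative to a neighbouring exon.

    The pseudocount is an assumption to avoid dividing by zero.
    """
    return (target_reads + pseudocount) / (neighbour_reads + pseudocount)


# MADE-UP (exon 2, exon 1) read counts per sample, for illustration only.
made_up_counts = {
    "WT-1": (600, 580),
    "WT-2": (520, 540),
    "cKO-1": (410, 400),
    "cKO-2": (330, 350),
}

ratios = {
    sample: exon_ratio(exon2, exon1)
    for sample, (exon2, exon1) in made_up_counts.items()
}
for sample, ratio in ratios.items():
    print(f"{sample}: exon2 / exon1 = {ratio:.2f}")

# For contrast, a clean excision would leave almost nothing in exon 2.
excised = exon_ratio(3, 400)
print(f"excised example: {excised:.2f}")
```

A ratio close to the WT value in the cKO samples corresponds to what we saw by eye, whereas a working excision should push the ratio towards zero.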

Differences examination

Despite the lack of expected knockouts in the samples we did definitely see some consistent expression differences between the two groups, but we also saw the oddly consistently regulated set of genes. To try to understand what's going on with these we can see if we can find any obvious connection between the genes in this set.

To isolate these genes we can do a crude extraction of them based on their fold change. I can just manually select these in a scatterplot.

New Probe List: Oddly similar changes (511 probes)

Filter on probes in Expressed somewhere where average difference when comparing WT to cKO had a value between 1.8 and 3.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list
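The filter is nothing more than a window on the cKO minus WT difference of the group averages. As a minimal sketch (the function name is mine): Acta1 and Trim72 use the group means from the annotated probe report later in this document, and GeneX is a hypothetical unchanged gene added for contrast.

```python
def difference_filter(group_means, low, high):
    """Keep genes whose cKO minus WT log2 difference lies within [low, high].

    group_means maps gene name -> (mean cKO value, mean WT value).
    """
    kept = []
    for gene, (cko, wt) in group_means.items():
        if low <= cko - wt <= high:
            kept.append(gene)
    return kept


group_means = {
    "Acta1": (10.6300745, 8.097512),   # values from the annotated report
    "Trim72": (3.001933, 0.45001903),  # values from the annotated report
    "GeneX": (5.2, 5.1),               # hypothetical unchanged gene
}

selected = difference_filter(group_means, 1.8, 3.0)
print(selected)  # ['Acta1', 'Trim72']
```

It's a crude selection, since anything which happens to change by this amount for unrelated reasons will be swept up too.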

The most obvious thing to connect these points would be physical position in the genome, so let's see where they are:

Meh - they're all over the place so bang goes that theory. Let's have a look at what the actual genes are instead. We'll just take the ones which are more highly expressed to limit how many we have to deal with.

New Probe List: Highly expressed (108 probes)

Filter on probes in Oddly similar changes where exactly 1 of cKO had a value above 3.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list

| Probe | Chromosome | Start | End | Probe Strand | Feature | ID | Description | Feature Strand | Type | Feature Orientation | Distance | cKO | WT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Acta1 | 8 | 123891769 | 123894751 | - | Acta1 | ENSMUSG00000031972 | actin, alpha 1, skeletal muscle [Source:MGI Symbol;Acc:MGI:87902] | - | gene | Name match | 0 | 10.6300745 | 8.097512 |
| Mylpf | 7 | 127208890 | 127214298 | + | Mylpf | ENSMUSG00000030672 | myosin light chain, phosphorylatable, fast skeletal muscle [Source:MGI Symbol;Acc:MGI:97273] | + | gene | Name match | 0 | 10.00174 | 7.4619308 |
| Myl1 | 1 | 66924295 | 66945404 | - | Myl1 | ENSMUSG00000061816 | myosin, light polypeptide 1 [Source:MGI Symbol;Acc:MGI:97269] | - | gene | Name match | 0 | 9.555613 | 7.085666 |
| Myh8 | 11 | 67277124 | 67308634 | + | Myh8 | ENSMUSG00000055775 | myosin, heavy polypeptide 8, skeletal muscle, perinatal [Source:MGI Symbol;Acc:MGI:1339712] | + | gene | Name match | 0 | 9.314048 | 6.725017 |
| Tnnt3 | 7 | 142498836 | 142516009 | + | Tnnt3 | ENSMUSG00000061723 | troponin T3, skeletal, fast [Source:MGI Symbol;Acc:MGI:109550] | + | gene | Name match | 0 | 9.128785 | 6.6186385 |
| Ube2c | 2 | 164769898 | 164778822 | + | Ube2c | ENSMUSG00000001403 | ubiquitin-conjugating enzyme E2C [Source:MGI Symbol;Acc:MGI:1915862] | + | gene | Name match | 0 | 9.095146 | 7.261156 |
| Tnnc2 | 2 | 164777161 | 164779967 | - | Tnnc2 | ENSMUSG00000017300 | troponin C2, fast [Source:MGI Symbol;Acc:MGI:98780] | - | gene | Name match | 0 | 8.9517355 | 6.5184937 |
| Ckm | 7 | 19404776 | 19422841 | + | Ckm | ENSMUSG00000030399 | creatine kinase, muscle [Source:MGI Symbol;Acc:MGI:88413] | + | gene | Name match | 0 | 8.946425 | 6.266685 |
| Tpm2 | 4 | 43514711 | 43523765 | - | Tpm2 | ENSMUSG00000028464 | tropomyosin 2, beta [Source:MGI Symbol;Acc:MGI:98810] | - | gene | Name match | 0 | 8.791783 | 6.863115 |
| Tnni2 | 7 | 142441808 | 142444410 | + | Tnni2 | ENSMUSG00000031097 | troponin I, skeletal, fast 2 [Source:MGI Symbol;Acc:MGI:105070] | + | gene | Name match | 0 | 8.441571 | 5.8975697 |
| Atp2a1 | 7 | 126445858 | 126463108 | - | Atp2a1 | ENSMUSG00000030730 | ATPase, Ca++ transporting, cardiac muscle, fast twitch 1 [Source:MGI Symbol;Acc:MGI:105058] | - | gene | Name match | 0 | 7.9781203 | 5.1985235 |
| A930016O22Rik | 7 | 19417466 | 19421417 | - | A930016O22Rik | ENSMUSG00000040705 | RIKEN cDNA A930016O22 gene [Source:MGI Symbol;Acc:MGI:3605623] | - | gene | Name match | 0 | 7.7730346 | 5.131559 |
| Eno3 | 11 | 70657202 | 70662513 | + | Eno3 | ENSMUSG00000060600 | enolase 3, beta muscle [Source:MGI Symbol;Acc:MGI:95395] | + | gene | Name match | 0 | 7.504761 | 5.2885814 |
| Des | 1 | 75360329 | 75368579 | + | Des | ENSMUSG00000026208 | desmin [Source:MGI Symbol;Acc:MGI:94885] | + | gene | Name match | 0 | 7.4028907 | 5.2280498 |
| Col2a1 | 15 | 97975602 | 98004695 | - | Col2a1 | ENSMUSG00000022483 | collagen, type II, alpha 1 [Source:MGI Symbol;Acc:MGI:88452] | - | gene | Name match | 0 | 7.156376 | 5.3229785 |
| Ttn | 2 | 76703980 | 76982547 | - | Ttn | ENSMUSG00000051747 | titin [Source:MGI Symbol;Acc:MGI:98864] | - | gene | Name match | 0 | 7.0931764 | 4.4328256 |
| Actc1 | 2 | 114047282 | 114053548 | - | Actc1 | ENSMUSG00000068614 | actin, alpha, cardiac muscle 1 [Source:MGI Symbol;Acc:MGI:87905] | - | gene | Name match | 0 | 6.89143 | 3.9702747 |
| Casq1 | 1 | 172209894 | 172219868 | - | Casq1 | ENSMUSG00000007122 | calsequestrin 1 [Source:MGI Symbol;Acc:MGI:1309468] | - | gene | Name match | 0 | 6.8523135 | 4.718498 |
| Pvalb | 15 | 78191114 | 78206400 | - | Pvalb | ENSMUSG00000005716 | parvalbumin [Source:MGI Symbol;Acc:MGI:97821] | - | gene | Name match | 0 | 6.7798343 | 4.195829 |
| Tceal7 | X | 136214779 | 136226100 | + | Tceal7 | ENSMUSG00000079428 | transcription elongation factor A (SII)-like 7 [Source:MGI Symbol;Acc:MGI:1915746] | + | gene | Name match | 0 | 6.7450833 | 4.525802 |
| Actn3 | 19 | 4861223 | 4877909 | - | Actn3 | ENSMUSG00000006457 | actinin alpha 3 [Source:MGI Symbol;Acc:MGI:99678] | - | gene | Name match | 0 | 6.699188 | 4.280667 |
| Pygm | 19 | 6384399 | 6398459 | + | Pygm | ENSMUSG00000032648 | muscle glycogen phosphorylase [Source:MGI Symbol;Acc:MGI:97830] | + | gene | Name match | 0 | 6.44903 | 4.334618 |
| Mybpc1 | 10 | 88518279 | 88605152 | - | Mybpc1 | ENSMUSG00000020061 | myosin binding protein C, slow-type [Source:MGI Symbol;Acc:MGI:1336213] | - | gene | Name match | 0 | 6.3010755 | 3.8659647 |
| Pgam2 | 11 | 5801640 | 5803733 | - | Pgam2 | ENSMUSG00000020475 | phosphoglycerate mutase 2 [Source:MGI Symbol;Acc:MGI:1933118] | - | gene | Name match | 0 | 6.2571893 | 3.6856563 |
| Neb | 2 | 52136647 | 52378474 | - | Neb | ENSMUSG00000026950 | nebulin [Source:MGI Symbol;Acc:MGI:97292] | - | gene | Name match | 0 | 6.1607213 | 4.0120506 |
| Myoz1 | 14 | 20649107 | 20656540 | - | Myoz1 | ENSMUSG00000068697 | myozenin 1 [Source:MGI Symbol;Acc:MGI:1929471] | - | gene | Name match | 0 | 6.1156273 | 3.6680744 |
| Fibin | 2 | 110360917 | 110363183 | - | Fibin | ENSMUSG00000074971 | fin bud initiation factor homolog (zebrafish) [Source:MGI Symbol;Acc:MGI:1914856] | - | gene | Name match | 0 | 6.0845985 | 3.5646398 |
| Ldb3 | 14 | 34526603 | 34588682 | - | Ldb3 | ENSMUSG00000021798 | LIM domain binding 3 [Source:MGI Symbol;Acc:MGI:1344412] | - | gene | Name match | 0 | 6.032437 | 3.6439762 |
| Actn2 | 13 | 12269426 | 12340760 | - | Actn2 | ENSMUSG00000052374 | actinin alpha 2 [Source:MGI Symbol;Acc:MGI:109192] | - | gene | Name match | 0 | 6.0224013 | 3.2486563 |
| Srl | 16 | 4480216 | 4541816 | - | Srl | ENSMUSG00000022519 | sarcalumenin [Source:MGI Symbol;Acc:MGI:2146620] | - | gene | Name match | 0 | 6.016025 | 4.061566 |
| Sln | 9 | 53850164 | 53854560 | + | Sln | ENSMUSG00000042045 | sarcolipin [Source:MGI Symbol;Acc:MGI:1913652] | + | gene | Name match | 0 | 6.0015106 | 3.0297756 |
| Sypl2 | 3 | 108211472 | 108226648 | - | Sypl2 | ENSMUSG00000027887 | synaptophysin-like 2 [Source:MGI Symbol;Acc:MGI:1328311] | - | gene | Name match | 0 | 5.7019043 | 3.2455757 |
| Myl4 | 11 | 104550663 | 104595753 | + | Myl4 | ENSMUSG00000061086 | myosin, light polypeptide 4 [Source:MGI Symbol;Acc:MGI:97267] | + | gene | Name match | 0 | 5.6841083 | 3.326821 |
| Trdn | 10 | 33080554 | 33476709 | + | Trdn | ENSMUSG00000019787 | triadin [Source:MGI Symbol;Acc:MGI:1924007] | + | gene | Name match | 0 | 5.629187 | 3.2385871 |
| Mybpc2 | 7 | 44501699 | 44524656 | - | Mybpc2 | ENSMUSG00000038670 | myosin binding protein C, fast-type [Source:MGI Symbol;Acc:MGI:1336170] | - | gene | Name match | 0 | 5.580316 | 2.735296 |
| Itgb1bp2 | X | 101449088 | 101453541 | + | Itgb1bp2 | ENSMUSG00000031312 | integrin beta 1 binding protein 2 [Source:MGI Symbol;Acc:MGI:1353420] | + | gene | Name match | 0 | 5.5433006 | 2.9828875 |
| Atp1b4 | X | 38316184 | 38336769 | + | Atp1b4 | ENSMUSG00000016327 | ATPase, (Na+)/K+ transporting, beta 4 polypeptide [Source:MGI Symbol;Acc:MGI:1915071] | + | gene | Name match | 0 | 5.3833213 | 3.1316712 |
| Rtn2 | 7 | 19282624 | 19296160 | + | Rtn2 | ENSMUSG00000030401 | reticulon 2 (Z-band associated protein) [Source:MGI Symbol;Acc:MGI:107612] | + | gene | Name match | 0 | 5.3780785 | 3.4236689 |
| Myom1 | 17 | 71002633 | 71126856 | + | Myom1 | ENSMUSG00000024049 | myomesin 1 [Source:MGI Symbol;Acc:MGI:1341430] | + | gene | Name match | 0 | 5.3354306 | 3.0266078 |
| Cox6a2 | 7 | 128205435 | 128206387 | - | Cox6a2 | ENSMUSG00000030785 | cytochrome c oxidase subunit 6A2 [Source:MGI Symbol;Acc:MGI:104649] | - | gene | Name match | 0 | 5.310141 | 2.7472584 |
| Apobec2 | 17 | 48419231 | 48432930 | - | Apobec2 | ENSMUSG00000040694 | apolipoprotein B mRNA editing enzyme, catalytic polypeptide 2 [Source:MGI Symbol;Acc:MGI:1343178] | - | gene | Name match | 0 | 5.287872 | 2.7192059 |
| Pdlim3 | 8 | 45885461 | 45919548 | + | Pdlim3 | ENSMUSG00000031636 | PDZ and LIM domain 3 [Source:MGI Symbol;Acc:MGI:1859274] | + | gene | Name match | 0 | 5.273842 | 3.42713 |
| Smpx | X | 157698910 | 157752591 | + | Smpx | ENSMUSG00000041476 | small muscle protein, X-linked [Source:MGI Symbol;Acc:MGI:1913356] | + | gene | Name match | 0 | 5.2584686 | 2.851238 |
| Myom2 | 8 | 15057653 | 15133541 | + | Myom2 | ENSMUSG00000031461 | myomesin 2 [Source:MGI Symbol;Acc:MGI:1328358] | + | gene | Name match | 0 | 4.9835496 | 2.4547803 |
| Eef1a2 | 2 | 181147653 | 181157014 | - | Eef1a2 | ENSMUSG00000016349 | eukaryotic translation elongation factor 1 alpha 2 [Source:MGI Symbol;Acc:MGI:1096317] | - | gene | Name match | 0 | 4.9614844 | 2.9759362 |
| Iigp1 | 18 | 60376029 | 60392627 | + | Iigp1 | ENSMUSG00000054072 | interferon inducible GTPase 1 [Source:MGI Symbol;Acc:MGI:1926259] | + | gene | Name match | 0 | 4.952799 | 2.288293 |
| Klhl41 | 2 | 69670120 | 69684230 | + | Klhl41 | ENSMUSG00000075307 | kelch-like 41 [Source:MGI Symbol;Acc:MGI:2683854] | + | gene | Name match | 0 | 4.9376125 | 2.5512183 |
| Myot | 18 | 44334074 | 44355724 | + | Myot | ENSMUSG00000024471 | myotilin [Source:MGI Symbol;Acc:MGI:1889800] | + | gene | Name match | 0 | 4.866352 | 2.3614779 |
| Hrc | 7 | 45335290 | 45338974 | + | Hrc | ENSMUSG00000038239 | histidine rich calcium binding protein [Source:MGI Symbol;Acc:MGI:96226] | + | gene | Name match | 0 | 4.8531127 | 2.388653 |
| Ank1 | 8 | 22974844 | 23150497 | + | Ank1 | ENSMUSG00000031543 | ankyrin 1, erythroid [Source:MGI Symbol;Acc:MGI:88024] | + | gene | Name match | 0 | 4.8427024 | 2.5067291 |
| Tcap | 11 | 98383811 | 98384953 | + | Tcap | ENSMUSG00000007877 | titin-cap [Source:MGI Symbol;Acc:MGI:1330233] | + | gene | Name match | 0 | 4.8094006 | 2.4194448 |
| Ryr1 | 7 | 29003344 | 29125179 | - | Ryr1 | ENSMUSG00000030592 | ryanodine receptor 1, skeletal muscle [Source:MGI Symbol;Acc:MGI:99659] | - | gene | Name match | 0 | 4.8011594 | 2.4239838 |
| Hfe2 | 3 | 96525172 | 96529210 | + | Hfe2 | ENSMUSG00000038403 | hemochromatosis type 2 (juvenile) [Source:MGI Symbol;Acc:MGI:1916835] | + | gene | Name match | 0 | 4.730496 | 2.3208005 |
| Ampd1 | 3 | 103074014 | 103099720 | + | Ampd1 | ENSMUSG00000070385 | adenosine monophosphate deaminase 1 [Source:MGI Symbol;Acc:MGI:88015] | + | gene | Name match | 0 | 4.663221 | 2.0056393 |
| Myoz2 | 3 | 123006206 | 123035015 | - | Myoz2 | ENSMUSG00000028116 | myozenin 2 [Source:MGI Symbol;Acc:MGI:1913063] | - | gene | Name match | 0 | 4.661734 | 2.3122559 |
| Cacng1 | 11 | 107703218 | 107716522 | - | Cacng1 | ENSMUSG00000020722 | calcium channel, voltage-dependent, gamma subunit 1 [Source:MGI Symbol;Acc:MGI:1206582] | - | gene | Name match | 0 | 4.654035 | 2.1101506 |
| Sh3bgr | 16 | 96200450 | 96228935 | + | Sh3bgr | ENSMUSG00000040666 | SH3-binding domain glutamic acid-rich protein [Source:MGI Symbol;Acc:MGI:1354740] | + | gene | Name match | 0 | 4.595351 | 1.9935116 |
| Cox8b | 7 | 140898945 | 140900446 | - | Cox8b | ENSMUSG00000025488 | cytochrome c oxidase subunit 8B [Source:MGI Symbol;Acc:MGI:105958] | - | gene | Name match | 0 | 4.5683045 | 2.1308832 |
| Fabp3 | 4 | 130308595 | 130315463 | + | Fabp3 | ENSMUSG00000028773 | fatty acid binding protein 3, muscle and heart [Source:MGI Symbol;Acc:MGI:95476] | + | gene | Name match | 0 | 4.5498366 | 2.3100097 |
| Arhgap36 | X | 49463945 | 49500244 | + | Arhgap36 | ENSMUSG00000036198 | Rho GTPase activating protein 36 [Source:MGI Symbol;Acc:MGI:1922654] | + | gene | Name match | 0 | 4.499045 | 2.1075962 |
| Cilp | 9 | 65265180 | 65280605 | + | Cilp | ENSMUSG00000042254 | cartilage intermediate layer protein, nucleotide pyrophosphohydrolase [Source:MGI Symbol;Acc:MGI:2444507] | + | gene | Name match | 0 | 4.4644227 | 2.4463303 |
| Myh3 | 11 | 67078300 | 67102291 | + | Myh3 | ENSMUSG00000020908 | myosin, heavy polypeptide 3, skeletal muscle, embryonic [Source:MGI Symbol;Acc:MGI:1339709] | + | gene | Name match | 0 | 4.4403567 | 1.8177018 |
| Nrap | 19 | 56320035 | 56390037 | - | Nrap | ENSMUSG00000049134 | nebulin-related anchoring protein [Source:MGI Symbol;Acc:MGI:1098765] | - | gene | Name match | 0 | 4.3482184 | 2.001753 |
| Synpo2l | 14 | 20658946 | 20668354 | - | Synpo2l | ENSMUSG00000039376 | synaptopodin 2-like [Source:MGI Symbol;Acc:MGI:1916010] | - | gene | Name match | 0 | 4.3379483 | 2.1240387 |
| Cacna1s | 1 | 136052750 | 136119822 | + | Cacna1s | ENSMUSG00000026407 | calcium channel, voltage-dependent, L type, alpha 1S subunit [Source:MGI Symbol;Acc:MGI:88294] | + | gene | Name match | 0 | 4.2639847 | 2.1601298 |
| Cav3 | 6 | 112459505 | 112472872 | + | Cav3 | ENSMUSG00000062694 | caveolin 3 [Source:MGI Symbol;Acc:MGI:107570] | + | gene | Name match | 0 | 4.233438 | 1.5682999 |
| Fitm1 | 14 | 55575617 | 55576954 | + | Fitm1 | ENSMUSG00000022215 | fat storage-inducing transmembrane protein 1 [Source:MGI Symbol;Acc:MGI:1915930] | + | gene | Name match | 0 | 4.1784434 | 1.5370213 |
| Atcayos | 10 | 81194609 | 81210877 | + | Atcayos | ENSMUSG00000085779 | ataxia, cerebellar, Cayman type, opposite strand [Source:MGI Symbol;Acc:MGI:1916928] | + | gene | Name match | 0 | 4.1450295 | 1.5326638 |
| Klhl31 | 9 | 77636500 | 77660127 | + | Klhl31 | ENSMUSG00000044938 | kelch-like 31 [Source:MGI Symbol;Acc:MGI:3045305] | + | gene | Name match | 0 | 4.1251917 | 1.7273712 |
| Tmem182 | 1 | 40805601 | 40856887 | + | Tmem182 | ENSMUSG00000079588 | transmembrane protein 182 [Source:MGI Symbol;Acc:MGI:1923725] | + | gene | Name match | 0 | 4.1091084 | 1.6307322 |
| Obscn | 11 | 58994256 | 59136402 | - | Obscn | ENSMUSG00000061462 | obscurin, cytoskeletal calmodulin and titin- interacting RhoGEF [Source:MGI Symbol;Acc:MGI:2681862] | - | gene | Name match | 0 | 4.100353 | 1.4627026 |
| Jsrp1 | 10 | 80808496 | 80813498 | - | Jsrp1 | ENSMUSG00000020216 | junctional sarcoplasmic reticulum protein 1 [Source:MGI Symbol;Acc:MGI:1916700] | - | gene | Name match | 0 | 4.0772023 | 2.2387457 |
| Cavin4 | 4 | 48663514 | 48673502 | + | Cavin4 | ENSMUSG00000028348 | caveolae associated 4 [Source:MGI Symbol;Acc:MGI:1915266] | + | gene | Name match | 0 | 4.0137854 | 1.560191 |
| Hspb7 | 4 | 141420779 | 141425311 | + | Hspb7 | ENSMUSG00000006221 | heat shock protein family, member 7 (cardiovascular) [Source:MGI Symbol;Acc:MGI:1352494] | + | gene | Name match | 0 | 4.0113735 | 1.872797 |
| Piwil4 | 9 | 14696230 | 14740733 | - | Piwil4 | ENSMUSG00000036912 | piwi-like RNA-mediated gene silencing 4 [Source:MGI Symbol;Acc:MGI:3041167] | - | gene | Name match | 0 | 3.9173393 | 1.2546765 |
| Gbp4 | 5 | 105115767 | 105139586 | - | Gbp4 | ENSMUSG00000079363 | guanylate binding protein 4 [Source:MGI Symbol;Acc:MGI:97072] | - | gene | Name match | 0 | 3.9009335 | 1.6620811 |
| Mypn | 10 | 63115795 | 63203952 | - | Mypn | ENSMUSG00000020067 | myopalladin [Source:MGI Symbol;Acc:MGI:1916052] | - | gene | Name match | 0 | 3.833132 | 1.1766785 |
| Dhrs7c | 11 | 67798187 | 67816002 | + | Dhrs7c | ENSMUSG00000033044 | dehydrogenase/reductase (SDR family) member 7C [Source:MGI Symbol;Acc:MGI:1915710] | + | gene | Name match | 0 | 3.8212578 | 1.3127257 |
| Tmod4 | 3 | 95124476 | 95129209 | + | Tmod4 | ENSMUSG00000005628 | tropomodulin 4 [Source:MGI Symbol;Acc:MGI:1355285] | + | gene | Name match | 0 | 3.7632558 | 1.3997828 |
| Camk2a | 18 | 60925618 | 60988152 | + | Camk2a | ENSMUSG00000024617 | calcium/calmodulin-dependent protein kinase II alpha [Source:MGI Symbol;Acc:MGI:88256] | + | gene | Name match | 0 | 3.7615945 | 1.9082276 |
| Fbp2 | 13 | 62836877 | 62858422 | - | Fbp2 | ENSMUSG00000021456 | fructose bisphosphatase 2 [Source:MGI Symbol;Acc:MGI:95491] | - | gene | Name match | 0 | 3.7539046 | 1.1742979 |
| Art1 | 7 | 102101743 | 102113933 | + | Art1 | ENSMUSG00000030996 | ADP-ribosyltransferase 1 [Source:MGI Symbol;Acc:MGI:107511] | + | gene | Name match | 0 | 3.6770535 | 0.9062833 |
| Hmcn2 | 2 | 31314415 | 31460738 | + | Hmcn2 | ENSMUSG00000055632 | hemicentin 2 [Source:MGI Symbol;Acc:MGI:2677838] | + | gene | Name match | 0 | 3.668998 | 1.7803224 |
| Gm37759 | 1 | 136116577 | 136119822 | - | Gm37759 | ENSMUSG00000102717 | predicted gene, 37759 [Source:MGI Symbol;Acc:MGI:5610987] | - | gene | Name match | 0 | 3.658001 | 1.6372976 |
| Sgcg | 14 | 61219115 | 61258490 | - | Sgcg | ENSMUSG00000035296 | sarcoglycan, gamma (dystrophin-associated glycoprotein) [Source:MGI Symbol;Acc:MGI:1346524] | - | gene | Name match | 0 | 3.6402366 | 1.0326926 |
| Ckmt2 | 13 | 91853387 | 91876885 | - | Ckmt2 | ENSMUSG00000021622 | creatine kinase, mitochondrial 2 [Source:MGI Symbol;Acc:MGI:1923972] | - | gene | Name match | 0 | 3.630171 | 0.8835902 |
| Smyd1 | 6 | 71213940 | 71322233 | - | Smyd1 | ENSMUSG00000055027 | SET and MYND domain containing 1 [Source:MGI Symbol;Acc:MGI:104790] | - | gene | Name match | 0 | 3.5943708 | 1.3996502 |
| Tnnt1 | 7 | 4504570 | 4516382 | - | Tnnt1 | ENSMUSG00000064179 | troponin T1, skeletal, slow [Source:MGI Symbol;Acc:MGI:1333868] | - | gene | Name match | 0 | 3.5926695 | 1.4827223 |
| Lmod3 | 6 | 97238534 | 97252759 | - | Lmod3 | ENSMUSG00000044086 | leiomodin 3 (fetal) [Source:MGI Symbol;Acc:MGI:2444169] | - | gene | Name match | 0 | 3.527568 | 0.9856329 |
| 1110002E22Rik | 3 | 138065052 | 138081506 | + | 1110002E22Rik | ENSMUSG00000090066 | RIKEN cDNA 1110002E22 gene [Source:MGI Symbol;Acc:MGI:1915066] | + | gene | Name match | 0 | 3.5229347 | 1.227827 |
| Rtl1 | 12 | 109589193 | 109600330 | - | Rtl1 | ENSMUSG00000085925 | retrotransposon Gaglike 1 [Source:MGI Symbol;Acc:MGI:2656842] | - | gene | Name match | 0 | 3.5128603 | 1.6970845 |
| Rpl3l | 17 | 24727820 | 24736143 | + | Rpl3l | ENSMUSG00000002500 | ribosomal protein L3-like [Source:MGI Symbol;Acc:MGI:1913461] | + | gene | Name match | 0 | 3.5003052 | 0.68470806 |
| Gbp2b | 3 | 142594847 | 142619179 | + | Gbp2b | ENSMUSG00000040264 | guanylate binding protein 2b [Source:MGI Symbol;Acc:MGI:95666] | + | gene | Name match | 0 | 3.4459944 | 1.5280095 |
| Ppp1r3a | 6 | 14713977 | 14755274 | - | Ppp1r3a | ENSMUSG00000042717 | protein phosphatase 1, regulatory subunit 3A [Source:MGI Symbol;Acc:MGI:2153588] | - | gene | Name match | 0 | 3.432337 | 1.2207707 |
| Klk8 | 7 | 43797577 | 43803826 | + | Klk8 | ENSMUSG00000064023 | kallikrein related-peptidase 8 [Source:MGI Symbol;Acc:MGI:1343327] | + | gene | Name match | 0 | 3.3492088 | 1.4467646 |
| Tnnt2 | 1 | 135836354 | 135852260 | + | Tnnt2 | ENSMUSG00000026414 | troponin T2, cardiac [Source:MGI Symbol;Acc:MGI:104597] | + | gene | Name match | 0 | 3.305336 | 0.9699723 |
| Xirp1 | 9 | 120013755 | 120023598 | - | Xirp1 | ENSMUSG00000079243 | xin actin-binding repeat containing 1 [Source:MGI Symbol;Acc:MGI:1333878] | - | gene | Name match | 0 | 3.2169416 | 0.81996137 |
| Chrna1 | 2 | 73563215 | 73580338 | - | Chrna1 | ENSMUSG00000027107 | cholinergic receptor, nicotinic, alpha polypeptide 1 (muscle) [Source:MGI Symbol;Acc:MGI:87885] | - | gene | Name match | 0 | 3.1982536 | 0.5401571 |
| Asb2 | 12 | 103321142 | 103356001 | - | Asb2 | ENSMUSG00000021200 | ankyrin repeat and SOCS box-containing 2 [Source:MGI Symbol;Acc:MGI:1929743] | - | gene | Name match | 0 | 3.1897268 | 1.2237269 |
| Pck1 | 2 | 173153048 | 173159273 | + | Pck1 | ENSMUSG00000027513 | phosphoenolpyruvate carboxykinase 1, cytosolic [Source:MGI Symbol;Acc:MGI:97501] | + | gene | Name match | 0 | 3.1702936 | 1.100215 |
| Hhatl | 9 | 121784016 | 121792507 | - | Hhatl | ENSMUSG00000032523 | hedgehog acyltransferase-like [Source:MGI Symbol;Acc:MGI:1922020] | - | gene | Name match | 0 | 3.1309252 | 1.2117059 |
| Klhl40 | 9 | 121777607 | 121783818 | + | Klhl40 | ENSMUSG00000074001 | kelch-like 40 [Source:MGI Symbol;Acc:MGI:1919580] | + | gene | Name match | 0 | 3.1116276 | 0.9328232 |
| Myo18b | 5 | 112688876 | 112896362 | - | Myo18b | ENSMUSG00000072720 | myosin XVIIIb [Source:MGI Symbol;Acc:MGI:1921626] | - | gene | Name match | 0 | 3.084533 | 0.8559819 |
| Igsf1 | X | 49782536 | 49797749 | - | Igsf1 | ENSMUSG00000031111 | immunoglobulin superfamily, member 1 [Source:MGI Symbol;Acc:MGI:2147913] | - | gene | Name match | 0 | 3.0787194 | 1.1340787 |
| Ephb1 | 9 | 101922128 | 102354693 | - | Ephb1 | ENSMUSG00000032537 | Eph receptor B1 [Source:MGI Symbol;Acc:MGI:1096337] | - | gene | Name match | 0 | 3.0727623 | 1.2279277 |
| Mylk2 | 2 | 152911352 | 152923068 | + | Mylk2 | ENSMUSG00000027470 | myosin, light polypeptide kinase 2, skeletal muscle [Source:MGI Symbol;Acc:MGI:2139434] | + | gene | Name match | 0 | 3.0480812 | 1.0752155 |
| Rbm24 | 13 | 46418434 | 46431095 | + | Rbm24 | ENSMUSG00000038132 | RNA binding motif protein 24 [Source:MGI Symbol;Acc:MGI:3610364] | + | gene | Name match | 0 | 3.0163505 | 0.8675073 |
| Trim72 | 7 | 128003949 | 128011033 | + | Trim72 | ENSMUSG00000042828 | tripartite motif-containing 72 [Source:MGI Symbol;Acc:MGI:3612190] | + | gene | Name match | 0 | 3.001933 | 0.45001903 |

This looks pretty telling. The genes we're looking at here all appear to be strongly related to muscle: actin, myosin and troponin, as well as mitochondrial genes which would be increased in muscle cells. I put the list through gprofiler and got enormously significant hits to muscle pathways (e.g. Reactome muscle contraction at p=3x10^-37).
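To give a feel for why an overlap like this comes out as so significant, here is a minimal sketch of a one-sided hypergeometric over-representation test, which is the general kind of test enrichment tools use. The list size (108) and background size (19140) come from the probe lists above, but the pathway size and overlap are HYPOTHETICAL numbers I've picked for illustration, and gprofiler applies its own multiple testing correction which is not reproduced here.

```python
from math import comb


def enrichment_p(background, pathway, selected, overlap):
    """P(at least `overlap` pathway genes in a random list of `selected` genes).

    background: number of genes which could have been picked
    pathway:    number of those genes annotated to the pathway
    selected:   size of our gene list
    overlap:    pathway genes actually found in our list
    """
    upper = min(pathway, selected)
    hits = sum(
        comb(pathway, i) * comb(background - pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(background, selected)


# 19140 and 108 are from the probe lists; 200 and 30 are hypothetical.
p_value = enrichment_p(background=19140, pathway=200, selected=108, overlap=30)
print(f"p = {p_value:.2e}")
```

With these numbers we would expect about one pathway gene in the list by chance, so seeing dozens gives a vanishingly small p-value.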

Given that these samples are supposed to be Schwann cells, it doesn't make sense for them to be expressing these kinds of muscle markers at this level. It would seem to be much more plausible that what has actually happened is that the samples have been contaminated to varying degrees with muscle cells from the surrounding tissue, and that the amount of contamination varies between the two groups.

If this supposition is correct then it's also likely that the amount of contamination will vary within each group since it will not relate to the genotype but will be a function of the dissection or sample preparation. We can try to assess this by looking at whether the muscle related genes are unusually variable within the WT or cKO groups. We can do this with a variation plot, plotting STDEV against expression level for each of the two groups.

It is clear that in both the WT and cKO groups these muscle markers are unusually variable, suggesting that an external contamination is a viable explanation. We can also clearly see the difference in expression level between the two groups which led us to look at these genes in the first place.
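The variation plot is built from a per-gene mean and standard deviation within each replicate set. A minimal sketch of that calculation follows. The gene names and values are invented to show the pattern, and the function name is mine.

```python
from statistics import mean, stdev


def group_variation(values):
    """values maps gene -> list of log2 values for the replicates of one group.

    Returns gene -> (mean expression, standard deviation) for that group.
    """
    return {gene: (mean(v), stdev(v)) for gene, v in values.items()}


# Invented replicate values for a single group (for example cKO).
made_up_group = {
    "MuscleGeneA": [11.5, 9.2, 10.8],   # swings a lot between replicates
    "HousekeepingB": [8.1, 8.0, 8.2],   # stable between replicates
}

variation = group_variation(made_up_group)
for gene, (avg, sd) in variation.items():
    print(f"{gene}: mean={avg:.2f} stdev={sd:.2f}")
```

Plotting the standard deviation against the mean for every gene and then highlighting the muscle list is what shows them sitting above the bulk of genes at the same expression level.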

Conclusion

An examination of these samples would ideally have shown us that they contained good quality data and exhibited consistent effects of the type we expect given the experimental description. Unfortunately, an examination of the data instead raises serious concerns about whether these samples are indeed valid, and we should probably not proceed with this analysis until the issues raised are sorted out.

Specifically there are 3 major concerns:

1. The nature of the read data looks wrong, in that this was supposed to be an Illumina TruSeq library, but the reads coming from it are non-directional.

2. The cKO samples are supposed to have a knockout of the Taz and Yap1 genes and the data shows no evidence of this change. Since it's a conditional knockout, maybe there is more subtlety to the design than a simple reading of the conditions suggests, but this needs to be determined since a lack of the expected effect would mean that any further work on the data would be pointless.

3. It would appear that all of the samples may be contaminated with muscle cells, and unfortunately the amount of contamination is not consistent between the two groups, but is systematically higher in cKO than WT. Given the conflation of the contamination and the condition, it is going to be very difficult to specifically attribute any changes seen to the knockouts rather than the contamination.