In this example we're going to look at a publicly available dataset in which two genes (Yap1 and Taz) have been knocked out. The data comes from GEO study GSE79115.
The data was processed through the standard clusterflow fastq_hisat2 pipeline which did trimming with TrimGalore and mapping with hisat2. The quality of the data all looked fine and the species screen showed that it was indeed mouse, as expected.
I've imported the data into SeqMonk and cleaned up the sample names a bit. I've also created replicate sets for the two biological conditions (WT and cKO), so it's all ready to go.
Parameter | Value |
---|---|
Project Name | poitelon_rnaseq.smk |
Vistory Name | validating_knockout.smv |
Genome | Mus musculus GRCm38_v95 |
SeqMonk Version | 1.45.2.devel |
Name | File Name | Reads | Import Options |
---|---|---|---|
cKO-1 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222409_GSM2085998_YAP_TAZ-DBL-cKO-1_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 44595229 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
cKO-2 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222410_GSM2085999_YAP_TAZ-DBL-cKO-2_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 35385632 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
cKO-3 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222411_GSM2086000_YAP_TAZ-DBL-cKO-3_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 43511442 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
WT-1 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222412_GSM2086001_YAP_TAZ-WT-1_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 66444258 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
WT-2 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222413_GSM2086002_YAP_TAZ-WT-2_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 56949469 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
WT-3 | /bi/pubcache/TIDIED/Poitelon_2016/SRR3222414_GSM2086003_YAP_TAZ-WT-3_RNA-seq_Mus_musculus_RNA-Seq_1_val_1_GRCm38_hisat2.bam | 58610561 | Library type Paired End Dedup=No MAPQ>=20 Primary alignments only. Treat as RNA-Seq data. Paired distance cutoff = 1000 |
cKO | WT |
---|---|
cKO-1 | WT-1 |
cKO-2 | WT-2 |
cKO-3 | WT-3 |
The data looks pretty good on a basic inspection. The reads are nicely restricted to exons and cover the whole length of the transcripts. Oddly, the library is non-directional. The paper says that the library prep used the Illumina TruSeq kit, which should give a directional library, so something a bit odd is going on.
There is no obvious DNA contamination, but there is maybe a suggestion of a small amount of PCR duplication since isolated reads often appear as duplets.
This all looks good. The library is very clean with almost all reads being in exons, but it was polyA selected so this is expected. We see no rRNA, and only a small amount of MT. There is a bit of variation in the library sizes, and this does seem to link to the condition, with all of the WT samples having more reads, but it's still less than 2-fold different overall so I'm not too concerned. The directionality is still lacking, which matches what we saw in the raw data.
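The library size comparison can be checked directly from the read counts in the import table. This is only a small sketch of the arithmetic; the helper name `group_mean` is made up for illustration.

```python
# Read counts copied from the import table in this report.
reads = {
    "cKO-1": 44595229, "cKO-2": 35385632, "cKO-3": 43511442,
    "WT-1": 66444258, "WT-2": 56949469, "WT-3": 58610561,
}

def group_mean(prefix):
    """Mean read count over samples whose name starts with the prefix."""
    values = [n for name, n in reads.items() if name.startswith(prefix)]
    return sum(values) / len(values)

# Largest library relative to the smallest, across all six samples.
overall_fold = max(reads.values()) / min(reads.values())
# Average WT library relative to the average cKO library.
group_fold = group_mean("WT") / group_mean("cKO")

print(f"largest / smallest library: {overall_fold:.2f}")
print(f"WT mean / cKO mean: {group_fold:.2f}")
```

Both ratios come out below 2, and the smallest WT library is still larger than the largest cKO library, which is the link to condition noted above.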
New Probe Set (411632 probes)
Feature generator using mRNA split into exons duplicates removed Over feature from 0-0
Whilst there is duplication in the samples it is in line with the local data density, so it seems that this is biological in nature rather than technical. There is nothing to worry about overall in terms of the duplication rates in the samples.
We'll do a standard gene level quantitation. We'll use the default log2FPM (correcting for library size but not for transcript length) since we're only interested in comparing the same transcript between samples. We don't care about comparing different transcripts in the same sample.
New Probe Set (35266 probes)
Transcript features over mRNA
Probes Quantitated
RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library
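To make the log2FPM measure concrete, here is a minimal sketch of the calculation for a single transcript. It is not SeqMonk's code: the function name and the `floor` used to avoid taking the log of zero are assumptions made for this example.

```python
import math

def log2_fpm(read_count, library_size, floor=0.1):
    """Log2 of fragments per million reads in the library.

    `floor` is an assumption of this sketch: it stops log2(0) failing
    for transcripts with no reads. It is not taken from SeqMonk.
    """
    fpm = read_count / (library_size / 1_000_000)
    return math.log2(max(fpm, floor))

# The same raw count in libraries of different depth gives different values,
# which is the library size correction. Transcript length is never used.
shallow = log2_fpm(100, 25_000_000)  # 4 fragments per million
deep = log2_fpm(100, 50_000_000)     # 2 fragments per million
print(shallow, deep)
```

Because transcript length cancels out when the same transcript is compared between samples, leaving it out does not affect the WT versus cKO comparison.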
The default normalisation looks pretty good, but it might improve slightly if we add in size factor normalisation, so let's try that. We'll remove all of the obviously unexpressed genes first.
New Probe List: Expressed somewhere (19140 probes)
Filter on probes in All Probes where at least 1 of cKO-1 , cKO-2 , cKO-3 , WT-1 , WT-2 , WT-3 had a value above 50.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library
Probes Quantitated
RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list
It's slightly nicer with the additional normalisation so we'll keep that.
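As a rough illustration of what an additive size factor correction does on log-transformed data, the sketch below shifts each sample so that its median deviation from a per-gene reference is zero. This follows the general median-of-ratios idea and is an assumption about the approach, not a copy of SeqMonk's implementation; the function name and the toy values are invented.

```python
from statistics import mean, median

def size_factor_normalise(log_values):
    """Shift each sample so its median deviation from the per-gene mean is zero.

    `log_values` maps sample name -> list of log2 values, one per gene,
    with genes in the same order for every sample.
    """
    samples = list(log_values)
    n_genes = len(log_values[samples[0]])
    # Per-gene reference: mean of the log values across samples.
    reference = [mean(log_values[s][g] for s in samples) for g in range(n_genes)]
    normalised = {}
    for s in samples:
        # Additive size factor for this sample on the log scale.
        offset = median(v - r for v, r in zip(log_values[s], reference))
        normalised[s] = [v - offset for v in log_values[s]]
    return normalised

# Toy data (hypothetical): sample B is sample A shifted up by one log2 unit.
toy = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0]}
result = size_factor_normalise(toy)
print(result)
```

Using the median makes the offset robust to a minority of genes that change strongly in one direction, which is why restricting the calculation to expressed genes and using this kind of correction can help when the changes are asymmetric.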
We can start by using PCA to see how the samples cluster.
The samples form two obvious clusters and these split by the knockout status on PC1 so this all looks good. The cKO-1 sample seems a little divergent on PC2 so that might need looking at.
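For readers who want to see what the first principal component is doing, here is a small dependency-free sketch that finds PC1 scores by power iteration on the sample-by-sample matrix of centred values. The function name and the toy expression values are hypothetical, and a real analysis would normally use a numerical library rather than this loop.

```python
import math

def pc1_scores(samples, iterations=200):
    """Return the PC1 score of each sample (sign is arbitrary)."""
    names = list(samples)
    n_genes = len(samples[names[0]])
    # Centre every gene on its mean across samples.
    gene_means = [sum(samples[s][g] for s in names) / len(names) for g in range(n_genes)]
    centred = [[samples[s][g] - gene_means[g] for g in range(n_genes)] for s in names]
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 direction.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred] for r1 in centred]
    # Deterministic, non-constant starting vector.
    vec = [float(i + 1) for i in range(len(names))]
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))  # zero only if all samples are identical
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec)) for v, row in zip(vec, gram))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return {s: v * scale for s, v in zip(names, vec)}

# Toy values (hypothetical), two replicates per condition.
toy = {
    "cKO-1": [4.0, 1.0, 2.0], "cKO-2": [4.2, 1.1, 2.0],
    "WT-1": [1.0, 4.0, 2.1], "WT-2": [1.1, 4.1, 2.0],
}
scores = pc1_scores(toy)
print(scores)
```

In the toy data the two conditions fall on opposite sides of zero on PC1, which is the pattern described for the real samples.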
There are some very obvious differences between the samples, but the structure of the differences looks odd. There is a pretty obvious group of genes which appear to be very consistently upregulated in the knockout sample, all changing by a log2 diff of around 2.5 over a wide range of expression levels. This doesn't look normal and suggests that there may be an additional effect on these samples beyond the knockout, possibly a copy number change? This asymmetry in the changes probably explains why the size factor normalisation produced some benefit.
There are some genes which show quite large expression decreases in the cKO samples, but nothing looks like it really moves from being highly expressed to being absent, so we should check on the knockouts.
Since the cKO samples are supposed to be a double knockout of Yap1 and Taz we should look at those genes specifically.
New Probe List: Yap1 (1 probes)
Probes from Expressed somewhere whose name matches any of Yap1 after stripping probe generator suffixes
New Probe List: Taz (1 probes)
Probes from Expressed somewhere whose name matches any of Taz after stripping probe generator suffixes
That doesn't look good. Neither of the genes which are supposed to be knocked out actually appears to show any real change in the cKO group. Depending on how the knockout was done this might not be an issue. If it's very tightly targeted then it could be that the transcripts are still being produced but in a non-functional form.
I went and looked up the original article where the Yap1/Taz knockout was created (PMC3605093) and in both cases they used a Cre/Lox system to completely excise the first complete coding exon (exon 2). It's possible that after deletion the transcript splices around the missing exon and goes on, but we should check.
The second exon of Yap1 certainly appears to still be there, and at a similar level to the first exon. The total number of reads is fewer, but that's in line with the lower coverage overall in the cKO samples (which we saw in the QC report).
Likewise the second Taz exon also appears to be present at levels which are consistent with the exons around it. Given these observations I would be very cautious of proceeding with the analysis without going back to validate the exact knockouts in more detail, because on the face of it it looks like the expected effects are absent from these samples.
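The exon check above was done by eye in the genome browser, but the same idea can be expressed numerically: compare the read density over the targeted exon with the density over its neighbours. The sketch below is illustrative only; the function name and all counts and lengths are made up and are not measurements from these samples.

```python
def relative_exon_coverage(exon_reads, exon_lengths, target_index):
    """Read density of one exon relative to the mean density of its immediate neighbours."""
    densities = [reads / length for reads, length in zip(exon_reads, exon_lengths)]
    flanking = [d for i, d in enumerate(densities) if abs(i - target_index) == 1]
    return densities[target_index] / (sum(flanking) / len(flanking))

# Hypothetical three-exon examples, target is the middle exon (index 1).
lengths = [100, 200, 150]
exon_present = relative_exon_coverage([100, 200, 150], lengths, 1)
exon_excised = relative_exon_coverage([100, 0, 150], lengths, 1)
print(exon_present, exon_excised)
```

A value close to 1 means the exon is covered like its neighbours, while a cleanly excised exon would be expected to give a value near 0 in the knockout samples.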
Despite the lack of expected knockouts in the samples we did definitely see some consistent expression differences between the two groups, but we also saw the oddly consistently regulated set of genes. To try to understand what's going on with these we can see if we can find any obvious connection between the genes in this set.
To isolate these genes we can do a crude extraction of them based on their fold change. I can just manually select these in a scatterplot.
New Probe List: Oddly similar changes (511 probes)
Filter on probes in Expressed somewhere where average difference when comparing WT to cKO had a value between 1.8 and 3.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list
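The filter amounts to keeping genes whose difference in mean log2 value between the groups falls inside a window. A minimal sketch follows, using three rows from the Highly expressed table in this report plus one invented gene for contrast. The direction of subtraction (cKO minus WT) and the function name are assumptions of the sketch.

```python
def in_difference_window(cko_mean, wt_mean, low=1.8, high=3.0):
    """True if the log2 difference (cKO minus WT) lies within [low, high]."""
    return low <= (cko_mean - wt_mean) <= high

# (cKO mean, WT mean) on the log2 scale.
group_means = {
    "Acta1": (10.6300745, 8.097512),      # from the report table
    "Ube2c": (9.095146, 7.261156),        # from the report table
    "Ttn": (7.0931764, 4.4328256),        # from the report table
    "hypothetical_unchanged": (6.0, 5.9), # invented, for contrast
}

selected = [gene for gene, (cko, wt) in group_means.items()
            if in_difference_window(cko, wt)]
print(selected)
```

A window on the difference alone ignores replicate variability, so this is only a crude way to pull out the group, as noted above.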
The most obvious thing to connect these points would be physical position in the genome, so let's see where they are:
Meh - they're all over the place so bang goes that theory. Let's have a look at what the actual genes are instead. We'll just take the ones which are more highly expressed to limit how many we have to deal with.
New Probe List: Highly expressed (108 probes)
Filter on probes in Oddly similar changes where exactly 1 of cKO had a value above 3.0. Quantitation was RNA-Seq pipeline quantitation on merged transcripts counting reads over exons. Log transformed. Assuming a Non-strand specific library transformed by size factor normalisation using Add using the Expressed somewhere (19140) probe list
Probe | Chromosome | Start | End | Probe Strand | Feature | ID | Description | Feature Strand | Type | Feature Orientation | Distance | cKO | WT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acta1 | 8 | 123891769 | 123894751 | - | Acta1 | ENSMUSG00000031972 | actin, alpha 1, skeletal muscle [Source:MGI Symbol;Acc:MGI:87902] | - | gene | Name match | 0 | 10.6300745 | 8.097512 |
Mylpf | 7 | 127208890 | 127214298 | + | Mylpf | ENSMUSG00000030672 | myosin light chain, phosphorylatable, fast skeletal muscle [Source:MGI Symbol;Acc:MGI:97273] | + | gene | Name match | 0 | 10.00174 | 7.4619308 |
Myl1 | 1 | 66924295 | 66945404 | - | Myl1 | ENSMUSG00000061816 | myosin, light polypeptide 1 [Source:MGI Symbol;Acc:MGI:97269] | - | gene | Name match | 0 | 9.555613 | 7.085666 |
Myh8 | 11 | 67277124 | 67308634 | + | Myh8 | ENSMUSG00000055775 | myosin, heavy polypeptide 8, skeletal muscle, perinatal [Source:MGI Symbol;Acc:MGI:1339712] | + | gene | Name match | 0 | 9.314048 | 6.725017 |
Tnnt3 | 7 | 142498836 | 142516009 | + | Tnnt3 | ENSMUSG00000061723 | troponin T3, skeletal, fast [Source:MGI Symbol;Acc:MGI:109550] | + | gene | Name match | 0 | 9.128785 | 6.6186385 |
Ube2c | 2 | 164769898 | 164778822 | + | Ube2c | ENSMUSG00000001403 | ubiquitin-conjugating enzyme E2C [Source:MGI Symbol;Acc:MGI:1915862] | + | gene | Name match | 0 | 9.095146 | 7.261156 |
Tnnc2 | 2 | 164777161 | 164779967 | - | Tnnc2 | ENSMUSG00000017300 | troponin C2, fast [Source:MGI Symbol;Acc:MGI:98780] | - | gene | Name match | 0 | 8.9517355 | 6.5184937 |
Ckm | 7 | 19404776 | 19422841 | + | Ckm | ENSMUSG00000030399 | creatine kinase, muscle [Source:MGI Symbol;Acc:MGI:88413] | + | gene | Name match | 0 | 8.946425 | 6.266685 |
Tpm2 | 4 | 43514711 | 43523765 | - | Tpm2 | ENSMUSG00000028464 | tropomyosin 2, beta [Source:MGI Symbol;Acc:MGI:98810] | - | gene | Name match | 0 | 8.791783 | 6.863115 |
Tnni2 | 7 | 142441808 | 142444410 | + | Tnni2 | ENSMUSG00000031097 | troponin I, skeletal, fast 2 [Source:MGI Symbol;Acc:MGI:105070] | + | gene | Name match | 0 | 8.441571 | 5.8975697 |
Atp2a1 | 7 | 126445858 | 126463108 | - | Atp2a1 | ENSMUSG00000030730 | ATPase, Ca++ transporting, cardiac muscle, fast twitch 1 [Source:MGI Symbol;Acc:MGI:105058] | - | gene | Name match | 0 | 7.9781203 | 5.1985235 |
A930016O22Rik | 7 | 19417466 | 19421417 | - | A930016O22Rik | ENSMUSG00000040705 | RIKEN cDNA A930016O22 gene [Source:MGI Symbol;Acc:MGI:3605623] | - | gene | Name match | 0 | 7.7730346 | 5.131559 |
Eno3 | 11 | 70657202 | 70662513 | + | Eno3 | ENSMUSG00000060600 | enolase 3, beta muscle [Source:MGI Symbol;Acc:MGI:95395] | + | gene | Name match | 0 | 7.504761 | 5.2885814 |
Des | 1 | 75360329 | 75368579 | + | Des | ENSMUSG00000026208 | desmin [Source:MGI Symbol;Acc:MGI:94885] | + | gene | Name match | 0 | 7.4028907 | 5.2280498 |
Col2a1 | 15 | 97975602 | 98004695 | - | Col2a1 | ENSMUSG00000022483 | collagen, type II, alpha 1 [Source:MGI Symbol;Acc:MGI:88452] | - | gene | Name match | 0 | 7.156376 | 5.3229785 |
Ttn | 2 | 76703980 | 76982547 | - | Ttn | ENSMUSG00000051747 | titin [Source:MGI Symbol;Acc:MGI:98864] | - | gene | Name match | 0 | 7.0931764 | 4.4328256 |
Actc1 | 2 | 114047282 | 114053548 | - | Actc1 | ENSMUSG00000068614 | actin, alpha, cardiac muscle 1 [Source:MGI Symbol;Acc:MGI:87905] | - | gene | Name match | 0 | 6.89143 | 3.9702747 |
Casq1 | 1 | 172209894 | 172219868 | - | Casq1 | ENSMUSG00000007122 | calsequestrin 1 [Source:MGI Symbol;Acc:MGI:1309468] | - | gene | Name match | 0 | 6.8523135 | 4.718498 |
Pvalb | 15 | 78191114 | 78206400 | - | Pvalb | ENSMUSG00000005716 | parvalbumin [Source:MGI Symbol;Acc:MGI:97821] | - | gene | Name match | 0 | 6.7798343 | 4.195829 |
Tceal7 | X | 136214779 | 136226100 | + | Tceal7 | ENSMUSG00000079428 | transcription elongation factor A (SII)-like 7 [Source:MGI Symbol;Acc:MGI:1915746] | + | gene | Name match | 0 | 6.7450833 | 4.525802 |
Actn3 | 19 | 4861223 | 4877909 | - | Actn3 | ENSMUSG00000006457 | actinin alpha 3 [Source:MGI Symbol;Acc:MGI:99678] | - | gene | Name match | 0 | 6.699188 | 4.280667 |
Pygm | 19 | 6384399 | 6398459 | + | Pygm | ENSMUSG00000032648 | muscle glycogen phosphorylase [Source:MGI Symbol;Acc:MGI:97830] | + | gene | Name match | 0 | 6.44903 | 4.334618 |
Mybpc1 | 10 | 88518279 | 88605152 | - | Mybpc1 | ENSMUSG00000020061 | myosin binding protein C, slow-type [Source:MGI Symbol;Acc:MGI:1336213] | - | gene | Name match | 0 | 6.3010755 | 3.8659647 |
Pgam2 | 11 | 5801640 | 5803733 | - | Pgam2 | ENSMUSG00000020475 | phosphoglycerate mutase 2 [Source:MGI Symbol;Acc:MGI:1933118] | - | gene | Name match | 0 | 6.2571893 | 3.6856563 |
Neb | 2 | 52136647 | 52378474 | - | Neb | ENSMUSG00000026950 | nebulin [Source:MGI Symbol;Acc:MGI:97292] | - | gene | Name match | 0 | 6.1607213 | 4.0120506 |
Myoz1 | 14 | 20649107 | 20656540 | - | Myoz1 | ENSMUSG00000068697 | myozenin 1 [Source:MGI Symbol;Acc:MGI:1929471] | - | gene | Name match | 0 | 6.1156273 | 3.6680744 |
Fibin | 2 | 110360917 | 110363183 | - | Fibin | ENSMUSG00000074971 | fin bud initiation factor homolog (zebrafish) [Source:MGI Symbol;Acc:MGI:1914856] | - | gene | Name match | 0 | 6.0845985 | 3.5646398 |
Ldb3 | 14 | 34526603 | 34588682 | - | Ldb3 | ENSMUSG00000021798 | LIM domain binding 3 [Source:MGI Symbol;Acc:MGI:1344412] | - | gene | Name match | 0 | 6.032437 | 3.6439762 |
Actn2 | 13 | 12269426 | 12340760 | - | Actn2 | ENSMUSG00000052374 | actinin alpha 2 [Source:MGI Symbol;Acc:MGI:109192] | - | gene | Name match | 0 | 6.0224013 | 3.2486563 |
Srl | 16 | 4480216 | 4541816 | - | Srl | ENSMUSG00000022519 | sarcalumenin [Source:MGI Symbol;Acc:MGI:2146620] | - | gene | Name match | 0 | 6.016025 | 4.061566 |
Sln | 9 | 53850164 | 53854560 | + | Sln | ENSMUSG00000042045 | sarcolipin [Source:MGI Symbol;Acc:MGI:1913652] | + | gene | Name match | 0 | 6.0015106 | 3.0297756 |
Sypl2 | 3 | 108211472 | 108226648 | - | Sypl2 | ENSMUSG00000027887 | synaptophysin-like 2 [Source:MGI Symbol;Acc:MGI:1328311] | - | gene | Name match | 0 | 5.7019043 | 3.2455757 |
Myl4 | 11 | 104550663 | 104595753 | + | Myl4 | ENSMUSG00000061086 | myosin, light polypeptide 4 [Source:MGI Symbol;Acc:MGI:97267] | + | gene | Name match | 0 | 5.6841083 | 3.326821 |
Trdn | 10 | 33080554 | 33476709 | + | Trdn | ENSMUSG00000019787 | triadin [Source:MGI Symbol;Acc:MGI:1924007] | + | gene | Name match | 0 | 5.629187 | 3.2385871 |
Mybpc2 | 7 | 44501699 | 44524656 | - | Mybpc2 | ENSMUSG00000038670 | myosin binding protein C, fast-type [Source:MGI Symbol;Acc:MGI:1336170] | - | gene | Name match | 0 | 5.580316 | 2.735296 |
Itgb1bp2 | X | 101449088 | 101453541 | + | Itgb1bp2 | ENSMUSG00000031312 | integrin beta 1 binding protein 2 [Source:MGI Symbol;Acc:MGI:1353420] | + | gene | Name match | 0 | 5.5433006 | 2.9828875 |
Atp1b4 | X | 38316184 | 38336769 | + | Atp1b4 | ENSMUSG00000016327 | ATPase, (Na+)/K+ transporting, beta 4 polypeptide [Source:MGI Symbol;Acc:MGI:1915071] | + | gene | Name match | 0 | 5.3833213 | 3.1316712 |
Rtn2 | 7 | 19282624 | 19296160 | + | Rtn2 | ENSMUSG00000030401 | reticulon 2 (Z-band associated protein) [Source:MGI Symbol;Acc:MGI:107612] | + | gene | Name match | 0 | 5.3780785 | 3.4236689 |
Myom1 | 17 | 71002633 | 71126856 | + | Myom1 | ENSMUSG00000024049 | myomesin 1 [Source:MGI Symbol;Acc:MGI:1341430] | + | gene | Name match | 0 | 5.3354306 | 3.0266078 |
Cox6a2 | 7 | 128205435 | 128206387 | - | Cox6a2 | ENSMUSG00000030785 | cytochrome c oxidase subunit 6A2 [Source:MGI Symbol;Acc:MGI:104649] | - | gene | Name match | 0 | 5.310141 | 2.7472584 |
Apobec2 | 17 | 48419231 | 48432930 | - | Apobec2 | ENSMUSG00000040694 | apolipoprotein B mRNA editing enzyme, catalytic polypeptide 2 [Source:MGI Symbol;Acc:MGI:1343178] | - | gene | Name match | 0 | 5.287872 | 2.7192059 |
Pdlim3 | 8 | 45885461 | 45919548 | + | Pdlim3 | ENSMUSG00000031636 | PDZ and LIM domain 3 [Source:MGI Symbol;Acc:MGI:1859274] | + | gene | Name match | 0 | 5.273842 | 3.42713 |
Smpx | X | 157698910 | 157752591 | + | Smpx | ENSMUSG00000041476 | small muscle protein, X-linked [Source:MGI Symbol;Acc:MGI:1913356] | + | gene | Name match | 0 | 5.2584686 | 2.851238 |
Myom2 | 8 | 15057653 | 15133541 | + | Myom2 | ENSMUSG00000031461 | myomesin 2 [Source:MGI Symbol;Acc:MGI:1328358] | + | gene | Name match | 0 | 4.9835496 | 2.4547803 |
Eef1a2 | 2 | 181147653 | 181157014 | - | Eef1a2 | ENSMUSG00000016349 | eukaryotic translation elongation factor 1 alpha 2 [Source:MGI Symbol;Acc:MGI:1096317] | - | gene | Name match | 0 | 4.9614844 | 2.9759362 |
Iigp1 | 18 | 60376029 | 60392627 | + | Iigp1 | ENSMUSG00000054072 | interferon inducible GTPase 1 [Source:MGI Symbol;Acc:MGI:1926259] | + | gene | Name match | 0 | 4.952799 | 2.288293 |
Klhl41 | 2 | 69670120 | 69684230 | + | Klhl41 | ENSMUSG00000075307 | kelch-like 41 [Source:MGI Symbol;Acc:MGI:2683854] | + | gene | Name match | 0 | 4.9376125 | 2.5512183 |
Myot | 18 | 44334074 | 44355724 | + | Myot | ENSMUSG00000024471 | myotilin [Source:MGI Symbol;Acc:MGI:1889800] | + | gene | Name match | 0 | 4.866352 | 2.3614779 |
Hrc | 7 | 45335290 | 45338974 | + | Hrc | ENSMUSG00000038239 | histidine rich calcium binding protein [Source:MGI Symbol;Acc:MGI:96226] | + | gene | Name match | 0 | 4.8531127 | 2.388653 |
Ank1 | 8 | 22974844 | 23150497 | + | Ank1 | ENSMUSG00000031543 | ankyrin 1, erythroid [Source:MGI Symbol;Acc:MGI:88024] | + | gene | Name match | 0 | 4.8427024 | 2.5067291 |
Tcap | 11 | 98383811 | 98384953 | + | Tcap | ENSMUSG00000007877 | titin-cap [Source:MGI Symbol;Acc:MGI:1330233] | + | gene | Name match | 0 | 4.8094006 | 2.4194448 |
Ryr1 | 7 | 29003344 | 29125179 | - | Ryr1 | ENSMUSG00000030592 | ryanodine receptor 1, skeletal muscle [Source:MGI Symbol;Acc:MGI:99659] | - | gene | Name match | 0 | 4.8011594 | 2.4239838 |
Hfe2 | 3 | 96525172 | 96529210 | + | Hfe2 | ENSMUSG00000038403 | hemochromatosis type 2 (juvenile) [Source:MGI Symbol;Acc:MGI:1916835] | + | gene | Name match | 0 | 4.730496 | 2.3208005 |
Ampd1 | 3 | 103074014 | 103099720 | + | Ampd1 | ENSMUSG00000070385 | adenosine monophosphate deaminase 1 [Source:MGI Symbol;Acc:MGI:88015] | + | gene | Name match | 0 | 4.663221 | 2.0056393 |
Myoz2 | 3 | 123006206 | 123035015 | - | Myoz2 | ENSMUSG00000028116 | myozenin 2 [Source:MGI Symbol;Acc:MGI:1913063] | - | gene | Name match | 0 | 4.661734 | 2.3122559 |
Cacng1 | 11 | 107703218 | 107716522 | - | Cacng1 | ENSMUSG00000020722 | calcium channel, voltage-dependent, gamma subunit 1 [Source:MGI Symbol;Acc:MGI:1206582] | - | gene | Name match | 0 | 4.654035 | 2.1101506 |
Sh3bgr | 16 | 96200450 | 96228935 | + | Sh3bgr | ENSMUSG00000040666 | SH3-binding domain glutamic acid-rich protein [Source:MGI Symbol;Acc:MGI:1354740] | + | gene | Name match | 0 | 4.595351 | 1.9935116 |
Cox8b | 7 | 140898945 | 140900446 | - | Cox8b | ENSMUSG00000025488 | cytochrome c oxidase subunit 8B [Source:MGI Symbol;Acc:MGI:105958] | - | gene | Name match | 0 | 4.5683045 | 2.1308832 |
Fabp3 | 4 | 130308595 | 130315463 | + | Fabp3 | ENSMUSG00000028773 | fatty acid binding protein 3, muscle and heart [Source:MGI Symbol;Acc:MGI:95476] | + | gene | Name match | 0 | 4.5498366 | 2.3100097 |
Arhgap36 | X | 49463945 | 49500244 | + | Arhgap36 | ENSMUSG00000036198 | Rho GTPase activating protein 36 [Source:MGI Symbol;Acc:MGI:1922654] | + | gene | Name match | 0 | 4.499045 | 2.1075962 |
Cilp | 9 | 65265180 | 65280605 | + | Cilp | ENSMUSG00000042254 | cartilage intermediate layer protein, nucleotide pyrophosphohydrolase [Source:MGI Symbol;Acc:MGI:2444507] | + | gene | Name match | 0 | 4.4644227 | 2.4463303 |
Myh3 | 11 | 67078300 | 67102291 | + | Myh3 | ENSMUSG00000020908 | myosin, heavy polypeptide 3, skeletal muscle, embryonic [Source:MGI Symbol;Acc:MGI:1339709] | + | gene | Name match | 0 | 4.4403567 | 1.8177018 |
Nrap | 19 | 56320035 | 56390037 | - | Nrap | ENSMUSG00000049134 | nebulin-related anchoring protein [Source:MGI Symbol;Acc:MGI:1098765] | - | gene | Name match | 0 | 4.3482184 | 2.001753 |
Synpo2l | 14 | 20658946 | 20668354 | - | Synpo2l | ENSMUSG00000039376 | synaptopodin 2-like [Source:MGI Symbol;Acc:MGI:1916010] | - | gene | Name match | 0 | 4.3379483 | 2.1240387 |
Cacna1s | 1 | 136052750 | 136119822 | + | Cacna1s | ENSMUSG00000026407 | calcium channel, voltage-dependent, L type, alpha 1S subunit [Source:MGI Symbol;Acc:MGI:88294] | + | gene | Name match | 0 | 4.2639847 | 2.1601298 |
Cav3 | 6 | 112459505 | 112472872 | + | Cav3 | ENSMUSG00000062694 | caveolin 3 [Source:MGI Symbol;Acc:MGI:107570] | + | gene | Name match | 0 | 4.233438 | 1.5682999 |
Fitm1 | 14 | 55575617 | 55576954 | + | Fitm1 | ENSMUSG00000022215 | fat storage-inducing transmembrane protein 1 [Source:MGI Symbol;Acc:MGI:1915930] | + | gene | Name match | 0 | 4.1784434 | 1.5370213 |
Atcayos | 10 | 81194609 | 81210877 | + | Atcayos | ENSMUSG00000085779 | ataxia, cerebellar, Cayman type, opposite strand [Source:MGI Symbol;Acc:MGI:1916928] | + | gene | Name match | 0 | 4.1450295 | 1.5326638 |
Klhl31 | 9 | 77636500 | 77660127 | + | Klhl31 | ENSMUSG00000044938 | kelch-like 31 [Source:MGI Symbol;Acc:MGI:3045305] | + | gene | Name match | 0 | 4.1251917 | 1.7273712 |
Tmem182 | 1 | 40805601 | 40856887 | + | Tmem182 | ENSMUSG00000079588 | transmembrane protein 182 [Source:MGI Symbol;Acc:MGI:1923725] | + | gene | Name match | 0 | 4.1091084 | 1.6307322 |
Obscn | 11 | 58994256 | 59136402 | - | Obscn | ENSMUSG00000061462 | obscurin, cytoskeletal calmodulin and titin- interacting RhoGEF [Source:MGI Symbol;Acc:MGI:2681862] | - | gene | Name match | 0 | 4.100353 | 1.4627026 |
Jsrp1 | 10 | 80808496 | 80813498 | - | Jsrp1 | ENSMUSG00000020216 | junctional sarcoplasmic reticulum protein 1 [Source:MGI Symbol;Acc:MGI:1916700] | - | gene | Name match | 0 | 4.0772023 | 2.2387457 |
Cavin4 | 4 | 48663514 | 48673502 | + | Cavin4 | ENSMUSG00000028348 | caveolae associated 4 [Source:MGI Symbol;Acc:MGI:1915266] | + | gene | Name match | 0 | 4.0137854 | 1.560191 |
Hspb7 | 4 | 141420779 | 141425311 | + | Hspb7 | ENSMUSG00000006221 | heat shock protein family, member 7 (cardiovascular) [Source:MGI Symbol;Acc:MGI:1352494] | + | gene | Name match | 0 | 4.0113735 | 1.872797 |
Piwil4 | 9 | 14696230 | 14740733 | - | Piwil4 | ENSMUSG00000036912 | piwi-like RNA-mediated gene silencing 4 [Source:MGI Symbol;Acc:MGI:3041167] | - | gene | Name match | 0 | 3.9173393 | 1.2546765 |
Gbp4 | 5 | 105115767 | 105139586 | - | Gbp4 | ENSMUSG00000079363 | guanylate binding protein 4 [Source:MGI Symbol;Acc:MGI:97072] | - | gene | Name match | 0 | 3.9009335 | 1.6620811 |
Mypn | 10 | 63115795 | 63203952 | - | Mypn | ENSMUSG00000020067 | myopalladin [Source:MGI Symbol;Acc:MGI:1916052] | - | gene | Name match | 0 | 3.833132 | 1.1766785 |
Dhrs7c | 11 | 67798187 | 67816002 | + | Dhrs7c | ENSMUSG00000033044 | dehydrogenase/reductase (SDR family) member 7C [Source:MGI Symbol;Acc:MGI:1915710] | + | gene | Name match | 0 | 3.8212578 | 1.3127257 |
Tmod4 | 3 | 95124476 | 95129209 | + | Tmod4 | ENSMUSG00000005628 | tropomodulin 4 [Source:MGI Symbol;Acc:MGI:1355285] | + | gene | Name match | 0 | 3.7632558 | 1.3997828 |
Camk2a | 18 | 60925618 | 60988152 | + | Camk2a | ENSMUSG00000024617 | calcium/calmodulin-dependent protein kinase II alpha [Source:MGI Symbol;Acc:MGI:88256] | + | gene | Name match | 0 | 3.7615945 | 1.9082276 |
Fbp2 | 13 | 62836877 | 62858422 | - | Fbp2 | ENSMUSG00000021456 | fructose bisphosphatase 2 [Source:MGI Symbol;Acc:MGI:95491] | - | gene | Name match | 0 | 3.7539046 | 1.1742979 |
Art1 | 7 | 102101743 | 102113933 | + | Art1 | ENSMUSG00000030996 | ADP-ribosyltransferase 1 [Source:MGI Symbol;Acc:MGI:107511] | + | gene | Name match | 0 | 3.6770535 | 0.9062833 |
Hmcn2 | 2 | 31314415 | 31460738 | + | Hmcn2 | ENSMUSG00000055632 | hemicentin 2 [Source:MGI Symbol;Acc:MGI:2677838] | + | gene | Name match | 0 | 3.668998 | 1.7803224 |
Gm37759 | 1 | 136116577 | 136119822 | - | Gm37759 | ENSMUSG00000102717 | predicted gene, 37759 [Source:MGI Symbol;Acc:MGI:5610987] | - | gene | Name match | 0 | 3.658001 | 1.6372976 |
Sgcg | 14 | 61219115 | 61258490 | - | Sgcg | ENSMUSG00000035296 | sarcoglycan, gamma (dystrophin-associated glycoprotein) [Source:MGI Symbol;Acc:MGI:1346524] | - | gene | Name match | 0 | 3.6402366 | 1.0326926 |
Ckmt2 | 13 | 91853387 | 91876885 | - | Ckmt2 | ENSMUSG00000021622 | creatine kinase, mitochondrial 2 [Source:MGI Symbol;Acc:MGI:1923972] | - | gene | Name match | 0 | 3.630171 | 0.8835902 |
Smyd1 | 6 | 71213940 | 71322233 | - | Smyd1 | ENSMUSG00000055027 | SET and MYND domain containing 1 [Source:MGI Symbol;Acc:MGI:104790] | - | gene | Name match | 0 | 3.5943708 | 1.3996502 |
Tnnt1 | 7 | 4504570 | 4516382 | - | Tnnt1 | ENSMUSG00000064179 | troponin T1, skeletal, slow [Source:MGI Symbol;Acc:MGI:1333868] | - | gene | Name match | 0 | 3.5926695 | 1.4827223 |
Lmod3 | 6 | 97238534 | 97252759 | - | Lmod3 | ENSMUSG00000044086 | leiomodin 3 (fetal) [Source:MGI Symbol;Acc:MGI:2444169] | - | gene | Name match | 0 | 3.527568 | 0.9856329 |
1110002E22Rik | 3 | 138065052 | 138081506 | + | 1110002E22Rik | ENSMUSG00000090066 | RIKEN cDNA 1110002E22 gene [Source:MGI Symbol;Acc:MGI:1915066] | + | gene | Name match | 0 | 3.5229347 | 1.227827 |
Rtl1 | 12 | 109589193 | 109600330 | - | Rtl1 | ENSMUSG00000085925 | retrotransposon Gaglike 1 [Source:MGI Symbol;Acc:MGI:2656842] | - | gene | Name match | 0 | 3.5128603 | 1.6970845 |
Rpl3l | 17 | 24727820 | 24736143 | + | Rpl3l | ENSMUSG00000002500 | ribosomal protein L3-like [Source:MGI Symbol;Acc:MGI:1913461] | + | gene | Name match | 0 | 3.5003052 | 0.68470806 |
Gbp2b | 3 | 142594847 | 142619179 | + | Gbp2b | ENSMUSG00000040264 | guanylate binding protein 2b [Source:MGI Symbol;Acc:MGI:95666] | + | gene | Name match | 0 | 3.4459944 | 1.5280095 |
Ppp1r3a | 6 | 14713977 | 14755274 | - | Ppp1r3a | ENSMUSG00000042717 | protein phosphatase 1, regulatory subunit 3A [Source:MGI Symbol;Acc:MGI:2153588] | - | gene | Name match | 0 | 3.432337 | 1.2207707 |
Klk8 | 7 | 43797577 | 43803826 | + | Klk8 | ENSMUSG00000064023 | kallikrein related-peptidase 8 [Source:MGI Symbol;Acc:MGI:1343327] | + | gene | Name match | 0 | 3.3492088 | 1.4467646 |
Tnnt2 | 1 | 135836354 | 135852260 | + | Tnnt2 | ENSMUSG00000026414 | troponin T2, cardiac [Source:MGI Symbol;Acc:MGI:104597] | + | gene | Name match | 0 | 3.305336 | 0.9699723 |
Xirp1 | 9 | 120013755 | 120023598 | - | Xirp1 | ENSMUSG00000079243 | xin actin-binding repeat containing 1 [Source:MGI Symbol;Acc:MGI:1333878] | - | gene | Name match | 0 | 3.2169416 | 0.81996137 |
Chrna1 | 2 | 73563215 | 73580338 | - | Chrna1 | ENSMUSG00000027107 | cholinergic receptor, nicotinic, alpha polypeptide 1 (muscle) [Source:MGI Symbol;Acc:MGI:87885] | - | gene | Name match | 0 | 3.1982536 | 0.5401571 |
Asb2 | 12 | 103321142 | 103356001 | - | Asb2 | ENSMUSG00000021200 | ankyrin repeat and SOCS box-containing 2 [Source:MGI Symbol;Acc:MGI:1929743] | - | gene | Name match | 0 | 3.1897268 | 1.2237269 |
Pck1 | 2 | 173153048 | 173159273 | + | Pck1 | ENSMUSG00000027513 | phosphoenolpyruvate carboxykinase 1, cytosolic [Source:MGI Symbol;Acc:MGI:97501] | + | gene | Name match | 0 | 3.1702936 | 1.100215 |
Hhatl | 9 | 121784016 | 121792507 | - | Hhatl | ENSMUSG00000032523 | hedgehog acyltransferase-like [Source:MGI Symbol;Acc:MGI:1922020] | - | gene | Name match | 0 | 3.1309252 | 1.2117059 |
Klhl40 | 9 | 121777607 | 121783818 | + | Klhl40 | ENSMUSG00000074001 | kelch-like 40 [Source:MGI Symbol;Acc:MGI:1919580] | + | gene | Name match | 0 | 3.1116276 | 0.9328232 |
Myo18b | 5 | 112688876 | 112896362 | - | Myo18b | ENSMUSG00000072720 | myosin XVIIIb [Source:MGI Symbol;Acc:MGI:1921626] | - | gene | Name match | 0 | 3.084533 | 0.8559819 |
Igsf1 | X | 49782536 | 49797749 | - | Igsf1 | ENSMUSG00000031111 | immunoglobulin superfamily, member 1 [Source:MGI Symbol;Acc:MGI:2147913] | - | gene | Name match | 0 | 3.0787194 | 1.1340787 |
Ephb1 | 9 | 101922128 | 102354693 | - | Ephb1 | ENSMUSG00000032537 | Eph receptor B1 [Source:MGI Symbol;Acc:MGI:1096337] | - | gene | Name match | 0 | 3.0727623 | 1.2279277 |
Mylk2 | 2 | 152911352 | 152923068 | + | Mylk2 | ENSMUSG00000027470 | myosin, light polypeptide kinase 2, skeletal muscle [Source:MGI Symbol;Acc:MGI:2139434] | + | gene | Name match | 0 | 3.0480812 | 1.0752155 |
Rbm24 | 13 | 46418434 | 46431095 | + | Rbm24 | ENSMUSG00000038132 | RNA binding motif protein 24 [Source:MGI Symbol;Acc:MGI:3610364] | + | gene | Name match | 0 | 3.0163505 | 0.8675073 |
Trim72 | 7 | 128003949 | 128011033 | + | Trim72 | ENSMUSG00000042828 | tripartite motif-containing 72 [Source:MGI Symbol;Acc:MGI:3612190] | + | gene | Name match | 0 | 3.001933 | 0.45001903 |
This looks pretty telling. The genes we're looking at here all appear to be strongly related to muscle, such as actin, myosin and troponin, as well as mitochondrial genes which would be increased in muscle cells. I put the list through gprofiler and got enormously significant hits to muscle pathways (e.g. Reactome muscle contraction at p=3x10^-37).
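Enrichment tools of this kind are commonly built around an over-representation test. The sketch below shows a plain hypergeometric upper-tail probability as an illustration of that idea. It is not g:Profiler's implementation, it applies no multiple-testing correction, and the function name and example numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(background, annotated, selected, overlap):
    """P(at least `overlap` annotated genes) when drawing `selected` genes
    at random from `background` genes, of which `annotated` carry the term."""
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, selected)

# Hypothetical toy case: 10 genes in total, 5 annotated, 5 selected, all 5 overlap.
p_all_overlap = enrichment_p_value(10, 5, 5, 5)
print(p_all_overlap)
```

With a list of around a hundred genes dominated by one tissue's structural proteins, a test like this will return extremely small probabilities, which is consistent with the result reported above.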
Given that these samples are supposed to be Schwann cells, it doesn't make sense for them to be expressing these kinds of muscle markers at this level. It would seem to be much more plausible that what has actually happened is that the samples have been contaminated to varying degrees with muscle cells from the surrounding tissue, and that the amount of contamination varies between the two groups.
If this supposition is correct then it's also likely that the amount of contamination will vary within each group since it will not relate to the genotype but will be a function of the dissection or sample preparation. We can try to assess this by looking at whether the muscle related genes are unusually variable within the WT or cKO groups. We can do this with a variation plot, plotting STDEV against expression level for each of the two groups.
It is clear that in both the WT and cKO groups these muscle markers are unusually variable, suggesting that an external contamination is a viable explanation. We can also clearly see the difference in expression level between the two groups which led us to look at these genes in the first place.
An examination of these samples would ideally have shown us that they contained good quality data and exhibited consistent effects of the type we expect given the experimental description. Unfortunately, an examination of the data instead raises serious concerns about whether these samples are indeed valid, and we should probably not proceed with this analysis until the issues raised are sorted out.
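The variation plot is just the per-gene mean and standard deviation calculated within one replicate group. A minimal sketch of that calculation is shown here; the function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev

def mean_and_stdev(replicates):
    """Map each gene to (mean, sample standard deviation) across one group's replicates."""
    return {gene: (mean(values), stdev(values)) for gene, values in replicates.items()}

# Hypothetical log2 values for three replicates of one group.
group = {
    "stable_gene": [5.0, 5.1, 4.9],
    "variable_gene": [3.0, 6.0, 9.0],
}
summary = mean_and_stdev(group)
for gene, (avg, sd) in summary.items():
    print(gene, round(avg, 2), round(sd, 2))
```

Plotting the standard deviation against the mean for every gene, once per group, shows which genes are more variable than others at a similar expression level.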
Specifically there are 3 major concerns:
1. The nature of the read data looks wrong, in that this was supposed to be an Illumina TruSeq library, but the reads coming from it are non-directional.
2. The cKO samples are supposed to have a knockout of the Taz and Yap1 genes and the data shows no evidence of this change. Since it's a conditional knockout, there may be more subtlety to the design than a simple reading of the conditions suggests, but this needs to be determined since a lack of the expected effect would mean that any further work on the data would be pointless.
3. It would appear that all of the samples may be contaminated with muscle cells, and unfortunately the amount of contamination is not consistent between the two groups, but is systematically higher in cKO than WT. Given the conflation of the contamination and the condition it is going to be very difficult to specifically attribute any changes seen to the knockouts rather than the contamination.