Differential gene expression

In this tutorial we will cover about Differetial gene expression, which comprises an extensive range of topics and methods. In single cell, differential expresison can have multiple functionalities such as of identifying marker genes for cell populations, as well as differentially regulated genes across conditions (healthy vs control). We will also exercise on how to account the batch information in your test.

We can first load the data from the clustering session. Moreover, we can already decide which clustering resolution to use. First let’s define using the louvain clustering to identifying differentially expressed genes.

suppressPackageStartupMessages({
    library(scater)
    library(scran)
    # library(venn)
    library(cowplot)
    library(ggplot2)
    # library(rafalib)
    library(pheatmap)
    library(igraph)
    library(dplyr)
})

sce <- readRDS("data/results/covid_qc_dr_int_cl.rds")

Cell marker genes


Let us first compute a ranking for the highly differential genes in each cluster. There are many different tests and parameters to be chosen that can be used to refine your results. When looking for marker genes, we want genes that are positivelly expressed in a cell type and possibly not expressed in the others.

# Compute differentiall expression
markers_genes <- scran::findMarkers(x = sce, groups = as.character(sce$louvain_SNNk15), 
    lfc = 0.5, pval.type = "all", direction = "up")

# List of dataFrames with the results for each cluster
markers_genes
## List of length 9
## names(9): 1 2 3 4 5 6 7 8 9
# Visualizing the expression of one
markers_genes[["1"]]
## DataFrame with 18121 rows and 10 columns
##                         p.value                  FDR              logFC.2
##                       <numeric>            <numeric>            <numeric>
## PF4        5.42236942598342e-16 9.82587563682454e-12     3.01259842765971
## PPBP       5.00056541348096e-10 4.53076229288442e-06     2.74257205744216
## GNG11      5.06161511405363e-09 3.05738424939219e-05     1.73041831456037
## OST4       3.07547152464266e-08 0.000139326548745124     1.87877915781507
## NRGN       1.85074354590057e-07 0.000670746475905283     2.08202120232519
## ...                         ...                  ...                  ...
## AL592183.1                    1                    1  -0.0187820319786617
## AC007325.4                    1                    1  -0.0103004003928936
## AL354822.1                    1                    1 -0.00115246909162764
## AC004556.1                    1                    1  -0.0203955520005165
## AC233755.1                    1                    1                    0
##                         logFC.3              logFC.4              logFC.5
##                       <numeric>            <numeric>            <numeric>
## PF4            3.49645473674068     3.71650582803699     3.97082330250393
## PPBP           3.25285030340934      3.5358030592694      4.0585250617998
## GNG11          1.83623668138815     1.88469241020645     2.05995995533478
## OST4           1.88123537370156     2.10997397036826     2.22804490374774
## NRGN           2.43702690676073     2.57392457310674     3.43286114568247
## ...                         ...                  ...                  ...
## AL592183.1  -0.0834303779606505   -0.132234591840394  -0.0992384326208183
## AC007325.4 -0.00659670894011874   -0.020049794508921 -0.00325537366633247
## AL354822.1 -0.00207784348641178 -0.00307332511360672  -0.0148900779361155
## AC004556.1  -0.0863727572375302   -0.151851034301112  -0.0649677107625443
## AC233755.1                    0                    0                    0
##                          logFC.6              logFC.7              logFC.8
##                        <numeric>            <numeric>            <numeric>
## PF4             3.75882391271893     3.92452797172521     3.93989492544756
## PPBP            3.42402456902586     3.94767644870726      3.9178936327251
## GNG11           1.89687239074213     2.05368301018222     2.05703647611279
## OST4            2.14540449652277      1.9307661625303     2.25580705714899
## NRGN             2.7537850162175     3.34180837857761     3.37772776639445
## ...                          ...                  ...                  ...
## AL592183.1   -0.0839305968400599   -0.161401303237351   -0.199465267512809
## AC007325.4 -0.000770947272540883 -0.00493851468672034 -0.00136015783179628
## AL354822.1                     0 -0.00489866295805064  -0.0158253670990768
## AC004556.1    -0.100088330539916  -0.0714877934768116  -0.0425330546944955
## AC233755.1                     0                    0 -0.00406988649560987
##                         logFC.9
##                       <numeric>
## PF4            3.93296778606565
## PPBP           4.00643150853551
## GNG11          2.07571196413914
## OST4           1.85064380320819
## NRGN           3.39901107279735
## ...                         ...
## AL592183.1   -0.136006712007594
## AC007325.4 -0.00474466901986934
## AL354822.1 -0.00520532662403466
## AC004556.1  -0.0876779685867214
## AC233755.1                    0

We can now select the top 25 up regulated genes for plotting.

# Colect the top 25 genes for each cluster and put the into a single table
top25 <- lapply(names(markers_genes), function(x) {
    temp <- markers_genes[[x]][1:25, 1:2]
    temp$gene <- rownames(markers_genes[[x]])[1:25]
    temp$cluster <- x
    return(temp)
})
top25 <- as_tibble(do.call(rbind, top25))
top25$p.value[top25$p.value == 0] <- 9.99999999999999e-301
top25