Celltype prediction

Bioconductor Toolkit

Assignment of cell identities based on gene expression patterns using reference data.
Authors

Åsa Björklund

Paulo Czarnewski

Susanne Reinsbach

Roy Francis

Published

04-Oct-2024

Note

Code chunks run R commands unless otherwise specified.

Celltype prediction can either be performed on indiviudal cells where each cell gets a predicted celltype label, or on the level of clusters. All methods are based on similarity to other datasets, single cell or sorted bulk RNAseq, or uses known marker genes for each cell type.
Ideally celltype predictions should be run on each sample separately and not using the integrated data. In this case we will select one sample from the Covid data, ctrl_13 and predict celltype by cell on that sample.
Some methods will predict a celltype to each cell based on what it is most similar to, even if that celltype is not included in the reference. Other methods include an uncertainty so that cells with low similarity scores will be unclassified.
There are multiple different methods to predict celltypes, here we will just cover a few of those.

We will use a reference PBMC dataset from the scPred package which is provided as a Seurat object with counts. And we will test classification based on the scPred and scMap methods. Finally we will use gene set enrichment predict celltype based on the DEGs of each cluster.

1 Read data

First, lets load required libraries

suppressPackageStartupMessages({
    library(scater)
    library(scran)
    library(dplyr)
    library(patchwork)
    library(ggplot2)
    library(pheatmap)
    library(scPred)
    library(scmap)
    library(SingleR)
})

Let’s read in the saved Covid-19 data object from the clustering step.

# download pre-computed data if missing or long compute
fetch_data <- TRUE

# url for source and intermediate data
path_data <- "https://export.uppmax.uu.se/naiss2023-23-3/workshops/workshop-scrnaseq"
path_file <- "data/covid/results/bioc_covid_qc_dr_int_cl.rds"
if (!dir.exists(dirname(path_file))) dir.create(dirname(path_file), recursive = TRUE)
if (fetch_data && !file.exists(path_file)) download.file(url = file.path(path_data, "covid/results/bioc_covid_qc_dr_int_cl.rds"), destfile = path_file)
alldata <- readRDS(path_file)

Let’s read in the saved Covid-19 data object from the clustering step.

ctrl.sce <- alldata[, alldata$sample == "ctrl.13"]

# remove all old dimensionality reductions as they will mess up the analysis further down
reducedDims(ctrl.sce) <- NULL

2 Reference data

Load the reference dataset with annotated labels that is provided by the scPred package, it is a subsampled set of cells from human PBMCs.

reference <- scPred::pbmc_1
reference
An object of class Seurat 
32838 features across 3500 samples within 1 assay 
Active assay: RNA (32838 features, 0 variable features)
 2 layers present: counts, data

Convert to a SCE object.

ref.sce <- Seurat::as.SingleCellExperiment(reference)

Rerun analysis pipeline. Run normalization, feature selection and dimensionality reduction

# Normalize
ref.sce <- computeSumFactors(ref.sce)
ref.sce <- logNormCounts(ref.sce)

# Variable genes
var.out <- modelGeneVar(ref.sce, method = "loess")
hvg.ref <- getTopHVGs(var.out, n = 1000)

# Dim reduction
ref.sce <- runPCA(ref.sce,
    exprs_values = "logcounts", scale = T,
    ncomponents = 30, subset_row = hvg.ref
)
ref.sce <- runUMAP(ref.sce, dimred = "PCA")
plotReducedDim(ref.sce, dimred = "UMAP", colour_by = "cell_type")

Run all steps of the analysis for the ctrl sample as well. Use the clustering from the integration lab with resolution 0.5.

# Normalize
ctrl.sce <- computeSumFactors(ctrl.sce)
ctrl.sce <- logNormCounts(ctrl.sce)

# Variable genes
var.out <- modelGeneVar(ctrl.sce, method = "loess")
hvg.ctrl <- getTopHVGs(var.out, n = 1000)

# Dim reduction
ctrl.sce <- runPCA(ctrl.sce, exprs_values = "logcounts", scale = T, ncomponents = 30, subset_row = hvg.ctrl)
ctrl.sce <- runUMAP(ctrl.sce, dimred = "PCA")
plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "leiden_k20")

3 scMap

The scMap package is one method for projecting cells from a scRNA-seq experiment on to the cell-types or individual cells identified in a different experiment. It can be run on different levels, either projecting by cluster or by single cell, here we will try out both.

For scmap cell type labels must be stored in the cell_type1 column of the colData slots, and gene ids that are consistent across both datasets must be stored in the feature_symbol column of the rowData slots.

3.1 scMap cluster

# add in slot cell_type1
ref.sce$cell_type1 <- ref.sce$cell_type
# create a rowData slot with feature_symbol
rd <- data.frame(feature_symbol = rownames(ref.sce))
rownames(rd) <- rownames(ref.sce)
rowData(ref.sce) <- rd

# same for the ctrl dataset
# create a rowData slot with feature_symbol
rd <- data.frame(feature_symbol = rownames(ctrl.sce))
rownames(rd) <- rownames(ctrl.sce)
rowData(ctrl.sce) <- rd

Then we can select variable features in both datasets.

# select features
counts(ctrl.sce) <- as.matrix(counts(ctrl.sce))
logcounts(ctrl.sce) <- as.matrix(logcounts(ctrl.sce))
ctrl.sce <- selectFeatures(ctrl.sce, suppress_plot = TRUE)

counts(ref.sce) <- as.matrix(counts(ref.sce))
logcounts(ref.sce) <- as.matrix(logcounts(ref.sce))
ref.sce <- selectFeatures(ref.sce, suppress_plot = TRUE)

Then we need to index the reference dataset by cluster, default is the clusters in cell_type1.

ref.sce <- indexCluster(ref.sce)

Now we project the Covid-19 dataset onto that index.

project_cluster <- scmapCluster(
    projection = ctrl.sce,
    index_list = list(
        ref = metadata(ref.sce)$scmap_cluster_index
    )
)

# projected labels
table(project_cluster$scmap_cluster_labs)

     B cell  CD4 T cell  CD8 T cell         cDC       cMono      ncMono 
         72         109         112          29         201         142 
    NK cell         pDC Plasma cell  unassigned 
        280           2           1         153 

Then add the predictions to metadata and plot UMAP.

# add in predictions
ctrl.sce$scmap_cluster <- project_cluster$scmap_cluster_labs

plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cluster")

4 scMap cell

We can instead index the refernce data based on each single cell and project our data onto the closest neighbor in that dataset.

ref.sce <- indexCell(ref.sce)

Again we need to index the reference dataset.

project_cell <- scmapCell(
    projection = ctrl.sce,
    index_list = list(
        ref = metadata(ref.sce)$scmap_cell_index
    )
)

We now get a table with index for the 5 nearest neigbors in the reference dataset for each cell in our dataset. We will select the celltype of the closest neighbor and assign it to the data.

cell_type_pred <- colData(ref.sce)$cell_type1[project_cell$ref[[1]][1, ]]
table(cell_type_pred)
cell_type_pred
     B cell  CD4 T cell  CD8 T cell         cDC       cMono      ncMono 
         96         162         264          30         220         148 
    NK cell Plasma cell 
        180           1 

Then add the predictions to metadata and plot umap.

# add in predictions
ctrl.sce$scmap_cell <- cell_type_pred

plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cell")

Plot both:

wrap_plots(
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cluster"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cell"),
    ncol = 2
)

5 SinlgeR

SingleR is performs unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.

There are multiple datasets included in the celldex package that can be used for celltype prediction, here we will test two different ones, the DatabaseImmuneCellExpressionData and the HumanPrimaryCellAtlasData. In addition we will use the same reference dataset that we used for label transfer above but using SingleR instead.

5.1 Immune cell reference

immune = celldex::DatabaseImmuneCellExpressionData()
singler.immune <- SingleR(test = ctrl.sce, ref = immune, assay.type.test=1,
    labels = immune$label.main)

head(singler.immune)
DataFrame with 6 rows and 4 columns
                                               scores        labels delta.next
                                             <matrix>   <character>  <numeric>
AGGTCATGTGCGAACA-13  0.0679680:0.0760411:0.186694:... T cells, CD4+  0.1375513
CCTATCGGTCCCTCAT-13  0.0027079:0.0960641:0.386088:...      NK cells  0.1490740
TCCTCCCTCGTTCATT-13  0.0361115:0.1067465:0.394579:...      NK cells  0.1220681
CAACCAATCATCTATC-13  0.0342030:0.1345967:0.402377:...      NK cells  0.1513308
TACGGTATCGGATTAC-13 -0.0131813:0.0717678:0.283882:...      NK cells  0.0620657
AATAGAGAGTTCGGTT-13  0.0841091:0.1367749:0.273738:... T cells, CD4+  0.0660296
                    pruned.labels
                      <character>
AGGTCATGTGCGAACA-13 T cells, CD4+
CCTATCGGTCCCTCAT-13      NK cells
TCCTCCCTCGTTCATT-13      NK cells
CAACCAATCATCTATC-13      NK cells
TACGGTATCGGATTAC-13      NK cells
AATAGAGAGTTCGGTT-13 T cells, CD4+

5.2 HPCA reference

hpca <- HumanPrimaryCellAtlasData()
singler.hpca <- SingleR(test = ctrl.sce, ref = hpca, assay.type.test=1,
    labels = hpca$label.main)

head(singler.hpca)
DataFrame with 6 rows and 4 columns
                                            scores      labels delta.next
                                          <matrix> <character>  <numeric>
AGGTCATGTGCGAACA-13 0.141378:0.310009:0.275987:...     T_cells  0.4918992
CCTATCGGTCCCTCAT-13 0.145926:0.300045:0.277827:...     NK_cell  0.3241970
TCCTCCCTCGTTCATT-13 0.132119:0.311754:0.274127:...     NK_cell  0.0640608
CAACCAATCATCTATC-13 0.157184:0.302219:0.284496:...     NK_cell  0.2012408
TACGGTATCGGATTAC-13 0.125120:0.283118:0.250322:...     T_cells  0.1545913
AATAGAGAGTTCGGTT-13 0.191441:0.374422:0.329988:...     T_cells  0.5063484
                    pruned.labels
                      <character>
AGGTCATGTGCGAACA-13       T_cells
CCTATCGGTCCCTCAT-13       NK_cell
TCCTCCCTCGTTCATT-13       NK_cell
CAACCAATCATCTATC-13       NK_cell
TACGGTATCGGATTAC-13       T_cells
AATAGAGAGTTCGGTT-13       T_cells

5.3 With own reference data

singler.ref <- SingleR(test=ctrl.sce, ref=ref.sce, labels=ref.sce$cell_type, de.method="wilcox")
head(singler.ref)
DataFrame with 6 rows and 4 columns
                                            scores      labels delta.next
                                          <matrix> <character>  <numeric>
AGGTCATGTGCGAACA-13 0.741719:0.840093:0.805977:...  CD4 T cell  0.0423204
CCTATCGGTCCCTCAT-13 0.649491:0.736753:0.815987:...     NK cell  0.0451715
TCCTCCCTCGTTCATT-13 0.669603:0.731356:0.823308:...     NK cell  0.0865526
CAACCAATCATCTATC-13 0.649948:0.721639:0.801202:...  CD8 T cell  0.0740195
TACGGTATCGGATTAC-13 0.708827:0.776244:0.808044:...  CD8 T cell  0.0905218
AATAGAGAGTTCGGTT-13 0.729010:0.847462:0.816299:...  CD4 T cell  0.0409309
                    pruned.labels
                      <character>
AGGTCATGTGCGAACA-13    CD4 T cell
CCTATCGGTCCCTCAT-13       NK cell
TCCTCCCTCGTTCATT-13       NK cell
CAACCAATCATCTATC-13    CD8 T cell
TACGGTATCGGATTAC-13    CD8 T cell
AATAGAGAGTTCGGTT-13    CD4 T cell

Compare results:

ctrl.sce$singler.immune = singler.immune$pruned.labels
ctrl.sce$singler.hpca = singler.hpca$pruned.labels
ctrl.sce$singler.ref = singler.ref$pruned.labels

wrap_plots(
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.immune"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.hpca"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.ref"),
    ncol = 3
)

6 Compare results

Now we will compare the output of the two methods using the convenient function in scPred crossTab() that prints the overlap between two metadata slots.

table(ctrl.sce$scmap_cell, ctrl.sce$singler.hpca)
             
              B_cell Macrophage Monocyte NK_cell Platelets T_cells
  B cell          94          0        0       0         0       1
  CD4 T cell       0          0        0       3         0     158
  CD8 T cell       0          0        0     139         0     118
  cDC              2          0       27       0         0       0
  cMono            2          0      202       0         1       0
  ncMono           0          2      145       0         0       0
  NK cell          0          0        0     165         0       9
  Plasma cell      1          0        0       0         0       0

Or plot onto umap:

wrap_plots(
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cluster"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cell"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.immune"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.hpca"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.ref"),
    ncol = 3
)

As you can see, the methods using the same reference all have similar results. While for instance singleR with different references give quite different predictions. This really shows that a relevant reference is the key in having reliable celltype predictions rather than the method used.

7 GSEA with celltype markers

Another option, where celltype can be classified on cluster level is to use gene set enrichment among the DEGs with known markers for different celltypes. Similar to how we did functional enrichment for the DEGs in the differential expression exercise. There are some resources for celltype gene sets that can be used. Such as CellMarker, PanglaoDB or celltype gene sets at MSigDB. We can also look at overlap between DEGs in a reference dataset and the dataset you are analyzing.

7.1 DEG overlap

First, lets extract top DEGs for our Covid-19 dataset and the reference dataset. When we run differential expression for our dataset, we want to report as many genes as possible, hence we set the cutoffs quite lenient.

# run differential expression in our dataset, using clustering at resolution 0.3
DGE_list <- scran::findMarkers(
    x = alldata,
    groups = as.character(alldata$leiden_k20),
    pval.type = "all",
    min.prop = 0
)
# Compute differential gene expression in reference dataset (that has cell annotation)
ref_DGE <- scran::findMarkers(
    x = ref.sce,
    groups = as.character(ref.sce$cell_type),
    pval.type = "all",
    direction = "up"
)

# Identify the top cell marker genes in reference dataset
# select top 50 with hihgest foldchange among top 100 signifcant genes.
ref_list <- lapply(ref_DGE, function(x) {
    x$logFC <- rowSums(as.matrix(x[, grep("logFC", colnames(x))]))
    x %>%
        as.data.frame() %>%
        filter(p.value < 0.01) %>%
        top_n(-100, p.value) %>%
        top_n(50, logFC) %>%
        rownames()
})

unlist(lapply(ref_list, length))
     B cell  CD4 T cell  CD8 T cell         cDC       cMono      ncMono 
         50          50          19          17          50          50 
    NK cell         pDC Plasma cell 
         50          50          24 

Now we can run GSEA for the DEGs from our dataset and check for enrichment of top DEGs in the reference dataset.

suppressPackageStartupMessages(library(fgsea))

# run fgsea for each of the clusters in the list
res <- lapply(DGE_list, function(x) {
    x$logFC <- rowSums(as.matrix(x[, grep("logFC", colnames(x))]))
    gene_rank <- setNames(x$logFC, rownames(x))
    fgseaRes <- fgsea(pathways = ref_list, stats = gene_rank, nperm = 10000)
    return(fgseaRes)
})
names(res) <- names(DGE_list)

# You can filter and resort the table based on ES, NES or pvalue
res <- lapply(res, function(x) {
    x[x$pval < 0.1, ]
})
res <- lapply(res, function(x) {
    x[x$size > 2, ]
})
res <- lapply(res, function(x) {
    x[order(x$NES, decreasing = T), ]
})
res
$`1`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:       cMono 0.0010298661 0.0015822785  0.9475486  2.261276            0
2:      ncMono 0.0010548523 0.0015822785  0.7915538  1.899211            0
3:  CD8 T cell 0.0037413773 0.0048103422 -0.8640088 -1.451804           31
4: Plasma cell 0.0003449862 0.0007762190 -0.8824599 -1.530887            2
5:      B cell 0.0001107297 0.0003321891 -0.8360423 -1.550833            0
6:     NK cell 0.0001104484 0.0003321891 -0.8525261 -1.586638            0
7:  CD4 T cell 0.0001102657 0.0003321891 -0.9218630 -1.718905            0
    size  leadingEdge
   <int>       <list>
1:    47 S100A8, ....
2:    49 AIF1, S1....
3:    18 CCL5, IL....
4:    24 RPL36AL,....
5:    47 RPS5, RP....
6:    49 B2M, NKG....
7:    50 RPL3, RP....

$`10`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1: Plasma cell 0.0148148148 0.0222222222  0.6576329  1.721790            1
2:  CD8 T cell 0.0945711072 0.1215914236 -0.7452621 -1.212938          924
3:      B cell 0.0027056819 0.0048702275 -0.7622529 -1.295898           26
4:     NK cell 0.0002003807 0.0004508566 -0.7899193 -1.345112            1
5:       cMono 0.0001002104 0.0003006313 -0.8298881 -1.410884            0
6:      ncMono 0.0001001904 0.0003006313 -0.8641771 -1.471562            0
7:  CD4 T cell 0.0001001904 0.0003006313 -0.8690484 -1.480696            0
    size  leadingEdge
   <int>       <list>
1:    24 JCHAIN, ....
2:    18 IL32, CC....
3:    47 CD52, CX....
4:    49 B2M, HCS....
5:    47 JUND, S1....
6:    49 S100A4, ....
7:    50 RPL38, R....

$`11`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:      ncMono 0.0001330849 0.0009098261  0.9785302  1.897494            0
2:         cDC 0.0067663257 0.0101494886  0.8955752  1.521429           42
3:       cMono 0.0907621869 0.1021074603  0.6882224  1.326830          673
4: Plasma cell 0.0144570901 0.0185876873 -0.7608749 -1.560885           46
5:  CD8 T cell 0.0005641749 0.0010155148 -0.9246833 -1.789615            1
6:      B cell 0.0003881988 0.0009098261 -0.7699594 -1.804747            0
7:     NK cell 0.0004019293 0.0009098261 -0.8180535 -1.922275            0
8:  CD4 T cell 0.0004043672 0.0009098261 -0.9361761 -2.210839            0
    size  leadingEdge
   <int>       <list>
1:    49 LST1, AI....
2:    17 HLA-DPA1....
3:    47 TYROBP, ....
4:    24 ISG20, P....
5:    18 CCL5, IL....
6:    47 CXCR4, R....
7:    49 NKG7, GN....
8:    50 RPL3, RP....

$`12`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:  CD4 T cell 0.0001014919 0.0009134274  0.9667146  1.659533            0
2: Plasma cell 0.0040016849 0.0117647059  0.8463036  1.397589           37
3:      B cell 0.0113994911 0.0170992366  0.7606321  1.301894          111
4:  CD8 T cell 0.0595648427 0.0765833692  0.7948827  1.285249          552
5:      ncMono 0.0065359477 0.0117647059 -0.6928308 -1.925996            0
6:         cDC 0.0013089005 0.0058900524 -0.9317140 -2.097910            0
7:       cMono 0.0056497175 0.0117647059 -0.7907891 -2.192421            0
    size  leadingEdge
   <int>       <list>
1:    50 IL7R, RP....
2:    24 RPL36AL,....
3:    47 RPS5, RP....
4:    18 IL32, CD....
5:    49 FCER1G, ....
6:    17 HLA-DRA,....
7:    47 S100A9, ....

$`13`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:      B cell 0.0001072386 0.0003531489  0.9492350  1.736001            0
2:  CD4 T cell 0.0001065303 0.0003531489  0.9017719  1.658073            0
3:         cDC 0.0001177163 0.0003531489  0.9396430  1.564647            0
4: Plasma cell 0.0234930094 0.0261662198  0.8008948  1.380153          204
5:         pDC 0.0261662198 0.0261662198  0.7359128  1.345869          243
6:      ncMono 0.0015748031 0.0020247469 -0.7153196 -1.854475            0
7:  CD8 T cell 0.0006891799 0.0015506547 -0.9344223 -2.013821            0
8:       cMono 0.0014771049 0.0020247469 -0.7806780 -2.016802            0
9:     NK cell 0.0015748031 0.0020247469 -0.8788032 -2.278309            0
    size  leadingEdge
   <int>       <list>
1:    47 MS4A1, C....
2:    50 RPS6, RP....
3:    17 HLA-DRA,....
4:    24 ISG20, P....
5:    47 TCF4, IR....
6:    49 S100A4, ....
7:    18 CCL5, IL....
8:    47 S100A9, ....
9:    49 HCST, NK....

$`2`
      pathway         pval         padj         ES       NES nMoreExtreme  size
       <char>        <num>        <num>      <num>     <num>        <num> <int>
1:     B cell 0.0008149959 0.0019417476  0.9744244  2.237231            0    47
2:        cDC 0.0010787487 0.0019417476  0.9451680  1.835723            1    17
3: CD4 T cell 0.0209030100 0.0268752986  0.6547482  1.519947           24    50
4: CD8 T cell 0.0046557216 0.0069835825 -0.8803482 -1.454118           37    18
5:      cMono 0.0009116809 0.0019417476 -0.8246370 -1.511900            7    47
6:     ncMono 0.0001134945 0.0005107252 -0.8978324 -1.653999            0    49
7:    NK cell 0.0001134945 0.0005107252 -0.9079530 -1.672644            0    49
    leadingEdge
         <list>
1: MS4A1, C....
2: HLA-DRA,....
3: RPL13, R....
4: CCL5, IL....
5: S100A6, ....
6: S100A4, ....
7: HCST, NK....

$`3`
      pathway         pval         padj         ES       NES nMoreExtreme  size
       <char>        <num>        <num>      <num>     <num>        <num> <int>
1:      cMono 0.0001004117 0.0004518526  0.9381483  1.498749            0    47
2:     ncMono 0.0001003210 0.0004518526  0.9029378  1.444777            0    49
3:        cDC 0.0242889765 0.0437201578  0.8453342  1.293574          233    17
4:        pDC 0.0976001607 0.1254859208  0.7224114  1.154096          971    47
5:     B cell 0.0232558140 0.0437201578 -0.5880374 -1.756967            0    47
6:    NK cell 0.0294117647 0.0441176471 -0.6368801 -1.914328            0    49
7: CD8 T cell 0.0029069767 0.0087209302 -0.9651183 -2.306945            0    18
    leadingEdge
         <list>
1: S100A9, ....
2: AIF1, S1....
3: HLA-DRA,....
4: CTSB, PL....
5: CXCR4, M....
6: GNLY, NK....
7: IL32, CC....

$`4`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:  CD4 T cell 0.0012345679 0.0018518519  0.9814702  2.427369            0
2: Plasma cell 0.0659276546 0.0847641274 -0.7613848 -1.291510          564
3:     NK cell 0.0007634420 0.0017177446 -0.8072728 -1.471093            6
4:         cDC 0.0010765550 0.0018518519 -0.9051946 -1.479618            8
5:         pDC 0.0005488474 0.0016465423 -0.8181195 -1.484140            4
6:       cMono 0.0001097695 0.0004939627 -0.9109960 -1.652626            0
7:      ncMono 0.0001090631 0.0004939627 -0.9320652 -1.698503            0
    size  leadingEdge
   <int>       <list>
1:    50 IL7R, LD....
2:    24 UBE2J1, ....
3:    49 NKG7, GN....
4:    17 HLA-DRA,....
5:    47 PLEK, NP....
6:    47 S100A9, ....
7:    49 FCER1G, ....

$`5`
      pathway         pval         padj         ES       NES nMoreExtreme  size
       <char>        <num>        <num>      <num>     <num>        <num> <int>
1:    NK cell 0.0011402509 0.0025655644  0.9457207  2.188187            0    49
2: CD8 T cell 0.0007656968 0.0022970904  0.9791938  1.951587            0    18
3:     B cell 0.0466264399 0.0699396599 -0.7074609 -1.273585          424    47
4:        cDC 0.0167224080 0.0301003344 -0.8357465 -1.373437          144    17
5:     ncMono 0.0001095890 0.0004936917 -0.9096682 -1.643606            0    49
6:      cMono 0.0001097093 0.0004936917 -0.9166590 -1.650187            0    47
    leadingEdge
         <list>
1: NKG7, GN....
2: CCL5, GZ....
3: RPS11, C....
4: HLA-DRA,....
5: FTH1, FC....
6: S100A9, ....

$`6`
      pathway         pval         padj         ES       NES nMoreExtreme  size
       <char>        <num>        <num>      <num>     <num>        <num> <int>
1:    NK cell 0.0008658009 0.0015584416  0.9824343  2.182156            0    49
2: CD8 T cell 0.0099183197 0.0127521254  0.8797244  1.718261           16    18
3:        cDC 0.0037484885 0.0056227328 -0.8850041 -1.443850           30    17
4:     ncMono 0.0001130327 0.0002545249 -0.8306379 -1.500661            0    49
5:     B cell 0.0001130838 0.0002545249 -0.8683857 -1.564638            0    47
6:      cMono 0.0001130838 0.0002545249 -0.8816325 -1.588506            0    47
7: CD4 T cell 0.0001131222 0.0002545249 -0.9015195 -1.630264            0    50
    leadingEdge
         <list>
1: NKG7, GN....
2: CCL5, GZ....
3: HLA-DRA,....
4: COTL1, F....
5: RPL18A, ....
6: S100A9, ....
7: RPS12, R....

$`7`
       pathway         pval        padj         ES       NES nMoreExtreme  size
        <char>        <num>       <num>      <num>     <num>        <num> <int>
1: Plasma cell 0.0483046911 0.072457037 -0.7454380 -1.302320          415    24
2:     NK cell 0.0105061003 0.018910981 -0.7216614 -1.362524           92    49
3:         cDC 0.0056133786 0.012630102 -0.8595163 -1.445975           47    17
4:       cMono 0.0006795017 0.002038505 -0.7813280 -1.470055            5    47
5:      B cell 0.0006795017 0.002038505 -0.7830898 -1.473370            5    47
6:  CD4 T cell 0.0001127269 0.001014542 -0.9004861 -1.703752            0    50
    leadingEdge
         <list>
1: PPIB, RP....
2: ITGB2, N....
3: HLA-DRB1....
4: JUND, S1....
5: RPS23, R....
6: RPL34, R....

$`8`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:     NK cell 0.0001069748 0.0004813864  0.9521049  1.741057            0
2:  CD4 T cell 0.0001066894 0.0004813864  0.8671427  1.589332            0
3:  CD8 T cell 0.0011807770 0.0032234957  0.9191795  1.535125            9
4: Plasma cell 0.0035562694 0.0045723463  0.8596421  1.482464           30
5:      ncMono 0.0030581040 0.0045723463 -0.6383894 -1.665998            1
6:         cDC 0.0031665611 0.0045723463 -0.8278322 -1.751560            4
7:       cMono 0.0014326648 0.0032234957 -0.8438412 -2.190169            0
    size  leadingEdge
   <int>       <list>
1:    49 NKG7, GN....
2:    50 RPL3, RP....
3:    18 CCL5, GZ....
4:    24 PPIB, FK....
5:    49 COTL1, A....
6:    17 HLA-DRA,....
7:    47 S100A9, ....

$`9`
       pathway         pval         padj         ES       NES nMoreExtreme
        <char>        <num>        <num>      <num>     <num>        <num>
1:       cMono 0.0001404494 0.0005285412  0.9521871  1.878727            0
2:      ncMono 0.0001397233 0.0005285412  0.9102892  1.805097            0
3:         cDC 0.0360740981 0.0405833604  0.8402793  1.440331          221
4: Plasma cell 0.0135658915 0.0174418605 -0.7790175 -1.564322           48
5:  CD8 T cell 0.0002622607 0.0005285412 -0.9233197 -1.759120            0
6:     NK cell 0.0003514938 0.0005285412 -0.8037175 -1.848796            0
7:      B cell 0.0003469813 0.0005285412 -0.8921524 -2.035759            0
8:  CD4 T cell 0.0003523608 0.0005285412 -0.9271993 -2.142847            0
    size  leadingEdge
   <int>       <list>
1:    47 S100A9, ....
2:    49 AIF1, S1....
3:    17 HLA-DRA,....
4:    24 ISG20, P....
5:    18 IL32, CC....
6:    49 NKG7, GN....
7:    47 CXCR4, R....
8:    50 RPL3, RP....

Selecting top significant overlap per cluster, we can now rename the clusters according to the predicted labels. OBS! Be aware that if you have some clusters that have non-significant p-values for all the gene sets, the cluster label will not be very reliable. Also, the gene sets you are using may not cover all the celltypes you have in your dataset and hence predictions may just be the most similar celltype. Also, some of the clusters have very similar p-values to multiple celltypes, for instance the ncMono and cMono celltypes are equally good for some clusters.

new.cluster.ids <- unlist(lapply(res, function(x) {
    as.data.frame(x)[1, 1]
}))

alldata$ref_gsea <- new.cluster.ids[as.character(alldata$leiden_k20)]

wrap_plots(
    plotReducedDim(alldata, dimred = "UMAP", colour_by = "leiden_k20"),
    plotReducedDim(alldata, dimred = "UMAP", colour_by = "ref_gsea"),
    ncol = 2
)

Compare the results with the other celltype prediction methods in the ctrl_13 sample.

ctrl.sce$ref_gsea <- alldata$ref_gsea[alldata$sample == "ctrl.13"]

wrap_plots(
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "ref_gsea"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "scmap_cell"),
    plotReducedDim(ctrl.sce, dimred = "UMAP", colour_by = "singler.hpca"),
    ncol = 3
)

7.2 With annotated gene sets

We have downloaded the celltype gene lists from http://bio-bigdata.hrbmu.edu.cn/CellMarker/CellMarker_download.html and converted the excel file to a csv for you. Read in the gene lists and do some filtering.

path_file <- file.path("data/human_cell_markers.txt")
if (!file.exists(path_file)) download.file(file.path(path_data, "human_cell_markers.txt"), destfile = path_file)
markers <- read.delim("data/human_cell_markers.txt")
markers <- markers[markers$speciesType == "Human", ]
markers <- markers[markers$cancerType == "Normal", ]

# Filter by tissue (to reduce computational time and have tissue-specific classification)
# sort(unique(markers$tissueType))
# grep("blood",unique(markers$tissueType),value = T)
# markers <- markers [ markers$tissueType %in% c("Blood","Venous blood",
#                                                "Serum","Plasma",
#                                                "Spleen","Bone marrow","Lymph node"), ]


# remove strange characters etc.
celltype_list <- lapply(unique(markers$cellName), function(x) {
    x <- paste(markers$geneSymbol[markers$cellName == x], sep = ",")
    x <- gsub("[[]|[]]| |-", ",", x)
    x <- unlist(strsplit(x, split = ","))
    x <- unique(x[!x %in% c("", "NA", "family")])
    x <- casefold(x, upper = T)
})
names(celltype_list) <- unique(markers$cellName)
# celltype_list <- lapply(celltype_list , function(x) {x[1:min(length(x),50)]} )
celltype_list <- celltype_list[unlist(lapply(celltype_list, length)) < 100]
celltype_list <- celltype_list[unlist(lapply(celltype_list, length)) > 5]
# run fgsea for each of the clusters in the list
res <- lapply(DGE_list, function(x) {
    x$logFC <- rowSums(as.matrix(x[, grep("logFC", colnames(x))]))
    gene_rank <- setNames(x$logFC, rownames(x))
    fgseaRes <- fgsea(pathways = celltype_list, stats = gene_rank, nperm = 10000)
    return(fgseaRes)
})
names(res) <- names(DGE_list)

# You can filter and resort the table based on ES, NES or pvalue
res <- lapply(res, function(x) {
    x[x$pval < 0.01, ]
})
res <- lapply(res, function(x) {
    x[x$size > 5, ]
})
res <- lapply(res, function(x) {
    x[order(x$NES, decreasing = T), ]
})

# show top 3 for each cluster.
lapply(res, head, 3)
$`1`
                           pathway        pval       padj        ES      NES
                            <char>       <num>      <num>     <num>    <num>
1:                      Neutrophil 0.001517451 0.05705615 0.8930496 2.257529
2:          CD1C+_B dendritic cell 0.001090513 0.05705615 0.9386464 2.235849
3: Monocyte derived dendritic cell 0.001367054 0.05705615 0.9177964 1.885585
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:            0    82 S100A8, ....
2:            0    54 S100A8, ....
3:            1    18 S100A8, ....

$`10`
                        pathway       pval       padj         ES       NES
                         <char>      <num>      <num>      <num>     <num>
1:          Natural killer cell 0.00850085 0.05707714 -0.6955236 -1.202945
2: Megakaryocyte erythroid cell 0.00840084 0.05707714 -0.6957935 -1.203071
3:        CD4+ cytotoxic T cell 0.00360036 0.03981575 -0.7042291 -1.218833
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:           84    84 PTPRC, C....
2:           83    83 PTPRC, C....
3:           35    86 RAP1B, Z....

$`11`
            pathway         pval       padj        ES      NES nMoreExtreme
             <char>        <num>      <num>     <num>    <num>        <num>
1: Mesenchymal cell 0.0005168626 0.05886446 0.8164307 1.631806            3
2:     Stromal cell 0.0012524353 0.05886446 0.8477874 1.602625            8
3:       Neutrophil 0.0011059228 0.05886446 0.7669280 1.578995            8
    size  leadingEdge
   <int>       <list>
1:    61 COTL1, S....
2:    38 PECAM1, ....
3:    82 LST1, FC....

$`12`
             pathway         pval        padj        ES      NES nMoreExtreme
              <char>        <num>       <num>     <num>    <num>        <num>
1: Naive CD4+ T cell 0.0001031353 0.009694719 0.8809675 1.490760            0
2:       CD4+ T cell 0.0003153911 0.014823381 0.8931280 1.484227            2
3: Naive CD8+ T cell 0.0001001904 0.009694719 0.8320149 1.456091            0
    size  leadingEdge
   <int>       <list>
1:    34 IL7R, RP....
2:    25 IL7R, LT....
3:    91 LDHB, NP....

$`13`
                    pathway        pval       padj         ES       NES
                     <char>       <num>      <num>      <num>     <num>
1:        Follicular B cell 0.001935557 0.02722468  0.8788086  1.500461
2: Morula cell (Blastomere) 0.006952965 0.05027529  0.7111727  1.345584
3:               Myeloblast 0.009193481 0.06401387 -0.9435484 -1.614268
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:           16    22 MS4A1, C....
2:           67    88 RPL39, R....
3:           21     6 CSF3R, L....

$`2`
                        pathway        pval       padj         ES       NES
                         <char>       <num>      <num>      <num>     <num>
1:                  Acinar cell 0.008064516 0.06317204 -0.7542708 -1.406864
2:                  FOXN4+ cell 0.005870625 0.05973237 -0.7383047 -1.412975
3: Hematopoietic precursor cell 0.007874016 0.06317204 -0.9638801 -1.413231
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:           70    57 LYZ, IL3....
2:           52    75 LYZ, ENO....
3:           58     6  PTPRC, CD14

$`3`
                           pathway         pval        padj        ES      NES
                            <char>        <num>       <num>     <num>    <num>
1:                      Neutrophil 0.0001000200 0.006311478 0.9045921 1.466125
2: Monocyte derived dendritic cell 0.0002077059 0.009762177 0.9394712 1.443466
3:                      Glial cell 0.0005325948 0.014303975 0.9605309 1.438322
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:            0    82 S100A9, ....
2:            1    18 S100A9, ....
3:            4    12 CD14, VI....

$`4`
             pathway         pval       padj        ES      NES nMoreExtreme
              <char>        <num>      <num>     <num>    <num>        <num>
1: Naive CD8+ T cell 0.0025773196 0.04037621 0.8307127 2.294368            0
2: Naive CD4+ T cell 0.0009354537 0.02198316 0.9078407 2.109190            0
3:       CD4+ T cell 0.0007645260 0.02164153 0.9222369 2.034293            0
    size  leadingEdge
   <int>       <list>
1:    91 LDHB, PI....
2:    34 IL7R, TC....
3:    25 LTB, IL7....

$`5`
                 pathway         pval       padj        ES      NES
                  <char>        <num>      <num>     <num>    <num>
1: CD4+ cytotoxic T cell 0.0018050542 0.03305785 0.8915720 2.162451
2:   Natural killer cell 0.0017513135 0.03305785 0.7741180 1.878243
3:           CD8+ T cell 0.0007911392 0.03305785 0.9352951 1.860485
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:            0    86 NKG7, CC....
2:            0    84 NKG7, GN....
3:            0    19 NKG7, CD....

$`6`
                             pathway        pval       padj        ES      NES
                              <char>       <num>      <num>     <num>    <num>
1:             CD4+ cytotoxic T cell 0.001158749 0.03500753 0.9516604 2.265915
2: Effector CD8+ memory T (Tem) cell 0.001111111 0.03500753 0.8866221 2.084440
3:               Natural killer cell 0.001149425 0.03500753 0.8111768 1.926784
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:            0    86 NKG7, GN....
2:            0    79 GNLY, GZ....
3:            0    84 NKG7, GN....

$`7`
            pathway        pval       padj         ES       NES nMoreExtreme
             <char>       <num>      <num>      <num>     <num>        <num>
1:    Megakaryocyte 0.001439885 0.09023278  0.8669567  1.844022            1
2:         Platelet 0.005833333 0.18277778  0.7101754  1.690124            6
3: Adventitial cell 0.008125224 0.19090811 -0.9370995 -1.420829           67
    size  leadingEdge
   <int>       <list>
1:    26 PPBP, PF....
2:    45 GP9, ITG....
3:     7  PTPRC, CD44

$`8`
                             pathway         pval        padj        ES
                              <char>        <num>       <num>     <num>
1:             CD4+ cytotoxic T cell 0.0001029760 0.006481865 0.9079269
2: Effector CD8+ memory T (Tem) cell 0.0001034340 0.006481865 0.8889122
3:               Natural killer cell 0.0001030291 0.006481865 0.8291171
        NES nMoreExtreme  size  leadingEdge
      <num>        <num> <int>       <list>
1: 1.724108            0    86 NKG7, GN....
2: 1.681468            0    79 GNLY, GZ....
3: 1.573421            0    84 NKG7, GN....

$`9`
                  pathway         pval       padj        ES      NES
                   <char>        <num>      <num>     <num>    <num>
1:             Neutrophil 0.0001294666 0.01294409 0.9122455 1.918616
2: CD1C+_B dendritic cell 0.0001377031 0.01294409 0.9192254 1.844511
3:           Stromal cell 0.0008788633 0.04130658 0.8697282 1.660327
   nMoreExtreme  size  leadingEdge
          <num> <int>       <list>
1:            0    82 S100A9, ....
2:            0    54 S100A9, ....
3:            5    38 VIM, TIM....

#CT_GSEA8:

new.cluster.ids <- unlist(lapply(res, function(x) {
    as.data.frame(x)[1, 1]
}))
alldata$cellmarker_gsea <- new.cluster.ids[as.character(alldata$leiden_k20)]

wrap_plots(
    plotReducedDim(alldata, dimred = "UMAP", colour_by = "cellmarker_gsea"),
    plotReducedDim(alldata, dimred = "UMAP", colour_by = "ref_gsea"),
    ncol = 2
)

Discuss

Do you think that the methods overlap well? Where do you see the most inconsistencies?

In this case we do not have any ground truth, and we cannot say which method performs best. You should keep in mind, that any celltype classification method is just a prediction, and you still need to use your common sense and knowledge of the biological system to judge if the results make sense.

Finally, lets save the data with predictions.

saveRDS(ctrl.sce, "data/covid/results/bioc_covid_qc_dr_int_cl_ct-ctrl13.rds")

8 Session info

Click here
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS/LAPACK: /Users/asabjor/miniconda3/envs/seurat5/lib/libopenblasp-r0.3.27.dylib;  LAPACK version 3.12.0

locale:
[1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8

time zone: Europe/Stockholm
tzcode source: system (macOS)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] fgsea_1.28.0                celldex_1.12.0             
 [3] Seurat_5.1.0                SeuratObject_5.0.2         
 [5] sp_2.1-4                    SingleR_2.4.0              
 [7] scmap_1.24.0                scPred_1.9.2               
 [9] pheatmap_1.0.12             patchwork_1.2.0            
[11] dplyr_1.1.4                 scran_1.30.0               
[13] scater_1.30.1               ggplot2_3.5.1              
[15] scuttle_1.12.0              SingleCellExperiment_1.24.0
[17] SummarizedExperiment_1.32.0 Biobase_2.62.0             
[19] GenomicRanges_1.54.1        GenomeInfoDb_1.38.1        
[21] IRanges_2.36.0              S4Vectors_0.40.2           
[23] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[25] matrixStats_1.4.1          

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.1-0         bitops_1.0-8                 
  [3] lubridate_1.9.3               httr_1.4.7                   
  [5] RColorBrewer_1.1-3            tools_4.3.3                  
  [7] sctransform_0.4.1             utf8_1.2.4                   
  [9] R6_2.5.1                      lazyeval_0.2.2               
 [11] uwot_0.1.16                   withr_3.0.1                  
 [13] gridExtra_2.3                 progressr_0.14.0             
 [15] cli_3.6.3                     spatstat.explore_3.2-6       
 [17] fastDummies_1.7.4             labeling_0.4.3               
 [19] spatstat.data_3.1-2           randomForest_4.7-1.2         
 [21] proxy_0.4-27                  ggridges_0.5.6               
 [23] pbapply_1.7-2                 harmony_1.2.1                
 [25] parallelly_1.38.0             limma_3.58.1                 
 [27] RSQLite_2.3.7                 FNN_1.1.4                    
 [29] generics_0.1.3                ica_1.0-3                    
 [31] spatstat.random_3.2-3         Matrix_1.6-5                 
 [33] ggbeeswarm_0.7.2              fansi_1.0.6                  
 [35] abind_1.4-5                   lifecycle_1.0.4              
 [37] yaml_2.3.10                   edgeR_4.0.16                 
 [39] BiocFileCache_2.10.1          recipes_1.1.0                
 [41] SparseArray_1.2.2             Rtsne_0.17                   
 [43] blob_1.2.4                    grid_4.3.3                   
 [45] promises_1.3.0                dqrng_0.3.2                  
 [47] ExperimentHub_2.10.0          crayon_1.5.3                 
 [49] miniUI_0.1.1.1                lattice_0.22-6               
 [51] beachmat_2.18.0               cowplot_1.1.3                
 [53] KEGGREST_1.42.0               pillar_1.9.0                 
 [55] knitr_1.48                    metapod_1.10.0               
 [57] future.apply_1.11.2           codetools_0.2-20             
 [59] fastmatch_1.1-4               leiden_0.4.3.1               
 [61] googleVis_0.7.3               glue_1.7.0                   
 [63] data.table_1.15.4             vctrs_0.6.5                  
 [65] png_0.1-8                     spam_2.10-0                  
 [67] gtable_0.3.5                  cachem_1.1.0                 
 [69] gower_1.0.1                   xfun_0.47                    
 [71] S4Arrays_1.2.0                mime_0.12                    
 [73] prodlim_2024.06.25            survival_3.7-0               
 [75] timeDate_4032.109             iterators_1.0.14             
 [77] hardhat_1.4.0                 lava_1.8.0                   
 [79] statmod_1.5.0                 bluster_1.12.0               
 [81] interactiveDisplayBase_1.40.0 fitdistrplus_1.2-1           
 [83] ROCR_1.0-11                   ipred_0.9-15                 
 [85] nlme_3.1-165                  bit64_4.0.5                  
 [87] filelock_1.0.3                RcppAnnoy_0.0.22             
 [89] irlba_2.3.5.1                 vipor_0.4.7                  
 [91] KernSmooth_2.23-24            rpart_4.1.23                 
 [93] DBI_1.2.3                     colorspace_2.1-1             
 [95] nnet_7.3-19                   tidyselect_1.2.1             
 [97] curl_5.2.1                    bit_4.0.5                    
 [99] compiler_4.3.3                BiocNeighbors_1.20.0         
[101] DelayedArray_0.28.0           plotly_4.10.4                
[103] scales_1.3.0                  lmtest_0.9-40                
[105] rappdirs_0.3.3                stringr_1.5.1                
[107] digest_0.6.37                 goftest_1.2-3                
[109] spatstat.utils_3.1-0          rmarkdown_2.28               
[111] XVector_0.42.0                htmltools_0.5.8.1            
[113] pkgconfig_2.0.3               sparseMatrixStats_1.14.0     
[115] dbplyr_2.5.0                  fastmap_1.2.0                
[117] rlang_1.1.4                   htmlwidgets_1.6.4            
[119] shiny_1.9.1                   DelayedMatrixStats_1.24.0    
[121] farver_2.1.2                  zoo_1.8-12                   
[123] jsonlite_1.8.8                BiocParallel_1.36.0          
[125] ModelMetrics_1.2.2.2          BiocSingular_1.18.0          
[127] RCurl_1.98-1.16               magrittr_2.0.3               
[129] GenomeInfoDbData_1.2.11       dotCall64_1.1-1              
[131] munsell_0.5.1                 Rcpp_1.0.13                  
[133] viridis_0.6.5                 reticulate_1.39.0            
[135] stringi_1.8.4                 pROC_1.18.5                  
[137] zlibbioc_1.48.0               MASS_7.3-60.0.1              
[139] AnnotationHub_3.10.0          plyr_1.8.9                   
[141] parallel_4.3.3                listenv_0.9.1                
[143] ggrepel_0.9.6                 deldir_2.0-4                 
[145] Biostrings_2.70.1             splines_4.3.3                
[147] tensor_1.5                    locfit_1.5-9.9               
[149] igraph_2.0.3                  spatstat.geom_3.2-9          
[151] RcppHNSW_0.6.0                reshape2_1.4.4               
[153] ScaledMatrix_1.10.0           BiocVersion_3.18.1           
[155] evaluate_0.24.0               BiocManager_1.30.25          
[157] foreach_1.5.2                 httpuv_1.6.15                
[159] RANN_2.6.2                    tidyr_1.3.1                  
[161] purrr_1.0.2                   polyclip_1.10-7              
[163] future_1.34.0                 scattermore_1.2              
[165] rsvd_1.0.5                    xtable_1.8-4                 
[167] e1071_1.7-14                  RSpectra_0.16-2              
[169] later_1.3.2                   viridisLite_0.4.2            
[171] class_7.3-22                  tibble_3.2.1                 
[173] memoise_2.0.1                 AnnotationDbi_1.64.1         
[175] beeswarm_0.4.0                cluster_2.1.6                
[177] timechange_0.3.0              globals_0.16.3               
[179] caret_6.0-94