Introduction

In this document we will illustrate the use of several automatic cell type annotation packages. We will use an example data set consisting of 2,700 PBMCs, sequenced using 10x Genomics technology.

Load packages

We first load the required R packages.

Load and preprocess data

Next, we load the data (from the TENxPBMCData package), calculate QC metrics with scater, normalize and run denoising PCA with r Biocpkg("scran") and apply tSNE using scater, as we have seen previously in the course. We also apply a graph-based community detection algorithm in order to partition the cells into discrete clusters.

## snapshotDate(): 2019-09-25
## see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation
## downloading 0 resources
## loading from cache

CellAssign

CellAssign (Zhang et al. 2019) takes as input a single-cell data set (as a SingleCellExperiment object) and a collection of marker genes for cell types of interest. The method builds on the assumption that a cell from a certain cell type should display a high expression of the marker genes for that cell type. For illustration, here we are using a set of marker genes for cell types in a tumour microenvironment, provided with the CellAssign package.

##          B cells T cells Cytotoxic T cells Monocyte/Macrophage
## ACTA2          0       0                 0                   0
## CD14           0       0                 0                   1
## CD19           1       0                 0                   0
## CD2            0       1                 1                   0
## CD24           1       0                 0                   0
## CD28           0       1                 1                   0
## CD33           0       0                 0                   1
## CD3D           0       1                 1                   0
## CD3E           0       1                 1                   0
## CD3G           0       1                 1                   0
## CD4            0       1                 0                   1
## CD79A          1       0                 0                   0
## CD8A           0       0                 1                   0
## CDH1           0       0                 0                   0
## CDH5           0       0                 0                   0
## CLDN3          0       0                 0                   0
## CLDN4          0       0                 0                   0
## CLEC14A        0       0                 0                   0
## COL1A1         0       0                 0                   0
## COL3A1         0       0                 0                   0
## EMCN           0       0                 0                   0
## EPCAM          0       0                 0                   0
## FCGR3A         0       0                 0                   1
## GZMA           0       0                 1                   0
## ITGAM          0       0                 0                   1
## ITGAX          0       0                 0                   1
## LYZ            0       0                 0                   1
## MCAM           0       0                 0                   0
## MS4A1          1       0                 0                   0
## MYH11          0       0                 0                   0
## MYLK           0       0                 0                   0
## NKG7           0       0                 1                   0
## PECAM1         0       0                 0                   0
## PLN            0       0                 0                   0
## PRF1           0       0                 1                   0
## PTPRC          1       1                 1                   1
## SERPINH1       0       0                 0                   0
## VIM            1       1                 1                   1
## VWF            0       0                 0                   0
##          Epithelial cells Myofibroblast Vascular smooth muscle cells
## ACTA2                   0             1                            1
## CD14                    0             0                            0
## CD19                    0             0                            0
## CD2                     0             0                            0
## CD24                    0             0                            0
## CD28                    0             0                            0
## CD33                    0             0                            0
## CD3D                    0             0                            0
## CD3E                    0             0                            0
## CD3G                    0             0                            0
## CD4                     0             0                            0
## CD79A                   0             0                            0
## CD8A                    0             0                            0
## CDH1                    1             0                            0
## CDH5                    0             0                            0
## CLDN3                   1             0                            0
## CLDN4                   1             0                            0
## CLEC14A                 0             0                            0
## COL1A1                  0             1                            1
## COL3A1                  0             1                            1
## EMCN                    0             0                            0
## EPCAM                   1             0                            0
## FCGR3A                  0             0                            0
## GZMA                    0             0                            0
## ITGAM                   0             0                            0
## ITGAX                   0             0                            0
## LYZ                     0             0                            0
## MCAM                    0             0                            1
## MS4A1                   0             0                            0
## MYH11                   0             0                            1
## MYLK                    0             0                            1
## NKG7                    0             0                            0
## PECAM1                  0             0                            0
## PLN                     0             0                            1
## PRF1                    0             0                            0
## PTPRC                   0             0                            0
## SERPINH1                0             1                            1
## VIM                     0             1                            1
## VWF                     0             0                            0
##          Endothelial cells other
## ACTA2                    0     0
## CD14                     0     0
## CD19                     0     0
## CD2                      0     0
## CD24                     0     0
## CD28                     0     0
## CD33                     0     0
## CD3D                     0     0
## CD3E                     0     0
## CD3G                     0     0
## CD4                      0     0
## CD79A                    0     0
## CD8A                     0     0
## CDH1                     0     0
## CDH5                     1     0
## CLDN3                    0     0
## CLDN4                    0     0
## CLEC14A                  1     0
## COL1A1                   0     0
## COL3A1                   0     0
## EMCN                     1     0
## EPCAM                    0     0
## FCGR3A                   0     0
## GZMA                     0     0
## ITGAM                    0     0
## ITGAX                    0     0
## LYZ                      0     0
## MCAM                     1     0
## MS4A1                    0     0
## MYH11                    0     0
## MYLK                     0     0
## NKG7                     0     0
## PECAM1                   1     0
## PLN                      0     0
## PRF1                     0     0
## PTPRC                    0     0
## SERPINH1                 1     0
## VIM                      1     0
## VWF                      1     0
## Warning in cellassign(exprs_obj = sce[shared, ], marker_gene_info =
## marker_gene_mat[shared, : Genes with no mapping counts are present. Make
## sure this is expected -- this can be valid input in some cases (e.g. when
## cell types are overspecified).
## Warning in cellassign(exprs_obj = sce[shared, ], marker_gene_info =
## marker_gene_mat[shared, : Cells with no mapping counts are present. You
## might want to filter these out prior to using cellassign.
## [1] "Cytotoxic T cells"   "B cells"             "Cytotoxic T cells"  
## [4] "Monocyte/Macrophage" "Cytotoxic T cells"   "Cytotoxic T cells"
##           B cells      T cells Cytotoxic T cells Monocyte/Macrophage
## [1,] 5.682052e-06 2.459347e-14      9.999916e-01        2.662367e-06
## [2,] 9.999999e-01 9.120164e-21      3.043269e-08        3.642928e-08
## [3,] 3.546612e-11 8.718447e-13      1.000000e+00        1.156666e-11
## [4,] 1.595091e-06 1.574999e-18      1.262782e-05        9.999855e-01
## [5,] 2.060520e-10 4.386193e-23      1.000000e+00        2.301922e-11
## [6,] 8.537414e-04 4.776439e-13      9.979890e-01        1.152145e-03
##      Epithelial cells Myofibroblast Vascular smooth muscle cells
## [1,]     7.628969e-12  1.225058e-11                 6.193463e-12
## [2,]     6.350016e-15  1.889068e-13                 7.942218e-14
## [3,]     5.034158e-17  6.953427e-17                 3.054440e-17
## [4,]     1.972433e-11  5.421507e-12                 2.548232e-12
## [5,]     6.984863e-16  1.023776e-16                 5.573620e-17
## [6,]     3.983346e-10  2.236500e-08                 1.125137e-08
##      Endothelial cells        other
## [1,]      1.373205e-11 9.651955e-08
## [2,]      2.344012e-13 8.973982e-11
## [3,]      8.437968e-17 6.926728e-13
## [4,]      6.342749e-12 2.605581e-07
## [5,]      1.094129e-16 8.465151e-12
## [6,]      2.514499e-08 5.054143e-06

SingleR

SingleR (Aran et al. 2019) classifies cells to cell types by comparing them to full reference expression profiles. Thus, rather than a set of marker genes it requires complete expression matrices as reference input. Here we use three different reference immune data sets that are provided with the SingleR package for illustration.

## snapshotDate(): 2019-09-25
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
## snapshotDate(): 2019-09-25
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
## snapshotDate(): 2019-09-25
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
##                
##                 B-cells CD4+ T-cells CD8+ T-cells HSC Monocytes NK cells
##   B cells            98            0            0   0         0        0
##   Monocytes           0            0            0   0       188        0
##   NK cells            0            0           38   0         0       56
##   T cells, CD4+       0          133          132   0         0        0
##   T cells, CD8+       0           14           73   0         0        0
##                
##                 B_cell CMP Monocyte NK_cell Platelets Pre-B_cell_CD34-
##   B cells           98   0        0       0         0                0
##   Monocytes          0   0      182       0         0                6
##   NK cells           0   0        0      53         0                1
##   T cells, CD4+      0   0        0       0         0                0
##   T cells, CD8+      0   0        0       0         0                0
##                
##                 T_cells
##   B cells             0
##   Monocytes           0
##   NK cells           39
##   T cells, CD4+     268
##   T cells, CD8+      89

AUCell

AUCell (Aibar et al. 2017) calculates “activation scores” for gene sets (which could, for example, consist of marker genes for specific cell types). An important difference to the previous methods is that a cell can be “assigned” to multiple reference gene sets, and not all cells are “assigned” to any reference gene set. To obtain gene sets for assigning cells to cell types, we perform pairwise t-tests on one of the reference data sets from SingleR, and use scran to extract only marker genes that are upregulated in one cell type compared to all the others.

## snapshotDate(): 2019-09-25
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
## see ?SingleR and browseVignettes('SingleR') for documentation
## downloading 0 resources
## loading from cache
##       B cells     Monocytes      NK cells T cells, CD4+ T cells, CD8+ 
##          1058           984           680            83           101
## Quantiles for the number of genes detected by cell: 
## (Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
##     min      1%      5%     10%     50%    100% 
##  246.00  346.98  461.25  542.00  820.00 2815.00
## Using 2 cores.

## Genes in the gene sets NOT available in the dataset: 
##  B cells:    332 (31% of 1058)
##  Monocytes:  97 (10% of 984)
##  NK cells:   104 (15% of 680)
##  T cells, CD4+:  12 (14% of 83)
##  T cells, CD8+:  15 (15% of 101)
## Warning in .auc_assignmnetThreshold_v6(aucRow = aucMatrix[gSetName, , drop
## = FALSE], : NK cells: Check the AUC histogram. 'minimumDens' was selected
## as the best threshold, but there might be several distributions in the AUC.

##     cell geneSet
## 1  Cell2 B cells
## 2 Cell11 B cells
## 3 Cell21 B cells
## 4 Cell22 B cells
## 5 Cell23 B cells
## 6 Cell26 B cells
##                
##                 Cell109 Cell11
##   B cells             1      1
##   Monocytes           0      0
##   NK cells            0      0
##   T cells, CD4+       0      0
##   T cells, CD8+       0      0

Session info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] doRNG_1.7.1                 rngtools_1.4               
##  [3] pkgmaker_0.27               registry_0.5-1             
##  [5] foreach_1.4.7               AUCell_1.6.1               
##  [7] SingleR_0.99.13             cellassign_0.99.11         
##  [9] edgeR_3.27.13               limma_3.41.17              
## [11] igraph_1.2.4.1              Seurat_3.1.1               
## [13] pheatmap_1.0.12             BiocSingular_1.1.7         
## [15] scran_1.13.26               scater_1.13.25             
## [17] ggplot2_3.2.1               TENxPBMCData_1.3.0         
## [19] HDF5Array_1.13.9            rhdf5_2.29.3               
## [21] SingleCellExperiment_1.7.11 SummarizedExperiment_1.15.9
## [23] DelayedArray_0.11.8         BiocParallel_1.19.3        
## [25] matrixStats_0.55.0          Biobase_2.45.1             
## [27] GenomicRanges_1.37.16       GenomeInfoDb_1.21.2        
## [29] IRanges_2.19.16             S4Vectors_0.23.25          
## [31] BiocGenerics_0.31.6         BiocStyle_2.13.2           
## 
## loaded via a namespace (and not attached):
##   [1] reticulate_1.13               R.utils_2.9.0                
##   [3] tidyselect_0.2.5              RSQLite_2.1.2                
##   [5] AnnotationDbi_1.47.1          htmlwidgets_1.5.1            
##   [7] grid_3.6.1                    Rtsne_0.15                   
##   [9] munsell_0.5.0                 codetools_0.2-16             
##  [11] ica_1.0-2                     statmod_1.4.32               
##  [13] future_1.14.0                 withr_2.1.2                  
##  [15] colorspace_1.4-1              knitr_1.25                   
##  [17] ROCR_1.0-7                    tensorflow_2.0.0             
##  [19] gbRd_0.4-11                   listenv_0.7.0                
##  [21] labeling_0.3                  Rdpack_0.11-0                
##  [23] GenomeInfoDbData_1.2.1        bit64_0.9-7                  
##  [25] vctrs_0.2.0                   xfun_0.10                    
##  [27] BiocFileCache_1.9.1           doParallel_1.0.15            
##  [29] R6_2.4.0                      ggbeeswarm_0.6.0             
##  [31] rsvd_1.0.2                    locfit_1.5-9.1               
##  [33] bitops_1.0-6                  assertthat_0.2.1             
##  [35] promises_1.1.0                SDMTools_1.1-221.1           
##  [37] scales_1.0.0                  beeswarm_0.2.3               
##  [39] gtable_0.3.0                  beachmat_2.1.2               
##  [41] npsurv_0.4-0                  globals_0.12.4               
##  [43] rlang_0.4.0                   zeallot_0.1.0                
##  [45] splines_3.6.1                 lazyeval_0.2.2               
##  [47] BiocManager_1.30.7            yaml_2.2.0                   
##  [49] reshape2_1.4.3                backports_1.1.5              
##  [51] httpuv_1.5.2                  tools_3.6.1                  
##  [53] gplots_3.0.1.1                RColorBrewer_1.1-2           
##  [55] ggridges_0.5.1                Rcpp_1.0.2                   
##  [57] plyr_1.8.4                    base64enc_0.1-3              
##  [59] zlibbioc_1.31.0               purrr_0.3.2                  
##  [61] RCurl_1.95-4.12               pbapply_1.4-2                
##  [63] viridis_0.5.1                 cowplot_1.0.0                
##  [65] zoo_1.8-6                     ggrepel_0.8.1                
##  [67] cluster_2.1.0                 magrittr_1.5                 
##  [69] data.table_1.12.4             lmtest_0.9-37                
##  [71] RANN_2.6.1                    whisker_0.4                  
##  [73] fitdistrplus_1.0-14           lsei_1.2-0                   
##  [75] mime_0.7                      evaluate_0.14                
##  [77] xtable_1.8-4                  XML_3.98-1.20                
##  [79] gridExtra_2.3                 tfruns_1.4                   
##  [81] compiler_3.6.1                tibble_2.1.3                 
##  [83] KernSmooth_2.23-15            crayon_1.3.4                 
##  [85] R.oo_1.22.0                   htmltools_0.4.0              
##  [87] segmented_1.0-0               later_1.0.0                  
##  [89] tidyr_1.0.0                   RcppParallel_4.4.4           
##  [91] DBI_1.0.0                     ExperimentHub_1.11.6         
##  [93] dbplyr_1.4.2                  MASS_7.3-51.4                
##  [95] rappdirs_0.3.1                Matrix_1.2-17                
##  [97] R.methodsS3_1.7.1             gdata_2.18.0                 
##  [99] metap_1.1                     pkgconfig_2.0.3              
## [101] plotly_4.9.0                  annotate_1.63.0              
## [103] vipor_0.4.5                   dqrng_0.2.1                  
## [105] XVector_0.25.0                bibtex_0.4.2                 
## [107] stringr_1.4.0                 digest_0.6.21                
## [109] sctransform_0.2.0             RcppAnnoy_0.0.13             
## [111] tsne_0.1-3                    graph_1.63.0                 
## [113] rmarkdown_1.16                leiden_0.3.1                 
## [115] uwot_0.1.4                    GSEABase_1.47.0              
## [117] DelayedMatrixStats_1.7.2      curl_4.2                     
## [119] shiny_1.4.0                   gtools_3.8.1                 
## [121] lifecycle_0.1.0               nlme_3.1-141                 
## [123] jsonlite_1.6                  Rhdf5lib_1.7.5               
## [125] BiocNeighbors_1.3.5           viridisLite_0.3.0            
## [127] pillar_1.4.2                  lattice_0.20-38              
## [129] fastmap_1.0.1                 httr_1.4.1                   
## [131] survival_2.44-1.1             interactiveDisplayBase_1.23.0
## [133] glue_1.3.1                    iterators_1.0.12             
## [135] png_0.1-7                     bit_1.1-14                   
## [137] mixtools_1.1.0                stringi_1.4.3                
## [139] blob_1.2.0                    AnnotationHub_2.17.10        
## [141] caTools_1.17.1.2              memoise_1.1.0                
## [143] dplyr_0.8.3                   irlba_2.3.3                  
## [145] future.apply_1.3.0            ape_5.3

References

Aibar, Sara, Carmen Bravo González-Blas, Thomas Moerman, Vân Anh Huynh-Thu, Hana Imrichova, Gert Hulselmans, Florian Rambow, et al. 2017. “SCENIC: Single-Cell Regulatory Network Inference and Clustering.” Nat. Methods 14 (11): 1083–6.

Aran, Dvir, Agnieszka P Looney, Leqian Liu, Esther Wu, Valerie Fong, Austin Hsu, Suzanna Chak, et al. 2019. “Reference-Based Analysis of Lung Single-Cell Sequencing Reveals a Transitional Profibrotic Macrophage.” Nat. Immunol. 20 (2): 163–72.

Zhang, Allen W, Ciara O’Flanagan, Elizabeth A Chavez, Jamie L P Lim, Nicholas Ceglia, Andrew McPherson, Matt Wiens, et al. 2019. “Probabilistic Cell-Type Assignment of Single-Cell RNA-seq for Tumor Microenvironment Profiling.” Nat. Methods 16 (10): 1007–15.