Functional Analysis

Single Cell RNA-Seq Analysis

Roy Francis, Paulo Czarnewski

09-Feb-2024

Workflow

  • Quality control
  • Cell cycle phase classification
  • Normalization
  • Select highly variable genes
  • Data integration
  • Clustering
  • Cell typing
  • Differential gene expression
  • Functional analysis (GSA/GSEA)

Many names for functional analysis

  • Functional analysis
  • Pathway analysis
  • Gene set analysis (GSA)
  • Gene set enrichment analysis (GSEA)
  • Gene list enrichment analysis
  • GO term analysis
  • Overrepresentation analysis (ORA)
  • Hypergeometric test

What is functional analysis?

Gene-level data -> Gene set data

We focus on transcriptomics and differential gene expression (DGE), but in principle this applies to any genome-wide data.
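The step from gene-level data to gene set data can be illustrated with a minimal sketch. It assumes one deliberately simple summary, the mean gene-level statistic per set. The gene sets, the log fold change values and the function name `summarise_sets` are made up for illustration and do not come from any real dataset or package.

```python
from statistics import mean

# Toy gene-level statistics (e.g. log fold changes); values are invented.
gene_stats = {"CD3E": 2.0, "CD3D": 1.0, "MS4A1": -1.5, "CD79A": -0.5, "ACTB": 0.0}

# Toy gene sets; real sets would come from GO, KEGG, Reactome, etc.
gene_sets = {
    "T cell markers (toy)": ["CD3E", "CD3D"],
    "B cell markers (toy)": ["MS4A1", "CD79A", "CD19"],
}

def summarise_sets(gene_stats, gene_sets):
    """Collapse gene-level statistics into one value per gene set (mean)."""
    summary = {}
    for name, members in gene_sets.items():
        # Only genes that were actually measured can contribute.
        measured = [gene_stats[g] for g in members if g in gene_stats]
        if measured:
            summary[name] = mean(measured)
    return summary

set_scores = summarise_sets(gene_stats, gene_sets)
for name, score in set_scores.items():
    print(f"{name}: {score:+.2f}")
```

ORA and GSEA replace this naive mean with a proper statistical test, but the input and output have the same shape: a table of genes goes in, a table of gene sets comes out.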

Why functional analysis?

  • Make sense of a long list of differentially expressed genes (DEGs)
  • What is the function of those genes?
  • What is the biological consequence of over/under expression of genes?
  • Connect your DEGs and thereby your experiment to pathway activity
  • The top genes by themselves might not be the most interesting inference
  • Gene set level results are less prone to false-positive DEGs

Gene sets

  • Gene Ontology (GO)
  • Kyoto Encyclopedia of Genes and Genomes (KEGG)
  • Reactome
  • WikiPathways
  • Molecular Signatures Database (MSigDB)

Gene ontology

  • Network graph, loosely hierarchical
  • Three disjoint graphs
    • Biological process (Neutrophil chemotaxis, Cell proliferation)
    • Molecular function (Histone acetyltransferase activity, Protein kinase activity)
    • Cellular component (Nucleus, Cytoplasm, Plasma membrane)
  • Genes can belong to multiple terms

KEGG

  • Smaller collection with fewer gene sets than GO
  • Better curated
  • Metabolic pathways

Pathview

Tools

Online

Code

Overrepresentation analysis (ORA/GSA)

  • Hypergeometric test (equivalent to a one-sided Fisher’s exact test)

  • Background can be all genes or all genes expressed in your cell population
  • Requires an arbitrary cut-off to define the list of DEGs
  • Omits the actual gene-level statistics
  • Computationally fast
  • Generally works best for a few genes with strong effects
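The overrepresentation test can be sketched with the standard library alone. The example below computes the upper tail of the hypergeometric distribution, that is, the probability of seeing at least the observed overlap between the DEG list and a gene set by chance. The function names `hypergeom_pvalue` and `ora` and the toy gene identifiers are assumptions made for illustration.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when drawing n genes from N background genes,
    of which K belong to the gene set."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def ora(deg, gene_set, background):
    """Overrepresentation of gene_set among deg, relative to background."""
    background = set(background)
    # Genes outside the background cannot be tested, so drop them first.
    deg = set(deg) & background
    gene_set = set(gene_set) & background
    overlap = deg & gene_set
    p = hypergeom_pvalue(len(overlap), len(gene_set), len(deg), len(background))
    return sorted(overlap), p

# Toy data: 20 background genes, a set of 5, and 4 DEGs passing the cut-off.
background = [f"g{i}" for i in range(1, 21)]
gene_set = ["g1", "g2", "g3", "g4", "g5"]
deg = ["g1", "g2", "g3", "g10"]

overlap, p = ora(deg, gene_set, background)
print(overlap, round(p, 4))
```

Changing `background` changes the p-value, which is why the choice between all genes and only the genes expressed in the cell population matters. In a real analysis the p-values from many gene sets would also need multiple testing correction, which is not shown here.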

Gene set enrichment analysis (GSEA)

Subramanian et al. (2005)

  • Enrichment score (ES)
  • Normalized enrichment score (NES)
  • No need for cut-offs
  • Takes gene-level stats into account
  • More sensitive to subtle changes
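The enrichment score can be sketched as a running sum over the ranked gene list, in the spirit of the statistic described by Subramanian et al. (2005): the sum goes up at genes in the set (weighted by the size of their statistic) and down at all other genes, and the ES is the largest deviation from zero. The function name `enrichment_score`, the `weight` parameter and the toy ranking are assumptions made for illustration. The NES additionally requires normalization against permuted data and is not computed here.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, statistic) pairs, sorted from the most
    up-regulated to the most down-regulated gene."""
    gene_set = set(gene_set)
    # Contribution of each gene if it is a hit; zero otherwise.
    hits = [abs(stat) ** weight if gene in gene_set else 0.0 for gene, stat in ranked]
    n_hit = sum(1 for gene, _ in ranked if gene in gene_set)
    n_miss = len(ranked) - n_hit
    hit_total = sum(hits)
    if n_hit == 0 or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must overlap the ranked list partially, with non-zero weights")

    running, best = 0.0, 0.0
    for (gene, _), hit in zip(ranked, hits):
        if gene in gene_set:
            running += hit / hit_total   # step up at a set member
        else:
            running -= 1.0 / n_miss      # step down at any other gene
        if abs(running) > abs(best):
            best = running               # keep the largest deviation, with its sign
    return best

# Toy ranking: no cut-off is applied, every gene contributes.
ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0), ("f", -3.0)]

es_top = enrichment_score(ranked, {"a", "b"})     # set sits at the top of the list
es_bottom = enrichment_score(ranked, {"e", "f"})  # set sits at the bottom
print(es_top, es_bottom)
```

A positive ES indicates that the set is concentrated among the up-regulated genes, and a negative ES that it is concentrated among the down-regulated genes.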

GSEA User Guide

Considerations

  • Bias in gene sets
  • Multifunctional genes, set size
  • Well-studied topics/diseases will be overrepresented (e.g. COVID-19)
  • Translation of gene IDs
  • Changes in databases over time
  • Incomplete information for your organism
  • Critical evaluation is required
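The point about translating gene IDs can be made concrete with a small sketch that reports which identifiers are lost during translation instead of dropping them silently. The mapping table, the identifiers and the function name `translate_ids` are hypothetical. A real table would come from an annotation resource and would differ between database versions.

```python
def translate_ids(ids, mapping):
    """Translate gene IDs with a lookup table and keep track of failures."""
    translated, unmapped = {}, []
    for gene_id in ids:
        symbol = mapping.get(gene_id)
        if symbol is None:
            unmapped.append(gene_id)       # no entry in this annotation version
        else:
            translated[gene_id] = symbol
    return translated, unmapped

# Hypothetical mapping table and query; the IDs are placeholders, not real accessions.
mapping = {"ID_0001": "GENE_A", "ID_0002": "GENE_B", "ID_0003": "GENE_B"}
query = ["ID_0001", "ID_0002", "ID_0003", "ID_0004"]

translated, unmapped = translate_ids(query, mapping)

# Several IDs mapping to one symbol also deserve a look before enrichment.
symbols = list(translated.values())
duplicated = sorted({s for s in symbols if symbols.count(s) > 1})

print(f"{len(translated)} translated, {len(unmapped)} unmapped: {unmapped}")
print(f"symbols reached from more than one ID: {duplicated}")
```

Checking the number of unmapped and duplicated identifiers before running ORA or GSEA helps to judge how much of the input actually reaches the test.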

References

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545–15550. https://www.pnas.org/doi/abs/10.1073/pnas.0506580102

Acknowledgements

Adapted from previous presentations by Leif Wigge & Paulo Czarnewski.