1 Introduction

1.1 Introduction

High-throughput biological datasets, such as transcriptomic, proteomic, or single-cell data, are often high-dimensional and noisy. Dimensionality reduction techniques manage this complexity by projecting the data into a lower-dimensional space that preserves meaningful structure. The resulting representations support visualization, clustering, and trajectory inference. Biological data also often resist assignment to discrete classes: cell states may lie along a continuum, clusters may overlap, and some samples may sit in transitional states. Clustering methods that assign each sample to exactly one group can miss these subtleties, which motivates more flexible, geometry-aware approaches.

This tutorial introduces a set of complementary methods:

ICA (Independent Component Analysis) isolates statistically independent signals (see https://payamemami.com/ica_basics/).
SOM (Self-Organizing Maps) organizes samples based on topological similarity (see https://payamemami.com/self_orginizating_maps_basics/).
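t-SNE (t-distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection) focus on preserving local neighborhoods, which makes them suited to visualizing fine structure and clusters.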
Diffusion Maps model global transitions using random walks, capturing continuous biological trajectories.
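
To make the random-walk idea behind Diffusion Maps concrete, here is a minimal NumPy sketch, not a reference implementation. It builds a Gaussian kernel between samples, normalizes it into a random-walk transition matrix, and uses the leading non-trivial eigenvectors as low-dimensional coordinates. The function name `diffusion_map`, the kernel width `sigma`, the `steps` parameter, and the synthetic arc-shaped data are all assumptions made for illustration. Real datasets usually call for a data-driven kernel width and a sparse neighbor graph.

```python
import numpy as np

def diffusion_map(X, sigma=0.3, n_components=2, steps=1):
    """Illustrative diffusion map: Gaussian kernel -> random walk -> eigenvectors."""
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq_norms = np.sum(X**2, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinities; sigma is a hypothetical, hand-picked kernel width.
    K = np.exp(-d2 / (2.0 * sigma**2))
    degree = K.sum(axis=1)

    # The random-walk matrix P = D^-1 K has the same eigenvalues as the
    # symmetric matrix S = D^-1/2 K D^-1/2, which is easier to decompose.
    S = K / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Convert to right eigenvectors of P and drop the first (constant) one.
    psi = eigvecs / np.sqrt(degree)[:, None]
    keep = slice(1, n_components + 1)
    return psi[:, keep] * eigvals[keep] ** steps

# Synthetic "trajectory": 60 samples along an arc, embedded in 10 noisy dimensions.
rng = np.random.default_rng(0)
pseudotime = np.linspace(0.0, 1.0, 60)
arc = np.column_stack([np.cos(np.pi * pseudotime), np.sin(np.pi * pseudotime)])
X = np.hstack([arc, np.zeros((60, 8))]) + rng.normal(scale=0.02, size=(60, 10))

embedding = diffusion_map(X)
# Absolute correlation, because the sign of an eigenvector is arbitrary.
corr = abs(np.corrcoef(embedding[:, 0], pseudotime)[0, 1])
print(f"|correlation| between first diffusion component and ordering: {corr:.2f}")
```

On this toy trajectory the first diffusion component should track the hidden ordering closely. That property is what makes such coordinates useful for describing continuous transitions.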

Comparison of Dimensionality Reduction Methods

Method	Key Concept	When to Use	Common Applications	Limitations
ICA	Decomposes data into statistically independent components	To identify latent signals or sources driving variation	Gene module analysis, signal separation	Assumes independence; sensitive to noise and scaling
SOM	Maps high-dimensional data onto a topologically ordered grid	When you want structured clustering and visualization on a 2D map	Expression patterns, cohort stratification	Grid size needs tuning; interpretation can be subjective
t-SNE	Preserves local similarities via stochastic neighbor embedding	For visualizing clusters in 2D or 3D	Cell type discovery, quality control	Poor global structure; sensitive to seed/perplexity; not suitable for downstream modeling
UMAP	Graph-based manifold learning balancing local and global structure	When you want fast, structure-preserving embeddings	Single-cell analysis, cohort comparison	Sensitive to parameters; may distort distances
Diffusion Maps	Uses a random walk process to capture manifold geometry	To model continuous transitions, trajectories, or diffusion-like dynamics	Cell state transitions, lineage inference	Slower on large datasets; interpretation of components may not be intuitive
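
As an illustration of the ICA row in the table, the sketch below separates two mixed signals using a symmetric FastICA-style update with a tanh nonlinearity, written in plain NumPy. It is a simplified sketch under stated assumptions, not the implementation used by any particular library. The helper names (`sym_decorrelate`, `fast_ica`), the two synthetic sources, and the mixing matrix `A` are hypothetical. ICA recovers sources only up to sign, order, and scale, so the check at the end compares absolute correlations.

```python
import numpy as np

def sym_decorrelate(W):
    """Return the closest orthogonal matrix to W, i.e. (W W^T)^(-1/2) W."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Illustrative symmetric FastICA. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)

    # Center and whiten so that the features have identity covariance.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = (Xc @ evecs) / np.sqrt(evals)
    n, k = Z.shape

    # Random orthogonal starting point for the unmixing matrix.
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)                      # nonlinearity g(y)
        g_prime_mean = (1.0 - G**2).mean(axis=0)  # E[g'(y)] per component
        W_new = (G.T @ Z) / n - g_prime_mean[:, None] * W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is unchanged up to sign.
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return Z @ W.T

# Two hypothetical independent sources: an oscillation and a sparse signal.
rng = np.random.default_rng(42)
time = np.linspace(0.0, 1.0, 2000)
S_true = np.column_stack([np.sin(2 * np.pi * 5 * time), rng.laplace(size=time.size)])

# Observed data are linear mixtures of the sources.
A = np.array([[1.0, 0.5], [0.5, 1.0]])
X_mixed = S_true @ A.T

S_est = fast_ica(X_mixed)
# Rows: estimated components; columns: true sources.
match = np.abs(np.corrcoef(S_est.T, S_true.T)[:2, 2:])
print(np.round(match, 2))
```

Each estimated component should correlate strongly with exactly one true source. Because the method relies on the sources being independent and non-Gaussian, results on real expression data depend heavily on preprocessing and scaling, as the table's Limitations column notes.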