NMF - Non-Negative Matrix Factorization

by Sergiu Netotea, PhD, NBIS, Chalmers

Further read:

Observations:


NMF - some intuitive examples from multi-omics:

| V matrix (values matrix) | W matrix (weights, scores) | H matrix (hidden, loadings) |
| --- | --- | --- |
| expression values (genes x samples) | genes x factors (metagenes) | factors (metagenes) x samples |
| protein counts (proteins x samples) | proteins x factors (domains) | factors (domains) x samples |
| multi-omics observations (genes, proteins, etc. x samples) | (genes, proteins, etc.) x factors (multi-omic features) | factors (multi-omic features) x samples |
| multiple datasets (genes x samples x batches) | genes x factors (multi-batch domains) | factors (multi-batch domains) x samples |
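The decompositions in the table above can be sketched with scikit-learn; the toy expression matrix and its dimensions here are illustrative, chosen only to mirror the genes x samples layout:

```python
# A minimal sketch of V ~ W x H on a toy expression matrix;
# shapes and variable names follow the table above.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.poisson(5, size=(20, 6)).astype(float)  # genes x samples

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)   # genes x factors (metagenes)
H = model.components_        # factors (metagenes) x samples

print(W.shape, H.shape)          # (20, 3) (3, 6)
print(np.abs(V - W @ H).mean())  # mean reconstruction error per entry
```

Note that both factor matrices come out non-negative, which is what makes the metagenes interpretable as additive parts.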

NMF - in general contexts:

| V matrix (values matrix) | W matrix (weights, scores) | H matrix (hidden, loadings) |
| --- | --- | --- |
| recommender systems (items x users) | items x factors (preferences) | factors (preferences) x users |
| collaborative filtering (user x user connections) | users x factors (communities) | factors (communities) x users |
| language processing (word distributions x documents) | words x factors (topics) | factors (topics) x documents |
| facial recognition (faces x labels) | faces x factors (facial features) | factors (facial features) x labels |
| microscopy pictures (pictures x samples) | pictures x factors (image segments) | factors (image segments) x samples |
| spectrometry (spectra x samples) | spectra x factors (component molecules) | factors (component molecules) x samples |

Solving NMF

Alternating non-negative least squares (ANLS)
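A minimal sketch of ANLS, assuming a simple stopping rule (fixed iteration count): fix H and solve a non-negative least-squares problem for each row of W, then swap. The helper name and tolerances are illustrative, not a reference implementation:

```python
# Alternating non-negative least squares (ANLS) via scipy's NNLS solver.
import numpy as np
from scipy.optimize import nnls

def anls_nmf(V, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # Fix H, solve min ||V - W H|| for W: one NNLS per row of V
        for i in range(n):
            W[i], _ = nnls(H.T, V[i])
        # Fix W, solve for H: one NNLS per column of V
        for j in range(m):
            H[:, j], _ = nnls(W, V[:, j])
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(10, 8)))
W, H = anls_nmf(V, 3)
print(np.linalg.norm(V - W @ H))  # Frobenius reconstruction error
```

Each subproblem is convex, which is why the alternation is well behaved even though the joint problem is not.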

NMF - multi omics usage observations

Comparison to PCA and autoencoders
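The key contrast with PCA can be seen directly on random data; this small sketch (settings are illustrative) checks the sign of the learned components:

```python
# PCA components mix signs (hard to read as parts); NMF factors are
# non-negative, giving a parts-based, additive representation.
import numpy as np
from sklearn.decomposition import PCA, NMF

rng = np.random.default_rng(0)
V = rng.random((30, 10))

pca = PCA(n_components=3).fit(V)
nmf = NMF(n_components=3, init="random", max_iter=500, random_state=0).fit(V)

print((pca.components_ < 0).any())   # True: mixed signs
print((nmf.components_ < 0).any())   # False: non-negative parts
```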

Further read:

Toy dataset

Intuitively we can see that the users (samples) are connected to their items (genes) via a hidden scheme that could simplify this table. The elements of such a hidden scheme are called hidden (latent) factors. Here is a possible example:

Can we figure out the hidden factors? We can do this in one of two ways: either we know the real afflictions, or, as in our toy model, we only know the effect of our omics features. In the latter case we have to look into W (the weights, or factors, matrix).
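Inspecting W usually means ranking features by their weight within each factor. A sketch under toy assumptions (the gene names and matrix are invented for illustration):

```python
# For each hidden factor, list the top-weighted omics features in W.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
genes = [f"gene_{i}" for i in range(12)]
V = rng.random((12, 5))  # genes x samples (users)

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)  # genes x factors

for k in range(W.shape[1]):
    top = np.argsort(W[:, k])[::-1][:3]  # 3 heaviest genes for factor k
    print(f"factor {k}:", [genes[i] for i in top])
```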

Example findings:

Hypothesis hunting: W x H is an approximation of V, so by transforming the dataset through the NMF model we can learn some new things.
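One way to hunt: reconstruct V as W x H and look at the largest residuals; entries the model fails to explain may point at something interesting. All names and data here are illustrative:

```python
# Reconstruct V from the NMF model and find its least explained entry.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((15, 6))

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)
V_hat = W @ model.components_   # the model's view of the data

residuals = V - V_hat
i, j = np.unravel_index(np.abs(residuals).argmax(), V.shape)
print(f"least explained entry: row {i}, column {j}")
```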

Why is NMF so important?

Reference: For the toy example I drew inspiration from the following Medium article:

NMF usage observations and applications

NMF has many highly efficient solvers:

Multiple normalities

Should data be normalized before coercing it into a matrix factorization model?
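There is no single answer, but one common choice for count-like omics data is to scale each sample to equal total before factorizing, so abundant samples do not dominate the fit. The scaling scheme below is one assumption among several reasonable ones:

```python
# Total-sum scaling: each sample (column) is divided by its column sum.
import numpy as np

V = np.array([[10., 1.], [100., 5.], [50., 2.]])  # features x samples

col_sums = V.sum(axis=0, keepdims=True)
V_norm = V / col_sums  # each sample now sums to 1

print(V_norm.sum(axis=0))  # [1. 1.]
```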

Missingness and regularization

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html

SNMF - Sparse NMF

jNMF - the MOFA of NMFs

iNMF - for active learning

Deep NMF - a novel paradigm: matrix factorization performed as part of a deep neural network

- Deep architecture: CNN-like, trained with backpropagation; each NMF layer performs one level of a hierarchical decomposition

NMF for single cell experiments

Matrix Factorization - beyond NMF

Software and bibliography:

Rank estimation of NMF:
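A simple rank-estimation sketch: fit NMF for several ranks and watch the reconstruction error; an "elbow" suggests a reasonable k. Dedicated NMF packages offer more robust criteria (e.g. cophenetic correlation over repeated runs); this toy setup plants a known rank:

```python
# Scan candidate ranks on data with a known non-negative rank of 3.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((30, 3)) @ rng.random((3, 10))  # true rank 3

errors = {}
for k in range(1, 6):
    model = NMF(n_components=k, init="nndsvda", max_iter=1000, random_state=0)
    W = model.fit_transform(V)
    errors[k] = np.linalg.norm(V - W @ model.components_)
    # the error should drop sharply up to the true rank, then flatten
    print(k, round(errors[k], 4))
```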

Other integrative NMF:

Fast Tensorial Calculus:

Non-negative CP Decomposition (NTF)
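A minimal numpy/scipy sketch of non-negative CP decomposition (NTF): alternate NNLS solves over the three mode unfoldings of the tensor. Libraries such as tensorly ship optimized versions; the helper names here are illustrative:

```python
# Non-negative CP (PARAFAC) by alternating NNLS on mode unfoldings.
import numpy as np
from scipy.optimize import nnls

def khatri_rao(A, B):
    # column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def ntf(T, rank, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    A = [rng.random((s, rank)) for s in T.shape]
    for _ in range(n_iter):
        for n in range(3):
            B, C = [A[m] for m in range(3) if m != n]
            K = khatri_rao(B, C)          # T_(n) ~ A[n] @ K.T
            M = unfold(T, n)
            for i in range(T.shape[n]):   # one NNLS per row, x >= 0
                A[n][i], _ = nnls(K, M[i])
    return A

# toy tensor: genes x samples x batches, exact non-negative rank 2
rng = np.random.default_rng(1)
G, S, Bt = (rng.random((d, 2)) for d in (6, 5, 4))
T = np.einsum("ir,jr,kr->ijk", G, S, Bt)

A = ntf(T, rank=2)
T_hat = np.einsum("ir,jr,kr->ijk", *A)
print(np.linalg.norm(T - T_hat))  # should be small for an exact rank-2 tensor
```

The unfolding convention matters: the Khatri-Rao product must combine the remaining factor matrices in the same order the unfolding flattens those modes.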