NMF lab

Task (45 minutes)

Important

Data: MOFA's CLL dataset

  1. Load and prepare data, then perform NMF on the joint representation of the methylation and mRNA datasets.
  2. Assume that there are two cancer subtypes and cluster them :)
  3. Find the driving features and verify their functionality.

Code:

conda create -n iocourse
conda activate iocourse
conda install jupyterlab pandas numpy scikit-learn
pip install snfpy

Load the datasets:

To ensure non-negativity, split each signed modality into its positive part and the magnitude of its negative part, and add the latter as new features. Just some ugly code, not important. :)
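A minimal sketch of the sign-splitting idea, assuming a features-by-samples layout. The helper name `split_signed`, the `_pos`/`_neg` suffixes, and the toy values are hypothetical and not taken from the lab's own code.

```python
import pandas as pd

def split_signed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a non-negative frame with a '_pos' and a '_neg' row per feature."""
    pos = df.clip(lower=0)        # keep positive values, zero out the rest
    neg = (-df).clip(lower=0)     # magnitude of negative values, zero elsewhere
    pos.index = [f"{name}_pos" for name in df.index]
    neg.index = [f"{name}_neg" for name in df.index]
    return pd.concat([pos, neg])

# Toy signed data: rows are features, columns are samples (assumed layout).
signed = pd.DataFrame(
    [[1.0, -2.0], [-0.5, 3.0]],
    index=["g1", "g2"],
    columns=["s1", "s2"],
)
nonneg = split_signed(signed)
```

The original signal can be recovered as the `_pos` rows minus the `_neg` rows, so no information is lost; the feature count simply doubles.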

Divide each dataset by its Frobenius norm so that both modalities contribute on the same scale, then concatenate the data.
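As a rough illustration of this step, the sketch below scales two toy matrices to unit Frobenius norm and stacks them along the feature axis. The helper `frobenius_scale` and the toy values are hypothetical, and the samples are assumed to be already matched and in the same column order.

```python
import numpy as np
import pandas as pd

def frobenius_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Divide a matrix by its Frobenius norm so the result has norm 1."""
    return df / np.linalg.norm(df.to_numpy(), ord="fro")

# Toy non-negative modalities: rows are features, columns are samples.
mrna = pd.DataFrame([[3.0, 0.0], [0.0, 4.0]],
                    index=["g1", "g2"], columns=["s1", "s2"])
meth = pd.DataFrame([[1.0, 1.0], [1.0, 1.0]],
                    index=["cg1", "cg2"], columns=["s1", "s2"])

# Stack the scaled modalities into one joint features-by-samples matrix.
joint = pd.concat([frobenius_scale(mrna), frobenius_scale(meth)])
```

Without this scaling, the modality with the larger overall magnitude would tend to dominate the factorization.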

Unfortunately, the datasets are not column-matched, so we have to do it ourselves.
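One way to match the columns, sketched on toy data: keep only the samples present in both modalities and put them in the same order. The sample names here are made up, and the columns are assumed to hold sample IDs.

```python
import pandas as pd

# Toy modalities whose sample columns overlap only partially.
mrna = pd.DataFrame([[1.0, 2.0, 3.0]], index=["g1"],
                    columns=["s1", "s2", "s3"])
meth = pd.DataFrame([[0.1, 0.2, 0.3]], index=["cg1"],
                    columns=["s3", "s1", "s4"])

# Samples measured in both modalities, in a deterministic order.
shared = sorted(set(mrna.columns) & set(meth.columns))

mrna_matched = mrna[shared]
meth_matched = meth[shared]
```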

Fitting an NMF model
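A minimal sketch of the fit with scikit-learn, using random toy data in place of the joint matrix. Two components are used to mirror the two assumed subtypes; the `init`, `max_iter`, and `random_state` settings are illustrative choices, not necessarily the ones used in the lab.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy stand-in for the joint matrix: 40 features x 12 samples, non-negative.
rng = np.random.default_rng(0)
X = rng.random((40, 12))

# scikit-learn expects samples in rows, so the matrix is transposed.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(X.T)   # samples x components
H = model.components_          # components x features
```

Here `W` holds how strongly each sample loads on each component, and `H` holds the per-feature weights of each component.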

Clustering
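One simple way to turn the factorization into two clusters is to assign each sample to the component on which it loads most strongly. This is only a sketch of that idea on a made-up sample-by-component matrix; the lab may cluster the sample loadings differently (for example with k-means).

```python
import pandas as pd

# Hypothetical sample loadings (rows: samples, columns: NMF components).
W = pd.DataFrame(
    [[0.9, 0.1], [0.2, 0.7], [0.8, 0.3], [0.05, 0.6]],
    index=["s1", "s2", "s3", "s4"],
    columns=["comp_0", "comp_1"],
)

# Cluster label = position of the largest loading in each row.
clusters = pd.Series(W.to_numpy().argmax(axis=1), index=W.index, name="cluster")
```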

NMF analysis

What samples drive our first component (cluster)?
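A sketch of how this question could be answered: sort the samples by their loading on the first component. The loadings and sample names below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample loadings (rows: samples, columns: NMF components).
W = pd.DataFrame(
    [[0.9, 0.1], [0.2, 0.7], [0.8, 0.3], [0.05, 0.6]],
    index=["s1", "s2", "s3", "s4"],
    columns=["comp_0", "comp_1"],
)

# Samples ranked by their weight on the first component, strongest first.
top_samples = W["comp_0"].sort_values(ascending=False)
```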

Similarly, which transcript IDs and methylation probes are driving the signal in the same component?

Let's get the first ten genes and the first ten probes...
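As a rough sketch, the top features can be read off the first row of the component-by-feature matrix, ranked separately per modality. The feature names (`gene_*`, `probe_*`) and the random weights are placeholders; with the real data, the lists of gene and probe names would come from the two loaded datasets.

```python
import numpy as np
import pandas as pd

# Placeholder feature names for the two modalities.
genes = [f"gene_{i}" for i in range(30)]
probes = [f"probe_{i}" for i in range(30)]

# Hypothetical component-by-feature weights (2 components, 60 features).
rng = np.random.default_rng(0)
H = pd.DataFrame(rng.random((2, 60)), columns=genes + probes)

first_component = H.iloc[0]

# Ten highest-weighted features of each modality in the first component.
top_genes = first_component[genes].nlargest(10)
top_probes = first_component[probes].nlargest(10)
```

The resulting gene list is what one would then pass to an enrichment or annotation tool to check the features' function.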

Does it look simpler than MOFA?

well...

Now, for finer points, expand this study:

Shameless self promotion: