Similarity Network Fusion - SNF

Sergiu Netotea, PhD, NBIS, Chalmers

Similarity networks

Network models are a rich but complex representation of data:

Common network (graph) representation

Typical dataset:

Via network modelling, all instances are turned into feature relationships

Via network fusion, the feature relationships (links) are described based on multiple datasets.

NF basics

How can networks be fused?

Simple rule: average the edge values, i.e. sum up the adjacency matrices and divide by the number of networks!
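
A minimal sketch of this simple rule, assuming the networks share the same node ordering and are given as numpy adjacency matrices (toy data, illustrative names):

```python
import numpy as np

# Hypothetical example: three adjacency matrices over the same 4 nodes.
rng = np.random.default_rng(0)
adjacencies = [rng.random((4, 4)) for _ in range(3)]

# Simple fusion rule: element-wise average of the edge weights.
fused_simple = np.mean(np.stack(adjacencies), axis=0)
print(fused_simple.shape)  # (4, 4)
```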

Pitfalls:

Complex rule: take into consideration the information diffusivity in each network when fusing them!

(Perceived) advantages of network fusion

Very useful when:

It is a generative model!

SNF: Similarity Network Fusion

Wang, Bo, et al. “Similarity network fusion for aggregating data types on a genomic scale.” Nature Methods 11(3) (2014): 333-337.

https://doi.org/10.1038/nmeth.2810

Similarity networks


SNF Method

Similarity distance:
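
For reference, the scaled exponential similarity kernel used in Wang et al. (2014) to build the weight matrix $\mathbf{W}$ from a distance $\rho(x_i, x_j)$ between patients, with $\mu$ a hyperparameter and $\varepsilon_{i,j}$ a local scaling term (average distance to the neighbourhoods $N_i$, $N_j$):

$$ \mathbf{W}(i,j) = \exp\left(-\frac{\rho^2(x_i, x_j)}{\mu\,\varepsilon_{i,j}}\right), \qquad \varepsilon_{i,j} = \frac{\overline{\rho}(x_i, N_i) + \overline{\rho}(x_j, N_j) + \rho(x_i, x_j)}{3} $$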

SNF Method - the similarity matrix

$$ \mathbf{P}(i,j) = \left\{\begin{array}{ll} \frac{\mathbf{W}(i,j)}{2 \sum_{k\neq i} \mathbf{W}(i,k)}, & j \neq i \\ 1/2, & j = i \end{array}\right. $$
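
A minimal numpy sketch of this normalization, assuming `W` is a symmetric non-negative similarity matrix (illustrative names, not the reference implementation):

```python
import numpy as np

def full_kernel(W):
    """Normalize a similarity matrix W into P: each off-diagonal row sums to 1/2
    and the diagonal is set to 1/2, so every row of P sums to 1."""
    P = W.astype(float).copy()
    np.fill_diagonal(P, 0.0)                 # exclude k == i from the row sum
    row_sums = P.sum(axis=1, keepdims=True)
    P = P / (2.0 * row_sums)                 # P(i, j) = W(i, j) / (2 * sum_{k != i} W(i, k))
    np.fill_diagonal(P, 0.5)                 # P(i, i) = 1/2
    return P
```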

SNF Method - the kernel matrix

$$ \mathbf{S}(i,j) = \left\{\begin{array}{ll} \frac{\mathbf{W}(i,j)}{\sum_{k\in N_{i}} \mathbf{W}(i,k)}, & j \in N_{i} \\ 0, & \text{otherwise} \end{array}\right. $$
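
A matching sketch for the sparse (local) kernel, assuming the neighbourhood $N_i$ is taken as the `K` most similar nodes (illustrative, not the reference implementation):

```python
import numpy as np

def local_kernel(W, K=20):
    """Keep only each node's K strongest neighbours and renormalize the rows."""
    n = W.shape[0]
    S = np.zeros_like(W, dtype=float)
    for i in range(n):
        # indices of the K most similar nodes to i, excluding i itself
        order = np.argsort(W[i])[::-1]
        neighbours = order[order != i][:K]
        S[i, neighbours] = W[i, neighbours] / W[i, neighbours].sum()
    return S
```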

SNF Method - Inductive network fusion

$$ \mathbf{P}^{(v)} = \mathbf{S}^{(v)} \times \frac{\sum_{k\neq v}\mathbf{P}^{(k)}}{m-1} \times \left(\mathbf{S}^{(v)}\right)^{T}, \quad v = 1, 2, \dots, m $$
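
A minimal sketch of the iterative fusion, assuming the `full_kernel` and `local_kernel` sketches above and a list `W_list` of per-data-type similarity matrices (a simplified illustration; real implementations typically also renormalize P at every step):

```python
import numpy as np

def fuse(W_list, K=20, iterations=20):
    """Cross-diffusion: each data type's P is propagated through its own sparse
    kernel S, using the average of the other data types' P matrices."""
    P = [full_kernel(W) for W in W_list]
    S = [local_kernel(W, K) for W in W_list]
    m = len(W_list)
    for _ in range(iterations):
        P_new = []
        for v in range(m):
            others = sum(P[k] for k in range(m) if k != v) / (m - 1)
            P_new.append(S[v] @ others @ S[v].T)
        P = [(p + p.T) / 2 for p in P_new]   # keep the matrices symmetric
    return sum(P) / m                         # fused similarity matrix
```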

Obs:

Different uses of the fused matrix:

Post-fusion analysis - Spectral clustering

The fitted model is a similarity network (an affinity matrix), so clustering requires graph algorithms such as spectral clustering. Other popular graph clustering methods include k-nearest-neighbour clustering and MCL (Markov clustering).

Spectral clustering:
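
As an illustration, a minimal scikit-learn sketch of spectral clustering on the fused network, assuming `fused` is the symmetric affinity matrix returned by the fusion sketch above and that the number of clusters is chosen by the user (e.g. via an eigengap heuristic):

```python
from sklearn.cluster import SpectralClustering

# affinity="precomputed" tells scikit-learn to treat `fused` as an affinity
# matrix rather than as a feature table.
model = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0)
labels = model.fit_predict(fused)   # one cluster label (e.g. disease subtype) per patient
```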

SNF usage examples

(latest trends) MoGCN, graph neural networks

Consensus clustering

SNF drawbacks:

Details:

Integrative SNF-CC example:

Stages:

Weighted similarity network fusion

Links:

WSNF - Method (1)

  1. Compute a PageRank-based ranking of the features: features with many incoming regulators (e.g. TFs) are ranked as more important (see the sketch after this list).
    • Build the regulatory network where the nodes represent the features, i.e. the microRNAs (miRNAs), transcription factors (TFs) and messenger RNAs (mRNAs), and the edges indicate the interactions between the features.
    • The interactions are retrieved from various interactomic databases.
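
A minimal networkx sketch of the PageRank ranking step; the toy edges and feature names below are purely illustrative:

```python
import networkx as nx

# Hypothetical toy regulatory network: regulator -> target.
G = nx.DiGraph()
G.add_edges_from([
    ("TF_A", "mRNA_1"), ("TF_A", "mRNA_2"),
    ("miR_X", "mRNA_1"), ("TF_B", "TF_A"),
])

# PageRank scores features higher when they have many (important) incoming regulators.
ranks = nx.pagerank(G)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```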

WSNF - Method (2)

  2. Integrate the feature ranking with the feature variation, computed from the expression data.
    • Use kNN-based kernels and the expression data of the miRNAs, TFs and mRNAs to calculate the weight of each feature, representing its level of importance.
  3. Weighted SNF
    • The feature weights are then integrated into the network fusion approach, via the weighted distance below, to cluster the samples (patients) and thus identify cancer subtypes (see the sketch after this list).
    • $D(s_i, s_j) = \sqrt{\sum_{m=1}^p{W(f_m)\cdot(f_m^{s_i}-f_m^{s_j})^2}}$
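
A minimal numpy sketch of this weighted distance, assuming one weight per feature and samples given as feature vectors (names are illustrative):

```python
import numpy as np

def weighted_distance(x_i, x_j, weights):
    """Weighted Euclidean distance: D = sqrt(sum_m W(f_m) * (f_m^i - f_m^j)^2)."""
    diff = x_i - x_j
    return np.sqrt(np.sum(weights * diff ** 2))

# Toy usage with 4 features.
weights = np.array([0.4, 0.3, 0.2, 0.1])
x_i = np.array([1.0, 2.0, 0.5, 3.0])
x_j = np.array([0.5, 1.5, 1.0, 2.0])
print(weighted_distance(x_i, x_j, weights))
```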

ANF - Affinity network fusion

Obs. While the SNF fitting process might look computationally heavy, the fused patient matrix is usually small (most clinical datasets are on the order of tens to hundreds of patients)!

ANF application - Semi-supervised learning on patient affinity networks

https://arxiv.org/abs/1805.09673

Subsequent SNF studies:

Main ideas:

SKF - Similarity Kernel Fusion

The Kernel trick

$$ P_l^{t+1}=\alpha\left(S_l \times \frac{\sum_{r \neq l}P_r^t}{n} \times S_l^T\right) + (1-\alpha)\,\frac{\sum_{r \neq l}P_r^t}{n} $$
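
A minimal numpy sketch of one such update step, assuming per-kernel lists `P_list` (normalized kernels) and `S_list` (sparse/local kernels) and a regularization weight `alpha`; the variable names are assumptions, not taken from the SKF code:

```python
import numpy as np

def skf_step(P_list, S_list, alpha=0.8):
    """One kernel-fusion iteration: each kernel is diffused through its own S
    and shrunk towards the average of the other kernels."""
    n = len(P_list)
    P_next = []
    for l in range(n):
        others = sum(P_list[r] for r in range(n) if r != l) / n
        P_next.append(alpha * (S_list[l] @ others @ S_list[l].T)
                      + (1 - alpha) * others)
    return P_next
```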

Software, papers, further reading

Papers:

Software: