Sergiu Netotea, PhD, NBIS, Chalmers
Network models are a very complex representation of data:
Typical dataset:
Via network modelling, all instances are turned into feature relationships
Via network fusion, the feature relationships (links) are described based on multiple datasets.
Simple rule: average edge values by summing up adjacency matrices!
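The simple rule can be sketched in a few lines of NumPy. This is only an illustration, assuming that all networks are defined over the same nodes in the same order; the function name `average_fusion` and the toy matrices are hypothetical and not part of the course material.

```python
import numpy as np

def average_fusion(adjacency_matrices):
    """Fuse networks over the same nodes by averaging their edge weights."""
    # Stack into shape (n_networks, n_nodes, n_nodes), then sum and divide.
    stacked = np.stack([np.asarray(a, dtype=float) for a in adjacency_matrices])
    return stacked.sum(axis=0) / len(adjacency_matrices)

# Two toy 3-node networks (hypothetical edge weights).
A1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.5],
               [0.0, 0.5, 0.0]])
A2 = np.array([[0.0, 0.0, 1.0],
               [0.0, 0.0, 0.5],
               [1.0, 0.5, 0.0]])

fused = average_fusion([A1, A2])
print(fused)
```

An edge present in only one network keeps half of its weight after fusion, while an edge supported by both networks keeps its full weight.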
Pitfalls:
Complex rule: take into consideration the information diffusivity in each network when fusing them!
Very useful when:
It is a generative model!
Wang, Bo et al. “Similarity network fusion for aggregating data types on a genomic scale.” Nature methods vol. 11,3 (2014): 333-7.
https://doi.org/10.1038/nmeth.2810
Similarity distance:
Obs:
The update is repeated for $t$ iterations, or until the matrices $\mathbf{P}^{(v)}, v = 1, 2, \ldots, m$ converge.
Different usages of the fused matrix:
The fitted model is a similarity network (an affinity matrix). Thus, to cluster the samples, one must use graph algorithms such as spectral clustering. Other popular graph clustering methods are K-neighbors clustering and MCL (Markov clustering).
Spectral clustering:
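A minimal sketch of the idea, assuming a symmetric, connected affinity matrix and only two clusters. It splits the nodes by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian). Full spectral clustering would typically use several eigenvectors followed by k-means; the function name `spectral_bipartition` and the toy matrix are hypothetical.

```python
import numpy as np

def spectral_bipartition(W):
    """Split a similarity network into two clusters using the Fiedler vector."""
    W = np.asarray(W, dtype=float)
    W = (W + W.T) / 2.0                       # enforce symmetry
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W          # unnormalized Laplacian L = D - W
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]              # second-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy affinity matrix: two groups of three samples, strongly similar
# within a group (0.9) and weakly similar across groups (0.05).
W = np.full((6, 6), 0.05)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 1.0)

labels = spectral_bipartition(W)
print(labels)
```

The cluster labels are arbitrary (the sign of an eigenvector is not unique), so only the grouping is meaningful, not which group is called 0 or 1.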
SNF drawbacks:
Details:
Stages:
Links:
Obs. Although the SNF fitting process might look long, the patient matrix is usually a small network (most clinical datasets contain on the order of tens to hundreds of patients)!
https://arxiv.org/abs/1805.09673
Main ideas:
Given the expression data of genome elements, we first extract multiple expression features for each regulatory element based on the heterogeneous biological networks. Based on the extracted feature matrices of samples, we use a matrix correlation method, RV2, to predict the similarities between samples in each expression data-view, and then fuse the similarity information in samples from all considering data-views according to different integration weights. Finally, we cluster patient samples into different cancer subtypes based on the predicted integrative similarity network between samples.
$P_l^{t+1}=\alpha(S_l \times \frac{\sum_{r \neq l}P_r^t}{n} \times S_l^T) + (1-\alpha) \frac{\sum_{r \neq l}P_r^t}{n}$
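The update rule above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: $\times$ is read as matrix multiplication, and because $n$ is not defined here, the sketch defaults it to the number of other views ($m - 1$) while leaving it as a parameter. The function name `fusion_step` is hypothetical.

```python
import numpy as np

def fusion_step(P, S, alpha=0.5, n=None):
    """Apply one update of the rule above to every view.

    P : list of current matrices P_r^t (one per view, same shape)
    S : list of matrices S_l (one per view)
    n : normalizing constant; ASSUMED to be the number of other views
        when not given, since the formula does not define it.
    """
    m = len(P)
    if m < 2:
        raise ValueError("at least two views are needed for fusion")
    if n is None:
        n = m - 1  # assumption, see docstring
    total = sum(P)  # elementwise sum over all views
    new_P = []
    for l in range(m):
        others = (total - P[l]) / n  # sum over r != l, divided by n
        diffused = S[l] @ others @ S[l].T
        new_P.append(alpha * diffused + (1 - alpha) * others)
    return new_P

# Toy check with two views and identity matrices as S_l (hypothetical values):
# each view should then simply take over the other view's matrix.
P = [np.array([[1.0, 0.2], [0.2, 1.0]]),
     np.array([[1.0, 0.8], [0.8, 1.0]])]
S = [np.eye(2), np.eye(2)]
new_P = fusion_step(P, S, alpha=0.5)
print(new_P[0])
```

All views are updated from the matrices of step $t$, so the order in which the views are processed does not affect the result.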