Omics Integration course 2020

Rui Benfeitas, Scilifelab, NBIS National Bioinformatics Infrastructure Sweden

rui.benfeitas@scilifelab.se

Abstract
In this notebook we will explore how to generate and analyse a multi-omic network comprising metabolites quantifications and gene expression. We will compare these networks against randomly generated networks, and compute different network metrics. At the end we will also perform a community analysis and functional characterization at the gene level.


Contents

Biological network topology analysis

Objective
In this notebook you will learn how to build and analyse a network built and analysed from a gene-metabolite association analysis. Other mixed networks may also be similarly analyzed, differring only in whether and how you can apply the final functional analysis.

Data
As a test case we will be using the file met_genes.tsv which contains abundances for 125 metabolites and 1992 genes, for 24 samples.

Software
This notebook relies on python's igraph for most of the analyses. Most, if not all, functions used here can also be applied with R's igraph. Other packages exist for network analysis including networkx and graph-tool. Snap.py is also a good alternative for large networks.

We will build our network through an association analysis, but there are other methods to do this including Graphical Lasso or linear SVR.

Data preparation

You will use a gene expression dataset (RNAseq, expressed as TPM) from a disease group with 24 samples to keep analysis memory and time requirements reasonable for this lesson. It is assumed that all batch effects or other possible technical artifacts are not present, and that all data is ready for analysis. However, there are several important considerations in preprocessing your data:

This will depend on the type of that that you have and what you want to do with it, and will severely affect downstream results. It is thus important to carefully think about this.

No duplicated features are present.

A very quick view shows that several gene clusters are found, including two major groups. However, the analysis below does not perform any statistical filtering.

Association analysis

Our initial network analysis will be performed on the association analysis using Spearman correlations. Because this network has a big chance of producing false positives we will consider Bonferroni correction to control for familywise error, as well as FDR.

A very quick view shows that several gene clusters are found, including two major groups. However, the analysis below does not perform any statistical filtering.

We will perform a gene-gene, gene-metabolite, and metabolite-metabolite association analysis by computing pairwise Spearman correlations. Choosing other non-parametric (Kendall or Spearman) vs parametric (Pearson) methods depends on your data. Here, because we have a small sample size, and want to save on computational time we choose Spearman.

The following takes a few minutes to run.