This tutorial will cover R API for cytoscape through RCy3 package. People who are interested in Cytoscape GUI with can follow Lars Juhl Jensen’s labs tutorial(https://jensenlab.org/training/stringapp/).

1 Installation and testing the setup

If the packages required for this session are not available, please create the environment using the “renv.lock” file.¨

renv::restore(lockfile = "renv.lock")

suppressMessages(library(dplyr))
suppressMessages(library(RCy3))
suppressMessages(library(readxl))
suppressMessages(library(tidyverse))
suppressMessages(library(readxl))

Start the cytoscape app before running any of the follwoing code.

cytoscapePing()
cytoscapeVersionInfo()

##       apiVersion cytoscapeVersion 
##             "v1"          "3.8.0"

Now close the cytoscape and install the string database app using R studio.

installApp("stringapp")

1.1 Installation of cytoscape apps using R

To install cytoscape apps one can use the Cytoscape GUI through app manager or can install directly from R using the following code. The following step wont be necessary for this course.

installation_responses <- c()

#list of app to install
cyto_app_toinstall <- c("clustermaker2", "enrichmentmap", "autoannotate", "wordcloud", "stringapp", "aMatReader", "yFiles Layout Algorithms", "enhancedGraphics")

cytoscape_version <- unlist(strsplit( cytoscapeVersionInfo()['cytoscapeVersion'],split = "\\."))
if(length(cytoscape_version) == 3 && as.numeric(cytoscape_version[1]>=3) 
   && as.numeric(cytoscape_version[2]>=7)){
  for(i in 1:length(cyto_app_toinstall)){
    #check to see if the app is installed.  Only install it if it hasn't been installed
    if(!grep(commandsGET(paste("apps status app=\"", cyto_app_toinstall[1],"\"", sep="")), 
             pattern = "status: Installed")){
      installation_response <-commandsGET(paste("apps install app=\"", 
                                                cyto_app_toinstall[i],"\"", sep=""))
      installation_responses <- c(installation_responses,installation_response)
    } else{
      installation_responses <- c(installation_responses,"already installed")
    }
  }
  installation_summary <- data.frame(name = cyto_app_toinstall, 
                                     status = installation_responses)
  
  knitr::kable(list(installation_summary),
  booktabs = TRUE, caption = 'A Summary of automated app installation'
)
}

1.2 Intuition behind using RCy3 to analyze networks with cytoscape.

Cytoscape(www.cytoscape.org) is probably the most popular applications for network analysis and visualization with GUI. In this session, we will learn about new capabilities to integrate Cytoscape into programmatic workflows and pipelines using R. Apart from programmatic workflow, the access through R allows direct access to network analyses as being part of an ongoing analyses instead of a separate instance. This also makes analyses steps more explicit hence more reproducible. There are community provided scripts available for various kinds of analyses at (https://github.com/cytoscape/cytoscape-automation/tree/master/for-scripters/R/notebooks). If you are interested in the type of analyses that suits your project more, you may find some scripts related to that here. This tutorial is just a quick hands on introduction to using Cytoscape using from R. Cytoscape has amazing amount of functionality and only a fraction of it can be covered here.

We have already been introduced to network biology themes and concepts, therefor we will translate these into Cytoscape terms for practical applications. This tutorial will be a hands-on session working towards accessing and controlling Cytoscape from R to perform a network analysis. We will go through the following themes that are usually required in network visualization and analyses.

retrieve networks for Genes/ proteins of interest
layout and visually style the resulting networks
import external data and map them onto a network
filter networks and customize styling
perform co-expression analyses and create your own co-expression network
visualize the results and map attributes to user created networks

2 Data preparation

We will be using data from the Charge consortium in which they have analyzed quite large cohort with association of methylation and gene expression with Age. We will use some of the results for visualization in Cytoscape. In addition there is a list of proteins and coefficients from a proteomics data-set provided in the /Data/AgeingTopProteins.txt. All these data aim to identify markers associated with age. We can try and find overlaps in these data-sets using network analyses in cytoscape.

#setwd("/Users/cob-aaf/Documents/GitHub/session_visualization/")
AgeingExpMeth <- read_excel("./Data/AgeExpMethSig.xlsx",   skip = 0) ## Provide the path to the file with data

Age_prot <- read_table2("./Data/AgeingTopProteins.txt")

Age_exp <- AgeingExpMeth %>%  arrange(`ExpAgeP-value`) %>% slice_head(n=100)
Age_meth <- AgeingExpMeth %>% arrange(`MethAgeP-value`) %>% slice_head(n=100)

rownames(Age_exp) <- Age_exp$Gene

3 Get help

To get an overview of what commands and functions are available through “RCy3”, please have a look at the help pages. More details with examples are also available through package vignettes.

help(package=RCy3)
browseVignettes("RCy3")

4 Retrieve networks for genes/proteins of interest

To retrieve a network from the StringDB based on your own genes/proteins of interest from your own data, you need to construct a query command as following.

AgeingExpe_string_interaction_cmd <- paste('string protein query taxonID=9606 cutoff=0.9 query=', paste(Age_exp$Gene, collapse=","),'"',sep="")
commandsGET(AgeingExpe_string_interaction_cmd )

## [1] "Loaded network 'STRING network' with 109 nodes and 90 edges"

To get a better understanding of how StringDB can be constructed, read the commandsGET help pages.

?commandsGET

One can set a name to the retrieved network, as cytoscape gives your network a generic name. And if you have retrieved several networks, it may get confusing after a while.

renameNetwork("AgeingExpeNet")

AgeingMeth_string_interaction_cmd <- paste('string protein query taxonID=9606 cutoff=0.9 query=', paste(Age_meth$Gene, collapse=","),'"',sep="")
commandsGET(AgeingMeth_string_interaction_cmd)
renameNetwork("AgeingMethNet")

## [1] "Loaded network 'STRING network - 1' with 109 nodes and 97 edges"

AgeingProt_string_interaction_cmd <- paste('string protein query taxonID=9606 limit=150 cutoff=0.9 query=', paste(Age_prot$protein, collapse=","),'"',sep="")
test <-commandsGET(AgeingProt_string_interaction_cmd)
renameNetwork("AgeingProtNet")

To get an overview of what networks are present in cytoscape workspace use following:

getNetworkList()

## [1] "AgeingMethNet" "AgeingExpeNet" "AgeingProtNet"

To work with a respective network, we need to set the environment we wish to work with.

setCurrentNetwork(network="AgeingExpeNet")

Save the original network to a file.

path <- file.path( "Data","ExpnetworkOriginal.png")
# initial_string_network_png_file_name <- file.path(getwd(), "/Path/to/your/network.png")

exportImage(path, type = "png", resolution = 300)

Cytoscape will hang waiting for user response if the network file already exists. You have to remove it first

5 Do on your own

Can you save the “AgeingMethNet” and “AgeingProtNet” to the respective files?
Use your desired node attributes among the cytoscape network columns to improve visualization.

6 Change network layout and visually style the networks

Now that we have a few networks loaded in cytoscape, let us work with those based on our interest e.g., change the layout of the network and save to a new file.

layouts <- getLayoutNames() ## Access the available lyouts in cytoscape

layoutNetwork(layouts[4]) ## Change the layout to circular

#exportImage(path, type = "png", resolution = 300) ## Cytoscape will hang as you are trying to overwite an existing file

path <- file.path(getwd(), "/Data/","ExpnetworkOriginalCircular.png") # update the path
exportImage("./Data/ExpnetworkOriginalCircular1.png", type = "png", resolution = 300) ## Cytoscape will hang as you are trying to overwite an existing file

##                                                                                                              file 
## "/Users/cob-aaf/Documents/GitHub/omicsintELexir2021/session_visualization/./Data/ExpnetworkOriginalCircular1.png"

Get available parameters for a specific layout.

getLayoutPropertyNames(layout.name='force-directed')

## [1] "Available arguments for 'layout force-directed':"
##  [1] "defaultEdgeWeight"        "defaultNodeMass"         
##  [3] "defaultSpringCoefficient" "defaultSpringLength"     
##  [5] "edgeAttribute"            "isDeterministic"         
##  [7] "maxWeightCutoff"          "minWeightCutoff"         
##  [9] "network"                  "nodeAttribute"           
## [11] "nodeList"                 "numIterations"           
## [13] "singlePartition"          "type"

Apply the layout by changing parameters again

layoutNetwork('force-directed defaultSpringCoefficient=0.0000008 defaultSpringLength=70')

Overlay our expression data on the String network. To do this we will be using the loadTableData function from RCy3. It is important to make sure that that your identifiers types match up. You can check what is used by String by pulling in the column names of the node attribute table.

getTableColumnNames('node')

##  [1] "SUID"                                "shared name"                        
##  [3] "name"                                "selected"                           
##  [5] "stringdb::canonical name"            "display name"                       
##  [7] "stringdb::full name"                 "stringdb::database identifier"      
##  [9] "stringdb::description"               "@id"                                
## [11] "stringdb::namespace"                 "stringdb::node type"                
## [13] "query term"                          "stringdb::sequence"                 
## [15] "stringdb::species"                   "stringdb::STRING style"             
## [17] "stringdb::enhancedLabel Passthrough" "compartment::cytoskeleton"          
## [19] "compartment::cytosol"                "compartment::endoplasmic reticulum" 
## [21] "compartment::endosome"               "compartment::extracellular"         
## [23] "compartment::golgi apparatus"        "compartment::lysosome"              
## [25] "compartment::mitochondrion"          "compartment::nucleus"               
## [27] "compartment::peroxisome"             "compartment::plasma membrane"       
## [29] "stringdb::interactor score"          "stringdb::structures"               
## [31] "target::development level"           "target::family"                     
## [33] "tissue::adrenal gland"               "tissue::blood"                      
## [35] "tissue::bone"                        "tissue::bone marrow"                
## [37] "tissue::eye"                         "tissue::gall bladder"               
## [39] "tissue::heart"                       "tissue::intestine"                  
## [41] "tissue::kidney"                      "tissue::liver"                      
## [43] "tissue::lung"                        "tissue::muscle"                     
## [45] "tissue::nervous system"              "tissue::pancreas"                   
## [47] "tissue::saliva"                      "tissue::skin"                       
## [49] "tissue::spleen"                      "tissue::stomach"                    
## [51] "tissue::thyroid gland"               "tissue::urine"

The column “display name” contains gene names which are also found in our Ageing Expression data-set. To import our expression data we will match our data-set to the “display name” node attribute.

setCurrentNetwork(network="AgeingExpeNet")

?loadTableData

loadTableData(data= as.data.frame(Age_exp),  table.key.column = "display name", data.key.column= "Gene")  #default data.frame key is row.names

## [1] "Success: Data loaded in defaultnode table"

Modify the Visual Style Create your own visual style to visualize your expression data on the String network.

Start with a default style

my.style.name = "AgeingStyle"
defaults.list <- list(NODE_SHAPE="ellipse",
                 NODE_SIZE=60,
                 NODE_FILL_COLOR="#AAAAAA",
                 EDGE_TRANSPARENCY=120)
node.label.map <- mapVisualProperty('node label','display name','p') # p for passthrough; nothing else needed
createVisualStyle(my.style.name, defaults.list, list(node.label.map))
setVisualStyle(style.name=my.style.name)

##                 message 
## "Visual Style applied."

min.Exp.zscore = min(Age_exp$`Z-score`, na.rm=TRUE)
max.Exp.zscore  = max(Age_exp$`Z-score`, na.rm=TRUE)
#mid..Exp.zscore = mean
data.values = c(min.Exp.zscore, mean(min.Exp.zscore,max.Exp.zscore) ,max.Exp.zscore)

Next, we use the RColorBrewer package to help us pick good colors to pair with our data values.

library(RColorBrewer)
display.brewer.all(length(data.values), colorblindFriendly=TRUE, type="div") # div,qual,seq,all


node.colors <- c(rev(brewer.pal(length(data.values), "RdYlBu")))

setNodeColorMapping("Z-score", data.values, node.colors, style.name=my.style.name)

## NULL

Change the size of the node to be correlated with the Age p-value.

setNodeSizeMapping(table.column = 'ExpAgeP-value', 
                   table.column.values = c(min(Age_exp$`ExpAgeP-value`), 
                                           mean(Age_exp$`ExpAgeP-value`), 
                                           max(Age_exp$`ExpAgeP-value`)), 
                   sizes = c(30, 60, 150), mapping.type = "c", style.name = my.style.name)

## NULL

6.1 Creating a degree filter

Every node in a network has a Degree property, which corresponds to the number of edges connecting the node to other nodes, either as a target or source. Filtering based on node degree is a useful way to remove nodes with too few (or too many) connections.

In this example we want to exclude low degree nodes, e.g., those with only 0, 1 or 2 connections:

createDegreeFilter('degree filter', c(0,2), 'IS_NOT_BETWEEN')

## $nodes
##  [1] "9606.ENSP00000398124" "9606.ENSP00000355759" "9606.ENSP00000221801"
##  [4] "9606.ENSP00000278616" "9606.ENSP00000265165" "9606.ENSP00000479618"
##  [7] "9606.ENSP00000368880" "9606.ENSP00000358563" "9606.ENSP00000225916"
## [10] "9606.ENSP00000376436" "9606.ENSP00000263253" "9606.ENSP00000262367"
## [13] "9606.ENSP00000362649" "9606.ENSP00000335153" "9606.ENSP00000332973"
## [16] "9606.ENSP00000269305" "9606.ENSP00000341551" "9606.ENSP00000266970"
## [19] "9606.ENSP00000264279" "9606.ENSP00000344456"
## 
## $edges
## [1] NA

createSubnetwork(subnetwork.name ='Ageing: highly connected nodes')

## network 
##   14927

6.2 Creating a column filter

We could also filter the network based on any of the column data available. Let us filter nodes that come from “target” database belonging to “family” “TF”.

createColumnFilter(filter.name='transcription factors', column="target::family", "TF", "IS", network = "Ageing: highly connected nodes")

## $nodes
## [1] "9606.ENSP00000479618" "9606.ENSP00000368880" "9606.ENSP00000398124"
## [4] "9606.ENSP00000332973" "9606.ENSP00000376436" "9606.ENSP00000269305"
## [7] "9606.ENSP00000265165" "9606.ENSP00000341551"
## 
## $edges
## [1] NA

Now we will select the nodes from the family and color them differently in the network.

nodes <- getSelectedNodes()

setNodeColorBypass(nodes,new.colors='#FF0088')

Here the string matching is based on discrete values but it can be from a continuous variable?

Can you select a few nodes based on your favorite column and see whether you can map some column as node attributes?

6.3 Generate own networks using co-expression analyse

With the following block of code one can perform co-expression analyses. Here you are already provided with correlation matrix that you can use to follow the steps.

This part will not work if you try to copy and paste in the R console, you have to provide the expression matrix of your own data

#scale the data if required
RNASeq_expression <- scale(NASeq_expression, center = TRUE, scale = TRUE)

#rownames(RNASeq_expression) <- RNASeq_expression_matrix$Name
RNAseq_correlation_matrix <- cor(t(RNASeq_expression), method="pearson")

#set the diagonal of matrix to zero - eliminate self-correlation
RNAseq_correlation_matrix[ row(RNAseq_correlation_matrix) == col(RNAseq_correlation_matrix) ] <- 0

# set all correlations that are less than 0.7 to zero
RNAseq_correlation_matrix[which(RNAseq_correlation_matrix<0.60)] <- 0

#get rid of rows and columns that have no correlations with the above thresholds
RNAseq_correlation_matrix <- RNAseq_correlation_matrix[which(rowSums(RNAseq_correlation_matrix) != 0),
                          which(colSums(RNAseq_correlation_matrix) !=0)]

#correlation_filename <- file.path(getwd(),  "Data", "Protein_correlation_matrix.txt")

write.table( RNAseq_correlation_matrix ,  file = correlation_filename, col.names  = TRUE, row.names = FALSE, sep = "\t", quote=FALSE)

Use the CyRest call to access the aMatReader functionality.

correlation_filename <- file.path(getwd(),  "Data", "Protein_correlation_matrix.txt") 

amat_url <- "aMatReader/v1/import"
amat_params = list(files = list(correlation_filename),
                   delimiter = "TAB",
                   undirected = FALSE,
                   ignoreZeros = TRUE,
                   interactionName = "correlated with",
                   rowNames = FALSE
                   )
 
response <- cyrestPOST(operation = amat_url, body = amat_params, base.url = "http://localhost:1234")

current_network_id <- response$data["suid"]

layoutNetwork('cose',network = as.numeric(current_network_id))

renameNetwork(title = "Coexpression_network",
              network = current_network_id)

One can map column attributes as we did in case of string database based on our desired attributes.

#?loadTableData

loadTableData(data= as.data.frame(Age_prot),  table.key.column = "name", data.key.column= "protein")  #default data.frame key is row.names

## [1] "Success: Data loaded in defaultnode table"

createColumnFilter(filter.name='Fold change', column="logFC", 0.01, "GREATER_THAN", network = "Coexpression_network")

nodes <- getSelectedNodes()

setNodeColorBypass(nodes,new.colors='#FF0088')

6.4 Bonus exercises

Create consensus scores to the respective gene and methylation data-sets, create and network based on string db. Map attributes where both gene expression and methylation data agrees most.

## Create attributes for the "Age" dataset

AgeingExpMeth$ExpAgeUp <- AgeingExpMeth$ExpAgeDirection %>% stringi::stri_count( regex = "[+]")
AgeingExpMeth$ExpAgeDown <- AgeingExpMeth$ExpAgeDirection %>% stringi::stri_count( regex = "[-]")

AgeingExpMeth$MethAgeUp <- AgeingExpMeth$MethAgeDirection %>% stringi::stri_count( regex = "[+]")
AgeingExpMeth$MethAgeDown <- AgeingExpMeth$MethAgeDirection %>% stringi::stri_count( regex = "[-]")

AgeingExpMeth$ExpMethUp <- AgeingExpMeth$ExpMethDirection %>% stringi::stri_count( regex = "[+]")
AgeingExpMeth$ExpMethDown <- AgeingExpMeth$ExpMethDirection %>% stringi::stri_count( regex = "[-]")

Cytoscape_R

Ashfaq Ali

21/09/2020