Output

For this course, I have used one of my very simple tree from a few different organisms within the Archaea superkingdom. More specifically, it is the concatenated ribosomal protein tree within archaea, with representatives from the different Phyla: Asgard archaea (Thorarchaeota, Odinarchaeota, Lokiarchaeota and Heimdallarchaeota), TACK, Euryarchaeota and DPANN. The maximum likelihood phylogeny was reconstructed in IQ-TREE with the LG+F+I+G4 model. The alignment was trimmed to 6,732 positions with the BMGE (Block Mapping and Gathering with Entropy) tool.

1 ggtree

ggtree is an R package that extends ggplot2 for visualizating and annotating phylogenetic trees with their covariates and other associated data. ggtree in combination with treeio supports several file formats, including:

  • read.tree for reading Newick files.
  • read.phylip for reading Phylip files.
  • read.jplace for reading Jplace files.
  • read.nhx for reading NHX files.
  • read.beast for parsing output of BEAST
  • read.codeml for parsing output of CODEML (rst and mlc files)
  • read.codeml_mlc for parsing mlc file (output of CODEML)
  • read.hyphy for parsing output of HYPHY
  • read.jplace for parsing jplace file including output from EPA and pplacer
  • read.nhx for parsing NHX file including output from PHYLODOG and RevBayes
  • read.paml_rst for parsing rst file (output of BASEML and CODEML)
  • read.r8s for parsing output of r8s
  • read.raxml for parsing output of RAxML

2 Reading treefile

R

output

## 
## Phylogenetic tree with 25 tips and 24 internal nodes.
## 
## Tip labels:
##   DPANN_Pace_RBG_13_36_9, DPANN_Woese_UBA94, EURY_Thermoplasma_volcanium_GSS1, Eury_Halobaculum_gomorrense, Eury_Methanothermococcus_okinawensis, TACK_Bathy_SMTZ80, ...
## Node labels:
##   100, 10.9, 100, 100, 72, 35.9, ...
## 
## Rooted; includes branch lengths.

Just like with ggplot2 we created a basic canvas with ggplot(...) and added layers with +geom_???(), we can do the same here. The ggtree package gives us a geom_tree() function. Because ggtree is built on top of ggplot2, you get ggplot2’s default gray theme with white lines. You can override this with a theme from the ggtree package.

Because you’ll almost always want to add a tree geom and remove the default background and axes, the ggtree() function is essentially a shortcut for ggplot(...) + geom_tree() + theme_tree().

One can also customize how the tree looks like, just as in ggplot

You can also use different orientations to show the tree.

3 Geoms and annotation

Just like in ggplot() you can add layers to the tree as geoms.

Before we can go further into annotating the tree, we need to understand how ggtree is handling the tree structure internally. Some of the functions in ggtree for annotating clades need a parameter specifying the internal node number. To get the internal node number, user can use geom_text to display it, where the label is an aesthetic mapping to the “node variable” stored inside the tree object (think of this like the continent variable inside the gapminder object).

Another way to get the internal node number is using the function MRCA() which stands for Most Recent Common Ancestor. From the tree above, if you would like to get the number of the particular node that is common to all the “Lokiarchaea”. You can do so like below:

3.3 Group clades and OTUs

You can group the different tips using the groupOTU function or you can group the tips based on a node using groupClade function. These groups you can then further use it for other geoms and so on.

Let us say, we want to show the Asgrad archaea separately from the other archaea in this tree. One way to this would be:

3.4 Showing bootstrap values

If you notice the tree object, it actually contains node.label that are bootstrap values.

R

output

## 
## Phylogenetic tree with 25 tips and 24 internal nodes.
## 
## Tip labels:
##   DPANN_Pace_RBG_13_36_9, DPANN_Woese_UBA94, EURY_Thermoplasma_volcanium_GSS1, Eury_Halobaculum_gomorrense, Eury_Methanothermococcus_okinawensis, TACK_Bathy_SMTZ80, ...
## Node labels:
##   100, 10.9, 100, 100, 72, 35.9, ...
## 
## Rooted; includes branch lengths.

There are different ways we could show this values on the tree. For example,

Now, let us say that you only want to show the values that are above 80, then you would have to do some trick here, like:

Note   Notice that the node.label are in string, so must remember to use as.numeric() option be able to filter them to show them.

Note   If the tree does not have bootstrap values with in the treefile, one can add them separately as a data.frame using %<+% function, as shown in the Adding data to tree section below.

3.5 Subsetting trees

It is possible to specific tips from the tree using drop.tip() fuction. Let us say, we want to remove Thor_SMTZ45 and Baja_Loki1, becuase of the reason that their genomes are not close to complete. We can do that by:

One can also subset a tree, by using the internal node number as shown in the exercises earlier. We can do this by using the tree_subset() function:

Note   The branch lengths and all the other information from the tree are maintained in the new object, when remove tips or subset a tree.

4 Adding data to tree

In this part, let us try to see what kind of ways one could add metadata to a tree.

4.2 Images

One can use phylopic database, which is part of the ggtree package to use many silhouette images of organisms. Here, we can see how one could add some of those images to a tree.

One can add specific images to the specific nodes as well. For this, we would include the phylopic information as a dataframe and which specific node number, should these images be matched. Then we could add this information to the ggtree object using %<+% function.

Note   Notice that for Odin, we had to use geom_tiplab() as it is a tip and not a node!

There are more fun images in phylopic that one can use:

5 Session info

R
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggimage_0.2.8   treeio_1.10.0   ggtree_2.0.4    forcats_0.5.0  
##  [5] stringr_1.4.0   purrr_0.3.4     readr_1.3.1     tidyr_1.1.2    
##  [9] tibble_3.0.4    tidyverse_1.3.0 reshape2_1.4.4  ggplot2_3.3.2  
## [13] dplyr_1.0.2     captioner_2.2.3 bookdown_0.20   knitr_1.29     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5          lubridate_1.7.9     ape_5.4-1          
##  [4] lattice_0.20-40     assertthat_0.2.1    digest_0.6.27      
##  [7] R6_2.5.0            cellranger_1.1.0    plyr_1.8.6         
## [10] backports_1.1.10    reprex_0.3.0        evaluate_0.14      
## [13] httr_1.4.2          pillar_1.4.6        rlang_0.4.8        
## [16] curl_4.3            lazyeval_0.2.2      readxl_1.3.1       
## [19] rstudioapi_0.11     blob_1.2.1          magick_2.5.0       
## [22] rmarkdown_2.3       labeling_0.4.2      munsell_0.5.0      
## [25] broom_0.7.0         compiler_3.6.3      modelr_0.1.8       
## [28] xfun_0.18           gridGraphics_0.5-0  pkgconfig_2.0.3    
## [31] htmltools_0.5.0     tidyselect_1.1.0    fansi_0.4.1        
## [34] crayon_1.3.4        dbplyr_1.4.4        withr_2.3.0        
## [37] grid_3.6.3          nlme_3.1-144        jsonlite_1.7.1     
## [40] gtable_0.3.0        lifecycle_0.2.0     DBI_1.1.0          
## [43] magrittr_1.5        scales_1.1.1        tidytree_0.3.3     
## [46] cli_2.1.0           stringi_1.5.3       farver_2.0.3       
## [49] fs_1.5.0            xml2_1.3.2          ellipsis_0.3.1     
## [52] rvcheck_0.1.8       generics_0.0.2      vctrs_0.3.4        
## [55] tools_3.6.3         ggplotify_0.0.5     glue_1.4.2         
## [58] hms_0.5.3           parallel_3.6.3      yaml_2.2.1         
## [61] colorspace_1.4-1    BiocManager_1.30.10 rvest_0.3.6        
## [64] haven_2.3.1

End of document