# 1 Main exercises

The following data will be used for the exercises in this course. The data come from a mock RNA sequencing (RNA-seq) experiment with 12 cell-line samples. As in a typical RNA-seq analysis, genes with very low expression were filtered out, and the gene counts were then normalized using different methods, such as the variance-stabilizing transformation (VST) and counts per million (CPM). We will use these data for the visualizations.

You can download all the files together here. Create a directory called data and unzip the files into that directory.
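The same step can be done from Python with the standard `zipfile` module. This minimal sketch assumes the archive is saved as data.zip (the name used in the optional exercises) in the current working directory. The helper name `extract_course_data` is hypothetical.

```python
# Minimal sketch: create a "data" directory and unzip the course archive into it.
# Assumption: the archive is called "data.zip"; the helper name is hypothetical.
import zipfile
from pathlib import Path

def extract_course_data(zip_path="data.zip", dest="data"):
    """Create dest if needed, extract every member of zip_path into it,
    and return the sorted member names."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())
```

Calling `extract_course_data("data.zip", "data")` returns the names of the extracted files. If the archive already contains a top-level data folder, extracting it into data nests it one level deeper, so it is worth checking the returned names.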

If workshop_on_plotting_in_R is the current working directory, the directory tree should look like this:

• ggplot_geneco_course
• data
    • arch_newick.txt
    • counts_deseq2.txt
    • counts_filtered.txt
    • counts_raw.txt
    • counts_vst.txt
    • human_biomaRt_annotation.csv
    • Time_t24_vs_t0.txt
    • Time_t2_vs_t0.txt
    • Time_t6_vs_t0.txt
    • tree_env.tsv
    • tree_hmap.tsv
    • shiny_app_data.csv
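To check the layout after unzipping, you can list which expected files are missing. This minimal Python sketch assumes that every file listed after data in the tree sits directly inside the data directory. The names `EXPECTED_FILES` and `missing_data_files` are hypothetical.

```python
# Minimal sketch: report which expected course files are missing from data/.
# Assumption: all files listed under "data" in the tree live directly in data/.
from pathlib import Path

EXPECTED_FILES = [
    "arch_newick.txt",
    "counts_deseq2.txt",
    "counts_filtered.txt",
    "counts_raw.txt",
    "counts_vst.txt",
    "human_biomaRt_annotation.csv",
    "Time_t24_vs_t0.txt",
    "Time_t2_vs_t0.txt",
    "Time_t6_vs_t0.txt",
    "tree_env.tsv",
    "tree_hmap.tsv",
    "shiny_app_data.csv",
]

def missing_data_files(data_dir="data"):
    """Return the expected file names that are not regular files in data_dir."""
    base = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

An empty list from `missing_data_files()` means every expected file was found. File names are case-sensitive on most systems, so Time_t2_vs_t0.txt must keep its capital T.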

Information about the most important files is given below.

## 1.3 DE genes

Below are the lists of differentially expressed (DE) genes for each later time point compared with t0 (Time_t2_vs_t0.txt, Time_t6_vs_t0.txt and Time_t24_vs_t0.txt).
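This minimal Python sketch picks out significant genes from one of these lists. The column layout is an assumption that the course does not confirm: a tab-separated file with a header row, a gene identifier column (called "gene" here, a hypothetical name) and DESeq2-style log2FoldChange and padj columns. The thresholds (adjusted p-value at most 0.05, absolute log2 fold change at least 1) are common choices, not values taken from the course.

```python
# Minimal sketch: select significant genes from a DE table such as Time_t2_vs_t0.txt.
# Assumptions: tab-separated with a header; columns "gene" (hypothetical name),
# "log2FoldChange" and "padj" (DESeq2-style); "NA" marks a missing padj.
import csv
import io

def significant_genes(handle, padj_max=0.05, min_abs_lfc=1.0, id_col="gene"):
    """Return (gene, log2FoldChange) pairs passing both thresholds,
    sorted by absolute fold change, largest first."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["padj"] in ("NA", ""):
            continue  # DESeq2 reports padj as NA for genes removed by independent filtering
        padj = float(row["padj"])
        lfc = float(row["log2FoldChange"])
        if padj <= padj_max and abs(lfc) >= min_abs_lfc:
            hits.append((row[id_col], lfc))
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)

# Made-up example table.
example = io.StringIO(
    "gene\tlog2FoldChange\tpadj\n"
    "g1\t2.5\t0.001\n"
    "g2\t-3.0\t0.01\n"
    "g3\t0.2\t0.001\n"
    "g4\t4.0\tNA\n"
)
print(significant_genes(example))  # [('g2', -3.0), ('g1', 2.5)]
```

With the real files you would pass an open file handle instead, for example `open("data/Time_t2_vs_t0.txt")`, after checking the actual column names in the header.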

# 2 Optional exercises

## 2.1 Phylogenetic trees

The data required for the phylogenetic trees are already part of the main data.zip file, as shown in the directory tree above. The files needed for this exercise are described below:

• A phylogenetic tree of a group of archaeal genomes in Newick format: Archaea tree (arch_newick.txt)
• Metadata on the environments in which these archaea can be found: Environment info (tree_env.tsv)
• An expression matrix for plotting a heatmap together with the phylogenetic tree: for heatmap (tree_hmap.tsv)
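In Newick format a tree is written as nested parentheses, for example `((A:0.1,B:0.2)n1:0.3,C:0.4)root;`. Here the numbers after a colon are branch lengths, and a label directly after a closing parenthesis names an internal node. This minimal Python sketch lists the tip (leaf) labels, which are the names that rows of a heatmap matrix would typically need to match when the two are plotted together. That matching is an assumption about how tree_hmap.tsv is keyed. The function is not a full Newick parser: it assumes unquoted labels and no bracketed comments. In R, `ape::read.tree()` handles the format properly.

```python
# Minimal sketch: list the tip labels of a Newick string such as arch_newick.txt.
# Assumptions: labels are unquoted and contain none of ( ) , : ;
# bracketed [comments] are not supported. Not a full Newick parser.

def newick_tip_labels(newick):
    """Return tip labels in left-to-right order."""
    text = newick.strip()
    tips = []
    prev = None  # last structural character seen: "(", ",", ")" or ";"
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "(),;":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip the branch length that follows the colon.
            i += 1
            while i < len(text) and text[i] not in "(),;":
                i += 1
        else:
            start = i
            while i < len(text) and text[i] not in "(),:;":
                i += 1
            label = text[start:i].strip()
            # A label right after ")" names an internal node, not a tip.
            if label and prev != ")":
                tips.append(label)
    return tips

print(newick_tip_labels("((A:0.1,B:0.2)n1:0.3,C:0.4)root;"))  # ['A', 'B', 'C']
```

To apply it to the course file you could pass `Path("data/arch_newick.txt").read_text()` and compare the result with the row names of the heatmap matrix.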

## 2.2 Map-data with ggmap

The data required for this part of the exercise can be downloaded from here:

• Population statistics for different countries in 2020
• A small data frame of points along the Sisquoc River
• GPS readings from a personal run
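Before requesting map tiles for point data such as the GPS readings, one usually needs a bounding box slightly larger than the points themselves. This minimal Python sketch computes one and widens the range by a fraction `f` on each side. The ggmap package offers `make_bbox()` with a similar fraction argument for this purpose in R. The function name `padded_bbox`, the (longitude, latitude) tuple layout and the example coordinates are assumptions made up for illustration.

```python
# Minimal sketch: padded bounding box around (longitude, latitude) points.
# Assumptions: decimal degrees; the box is widened by fraction f of its
# longitude and latitude range on each side. Coordinates below are made up.

def padded_bbox(points, f=0.05):
    """Return (left, bottom, right, top) around points, extended by fraction f."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    dlon = (max(lons) - min(lons)) * f
    dlat = (max(lats) - min(lats)) * f
    return (min(lons) - dlon, min(lats) - dlat, max(lons) + dlon, max(lats) + dlat)

run = [(10.0, 59.0), (10.2, 59.1), (10.1, 59.05)]
left, bottom, right, top = padded_bbox(run, f=0.1)
```

A single point, or points that share one coordinate, gives a box with zero width or height. In that case a fixed margin in degrees is a more sensible choice than a fraction.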