1 Main exercises

The following data will be used for the exercises in this course. The data come from a mock RNA sequencing (RNA-seq) experiment with 12 cell-line samples. As in a typical RNA-seq analysis, genes with very low expression were filtered out first, and the remaining gene counts were then normalized using methods such as VST and CPM. We will use these data for visualizations.
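
To make the filter-then-normalize step concrete, here is a minimal sketch of CPM (counts per million) normalization with a simple low-expression filter. It assumes the counts are held as a dictionary mapping each gene to a list of raw counts (one per sample); the gene names, numbers and thresholds (`min_cpm`, `min_samples`) are made up for illustration and are not the rule used to produce counts_filtered.txt. VST (variance-stabilizing transformation, available for example in the DESeq2 R package) is more involved and is not reproduced here.

```python
# Sketch: filter lowly expressed genes, then compute counts per million (CPM).
# Assumption: counts is a dict mapping gene -> list of raw counts, one per sample.
# Thresholds are illustrative only, not those used to build counts_filtered.txt.

def cpm(counts):
    """Scale each sample (column) so that its counts sum to one million."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM is at least min_cpm in at least min_samples samples."""
    scaled = cpm(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(value >= min_cpm for value in scaled[gene]) >= min_samples
    }


# Hypothetical toy data: 3 genes x 2 samples.
raw = {"geneA": [100, 200], "geneB": [0, 1], "geneC": [999900, 1999799]}
kept = filter_low_expression(raw)   # geneB never reaches 1 CPM, so it is dropped
normalized = cpm(kept)              # CPM is recomputed on the filtered table
print(sorted(kept))                 # ['geneA', 'geneC']
```

Filtering before normalizing means the library sizes used for CPM reflect only the genes that are kept.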

Download all of these files into your current working directory.

You can download all the files together here. Make a directory called data and unzip the archive into that directory.
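
If you prefer to script this step, the sketch below creates the data directory and extracts the archive into it using Python's standard zipfile module. It assumes the archive has been saved as data.zip in the current working directory; the function name `unzip_into_data` is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_into_data(archive="data.zip", target="data"):
    """Create the target directory if needed, extract the archive into it,
    and return the sorted names of the entries now in that directory."""
    target_dir = Path(target)
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return sorted(p.name for p in target_dir.iterdir())


if __name__ == "__main__":
    print(unzip_into_data())
```

If the archive already contains a top-level data folder, extracting it into data would produce data/data, so check the listing that is returned.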

If ggplot_geneco_course is the current working directory, the directory tree should look like this:

ggplot_geneco_course
- data
  - arch_newick.txt
  - counts_deseq2.txt
  - counts_filtered.txt
  - counts_raw.txt
  - counts_vst.txt
  - human_biomaRt_annotation.csv
  - metadata_raw.csv
  - Time_t24_vs_t0.txt
  - Time_t2_vs_t0.txt
  - Time_t6_vs_t0.txt
  - tree_env.tsv
  - tree_hmap.tsv
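
A quick way to confirm the layout is to check that every file listed in the tree exists. The minimal sketch below copies the file names from the tree and assumes it is run from ggplot_geneco_course; `missing_data_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names copied from the directory tree of the data folder.
EXPECTED_FILES = [
    "arch_newick.txt",
    "counts_deseq2.txt",
    "counts_filtered.txt",
    "counts_raw.txt",
    "counts_vst.txt",
    "human_biomaRt_annotation.csv",
    "metadata_raw.csv",
    "Time_t24_vs_t0.txt",
    "Time_t2_vs_t0.txt",
    "Time_t6_vs_t0.txt",
    "tree_env.tsv",
    "tree_hmap.tsv",
]


def missing_data_files(data_dir="data"):
    """Return the expected files that are not present as regular files in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    print("All files found." if not missing else f"Missing: {missing}")
```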

Descriptions of the most important files are given below:

1.1 Counts table

Table with gene counts after filtering: Filtered Counts
Table with gene counts normalized with VST: VST counts
Table with gene counts normalized with DESeq2: DESeq2 counts

1.2 Metadata

Metadata of the samples: Sample Metadata
Gene metadata, including gene functions: Gene Annotation

1.3 DE genes

Below are the lists of differentially expressed genes between time points, each compared with t0 (Time_t2_vs_t0.txt, Time_t6_vs_t0.txt and Time_t24_vs_t0.txt in the data directory).

2 Optional exercises

2.1 Phylogenetic trees

The data required for the phylogenetic trees are already part of the main data.zip file, as shown in the directory tree above. The files needed for this exercise are described below:

A phylogenetic tree of a group of archaeal genomes in Newick format: Archaea tree
Metadata on the environments in which these archaea can be found: Environment info
An expression matrix for plotting a heatmap alongside the phylogenetic tree: for heatmap
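
Newick format describes a tree as nested parentheses, where each name written directly after "(" or "," is a leaf and an optional ":number" gives the branch length. The sketch below only illustrates that structure by listing leaf labels; it is a simplified reader (no quoted labels or bracketed comments), the function name is hypothetical, and a dedicated tree library should be used to actually read arch_newick.txt.

```python
import re


def newick_leaf_labels(newick):
    """Return leaf labels from a Newick string.

    Simplified: quoted labels and [comments] are not supported.
    """
    labels = []
    prev = None  # last structural character seen: '(', ')', ',' or ';'
    for token in re.findall(r"[(),;]|[^(),;]+", newick.strip()):
        if token in ("(", ")", ",", ";"):
            prev = token
            continue
        # A name after '(' or ',' (or at the very start) is a leaf;
        # a name after ')' labels an internal node, so it is skipped.
        if prev in ("(", ",", None):
            name = token.split(":")[0].strip()  # drop the branch length
            if name:
                labels.append(name)
    return labels


# Made-up example tree with branch lengths and one internal node label ("90").
print(newick_leaf_labels("((A:0.1,B:0.2)90:0.3,C:0.4);"))  # ['A', 'B', 'C']
```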

2.2 Map-data with ggmap

The data required for this part of the exercise can be downloaded here:

Population statistics for different countries in 2020
A small data frame of points from the Sisquoc River
GPS readings from a personal run


Downloads

Workshop on ggplot

Lokesh Mano • 28-Oct-2020