This page contains links to different tutorials that are used in this course. The tutorials are well documented and should be easy to follow.

Input code blocks are displayed as shown below, with the code language displayed above the block. Shell scripts (SH) are to be executed in a Linux terminal such as Bash. R scripts are to be run in R, either through the terminal, RGui or RStudio.

sh

Note   Tip   Discuss   Task


The data needed for all the exercises in the course can be retrieved by following the link below.

Downloading data

Some pre-course R warm-up below, if you fancy that:

R warmup

1 Introduction

Most of the analysis is carried out in R, so it will be useful to learn the R basics that this course relies on. Even if you have learnt these things already, it is good to refresh your memory.

Introduction to R

2 Main lab

2.1 Data

In most of the exercises, we will use RNA-seq data (Illumina short reads) from the human A431 cell line. It is an epidermoid carcinoma cell line which is often used to study cancer and the cell cycle, and as a sort of positive control of epidermal growth factor receptor (EGFR) expression. A431 cells express very high levels of EGFR, in contrast to normal human fibroblasts.

The A431 cells were treated with gefitinib, which is an EGFR inhibitor and is used (under the trade name Iressa) as a drug to treat cancers with mutated and overactive EGFR. In the experiment, RNA was extracted at four time points: before the gefitinib treatment (t=0), and two, six and twenty-four hours after treatment (t=2, t=6, t=24, respectively), and sequenced using an Illumina HiSeq instrument in triplicate (thus there are 3x4=12 samples).

This data is part of the transcriptomics course that is also given by NBIS. We will use some of the count tables that were generated in that course after different processing steps such as manual filtering for low counts, VST and DESeq2. You don’t have to know exactly what these mean to do these exercises :) These are basically the same data at different stages of the transcriptomics analysis.
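To make the experimental design concrete, here is a minimal sketch of a sample sheet for the four time points and three replicates described above. The sample names and column names are made up for illustration and are not taken from the course files.

```r
# Hypothetical sample sheet: 4 time points x 3 replicates = 12 samples.
# Sample IDs are invented for illustration; the course files may use other names.
time_points <- c(0, 2, 6, 24)   # hours after gefitinib treatment
replicates  <- 1:3

samples <- expand.grid(replicate = replicates, time = time_points)
samples$sample_id <- paste0("t", samples$time, "_rep", samples$replicate)
samples$time <- factor(samples$time, levels = time_points)

nrow(samples)         # number of samples
table(samples$time)   # number of replicates per time point
```

Storing time as a factor keeps the four time points in their experimental order when they are later used on a plot axis or as a grouping variable.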

2.2 ggplot basics 1

Below is the link for the basic plotting exercise. It starts with some basic plots made with R base graphics and compares them to the grid-based graphics of ggplot2. You will then practice the first basics of ggplot: geoms, colors and aesthetics.

Geoms, colors and aesthetics
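As a small taste of the comparison between base graphics and ggplot2, the sketch below draws the same scatter plot both ways. It uses the built-in iris dataset as a stand-in, not the course data, and the styling choices are arbitrary.

```r
library(ggplot2)

# Base graphics: a single call draws directly onto the graphics device.
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species, pch = 19,
     xlab = "Sepal length", ylab = "Petal length")

# ggplot2: map columns to aesthetics with aes(), then add a geom layer.
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 2, alpha = 0.8)
print(p)
```

In the ggplot2 version the color legend is produced automatically from the aesthetic mapping, whereas in base graphics it would have to be added with a separate call to legend().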

2.3 ggplot basics 2

Below is the link for the second exercise where you will look into facets, barplots and errorbars.

Facets, barplots and errorbars
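The three ideas in this exercise can be sketched together in one plot. The example below assumes the built-in ToothGrowth dataset instead of the course data, and uses the standard deviation for the error bars, which is only one of several reasonable choices.

```r
library(ggplot2)

# Mean and standard deviation of tooth length per dose and supplement type.
summ <- aggregate(len ~ dose + supp, data = ToothGrowth,
                  FUN = function(x) c(mean = mean(x), sd = sd(x)))
summ <- do.call(data.frame, summ)   # flatten into columns len.mean and len.sd

p <- ggplot(summ, aes(x = factor(dose), y = len.mean)) +
  geom_col(fill = "steelblue") +                       # barplot of the means
  geom_errorbar(aes(ymin = len.mean - len.sd,
                    ymax = len.mean + len.sd),
                width = 0.2) +                         # mean +/- one SD
  facet_wrap(~ supp) +                                 # one panel per supplement
  labs(x = "Dose", y = "Mean tooth length")
print(p)
```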

2.4 ggplot basics 3

Below is the link for the third exercise where you will look into axes, labels, legends and themes.

Axes, labels, legends and themes

2.5 Combining plots

In this part of the lab, we will look into how to combine different plots that we have made using different tools. We will also look into some of the advantages of cowplot and ggpubr.

Combining plots
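A minimal sketch of combining two plots side by side with cowplot is shown here. It assumes the cowplot package is installed and uses the built-in iris dataset as a stand-in for the course data.

```r
library(ggplot2)
library(cowplot)

# Two independent plots built from the same data.
p1 <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point()
p2 <- ggplot(iris, aes(Species, Sepal.Width, fill = Species)) +
  geom_boxplot()

# Arrange them in one row and label the panels A and B.
combined <- plot_grid(p1, p2, labels = c("A", "B"), ncol = 2)
print(combined)
```

ggpubr offers a similar function, ggarrange(), which can additionally share one legend between panels.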

2.6 PCA and Heatmaps

Here, we look into building PCA plots with ggplot. We will also look into making heatmaps with both pheatmap and geom_tile in ggplot.

PCA and Heatmap
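The sketch below shows one way the two plot types can be built with ggplot alone. It assumes the built-in iris dataset as a stand-in for an expression matrix, and the heatmap shows a correlation matrix simply because it is small enough to read; pheatmap is not used here.

```r
library(ggplot2)

# PCA on the four numeric columns, centered and scaled to unit variance.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
var_explained <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Scores of each observation on the first two principal components.
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)
p_pca <- ggplot(scores, aes(PC1, PC2, color = Species)) +
  geom_point() +
  labs(x = paste0("PC1 (", var_explained[1], "%)"),
       y = paste0("PC2 (", var_explained[2], "%)"))

# Heatmap with geom_tile: geom_tile needs long format, one row per cell.
cor_mat  <- cor(iris[, 1:4])
cor_long <- as.data.frame(as.table(cor_mat))   # columns Var1, Var2, Freq
p_heat <- ggplot(cor_long, aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, name = "Correlation")

print(p_pca)
print(p_heat)
```

The main difference from pheatmap is that geom_tile does no clustering or reordering on its own, so rows and columns appear in the order of their factor levels.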

3 Optional exercises

Below are specific exercises for people who are interested in specific topics.

3.1 Phylogenetic trees

Below are some exercises in working with phylogenetic trees in R using mainly a package called ggtree.

Phylogenetic trees

3.2 Map-data using ggmap

Below are some exercises in plotting map data in R using ggmap.

Map-data using ggmap

4 Solutions to exercises

Here I have compiled all the solutions for the different exercises in each of the sections:

Solutions

5 Extra Tutorial for ggplot

In the following link, you can find many different ways to use ggplot to visualize data. Depending on the kind of data you have and the kind of visualization you would like to make, you can pick a suitable example from the table of contents on that page:

Top 50 ggplot2 Visualizations
