1 Base vs grid graphics

1.1 Base

R is an excellent tool for creating graphs and plots. The graphics capabilities and functions provided by the base R installation are called base R graphics. Numerous packages exist to extend the functionality of base graphics.

We can try out a few of the common plot types. Let’s start with a scatterplot. First we create a data.frame, as this is the most commonly used data object.

dfr <- data.frame(a=sample(1:100,10),b=sample(1:100,10))

Now we have a dataframe with two continuous variables that can be plotted against each other.

plot(dfr$a,dfr$b)

This is probably the simplest and most basic plot. We can modify the x and y axis labels.

plot(dfr$a,dfr$b,xlab="Variable a",ylab="Variable b")

We can connect the points with lines. Setting type="b" draws both points and lines; type="l" would draw lines only.

plot(dfr$a,dfr$b,xlab="Variable a",ylab="Variable b",type="b")
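
As a rough sketch of how the type argument changes the result, the three most common values can be drawn side by side. The data frame toy and the lookup vector plot_types are made up for this illustration; the x values are sorted so that the connecting lines run from left to right.

```r
# Toy data for illustration only; x is sorted so the lines do not zig-zag
set.seed(1)
toy <- data.frame(a = sort(sample(1:100, 10)), b = sample(1:100, 10))

# Hypothetical lookup: plot type -> panel title
plot_types <- c(p = "points only", l = "lines only", b = "both points and lines")

old_par <- par(mfrow = c(1, 3))   # three panels side by side
for (tp in names(plot_types)) {
  plot(toy$a, toy$b, type = tp, main = plot_types[[tp]],
       xlab = "Variable a", ylab = "Variable b")
}
par(old_par)                      # restore the previous panel layout
```

See ?plot.default for the full list of accepted type values.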

Let’s add a categorical column to our dataframe.

dfr$cat <- rep(c("C1","C2"),each=5)

And then colour the points by category.

# subset data by category
dfr_c1 <- subset(dfr,cat == "C1")
dfr_c2 <- subset(dfr,cat == "C2")

# take the axis limits from the full data so that no C2 point is clipped
plot(dfr_c1$a,dfr_c1$b,xlab="Variable a",ylab="Variable b",col="red",pch=1,
     xlim=range(dfr$a),ylim=range(dfr$b))
points(dfr_c2$a,dfr_c2$b,col="blue",pch=2)

legend(x="topright",legend=c("C1","C2"),
       col=c("red","blue"),pch=c(1,2))
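
An alternative to subsetting, sketched here under the assumption that one colour and one symbol per category is all that is needed, is to look up the colour and symbol for every row and draw all points in a single plot() call. The names toy, cat_cols and cat_pch are made up for this example.

```r
# Toy data for illustration only
set.seed(1)
toy <- data.frame(a = sample(1:100, 10), b = sample(1:100, 10),
                  cat = rep(c("C1", "C2"), each = 5))

# Lookup tables: one colour and one plotting symbol per category
cat_cols <- c(C1 = "red", C2 = "blue")
cat_pch  <- c(C1 = 1, C2 = 2)

# Indexing the lookup tables by category name gives one value per row
row_cols <- cat_cols[as.character(toy$cat)]
row_pch  <- cat_pch[as.character(toy$cat)]

plot(toy$a, toy$b, xlab = "Variable a", ylab = "Variable b",
     col = row_cols, pch = row_pch)
legend("topright", legend = names(cat_cols), col = cat_cols, pch = cat_pch)
```

Because all points go through one plot() call, the axis limits automatically cover every category.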

Let’s create a barplot.

ldr <- data.frame(a=letters[1:10],b=sample(1:50,10))
barplot(ldr$b,names.arg=ldr$a)

1.2 Grid

Grid graphics have a completely different underlying framework compared to base graphics. Generally, base graphics and grid graphics cannot be plotted together. The most popular grid-graphics based plotting library is ggplot2.

Let’s create the same plot as before using ggplot2. Make sure you have the package installed.

library(ggplot2)

ggplot(dfr)+
  geom_point(mapping = aes(x=a,y=b,colour=cat))+
  labs(x="Variable a",y="Variable b")

It is generally easier and more consistent to create plots with the ggplot2 package than with base graphics.

Let’s create a barplot as well.

ggplot(ldr,aes(x=a,y=b))+
  geom_col()

1.3 Saving images

Let’s take a look at saving plots.

Note   This part just gives you a quick look at how to save images from RStudio. The different image formats will be explained in a lecture tomorrow.

1.3.1 Base graphics

The general idea for saving plots is to open a graphics device, create the plot and then close the device. We will use png here. Check out ?png for the arguments and other devices.

dfr <- data.frame(a=sample(1:100,10),b=sample(1:100,10))

png(filename="plot-base.png")
plot(dfr$a,dfr$b)
dev.off()
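
The same open, plot, close pattern works for other devices. The sketch below uses pdf() with an explicit size; the data frame toy and the output path in the temporary directory are made up for this example.

```r
# Toy data for illustration only
set.seed(1)
toy <- data.frame(a = sample(1:100, 10), b = sample(1:100, 10))

# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "plot-base.pdf")

pdf(file = out_file, width = 6, height = 4)   # size in inches
plot(toy$a, toy$b, xlab = "Variable a", ylab = "Variable b")
dev.off()                                     # the file is complete only after this call

file.exists(out_file)
```

If dev.off() is forgotten, later plots keep going to the file instead of the screen.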

1.3.2 ggplot2

The same idea can be applied to ggplot2, but in a slightly different way. First save the plot to a variable, and then export it.

p <- ggplot(dfr,aes(a,b)) + geom_point()

png(filename="plot-ggplot-1.png")
print(p)
dev.off()

Tip   ggplot2 also provides a simpler helper function, ggsave(), to export images.

ggsave(filename="plot-ggplot-2.png",plot=p)
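
When no size is given, ggsave() uses the size of the current graphics device, so the result can differ between sessions. A minimal sketch with an explicit size and resolution follows; the data frame toy, the plot p_toy and the output path are made up for this example.

```r
library(ggplot2)

# Toy data and plot for illustration only
set.seed(1)
toy <- data.frame(a = sample(1:100, 10), b = sample(1:100, 10))
p_toy <- ggplot(toy, aes(a, b)) + geom_point()

# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "plot-ggplot-sized.png")

# 12 x 8 cm at 300 dots per inch; the file type is taken from the extension
ggsave(filename = out_file, plot = p_toy,
       width = 12, height = 8, units = "cm", dpi = 300)
```

See ?ggsave for the supported units and devices.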

2 Ggplot basics

Make sure the library is loaded in your environment.

library(ggplot2)

2.1 Geoms

In the previous section we saw very quickly how to use ggplot. Let’s take a more careful look at it. To start with, we use the iris dataset, which is available in R.

This dataset has four continuous variables and one categorical variable. It is important to keep the data type of each variable in mind when plotting.

data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

When we initiate the ggplot object using the data, it just creates a blank plot!

ggplot(iris) 

Now we can specify what we want on the x and y axes using aesthetic mapping, and we specify the geometric representation using geoms. Note   that the variable names are written without double quotes ""; they are looked up as columns of the data.

ggplot(data=iris)+
  geom_point(mapping=aes(x=Petal.Length,y=Petal.Width))

2.1.1 Multiple geoms

Further geoms can be added. For example, let’s add a regression line. When multiple geoms share the same aesthetics, the mapping can be specified once in ggplot() and is then inherited by every geom. Note   that geoms are drawn in the order in which they are supplied in the code. In the code below, the points are drawn first and the regression line is drawn on top of them.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point()+
  geom_smooth(method="lm")
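
To make the effect of layer order concrete, here is a small sketch that builds the plot in both orders and lists the stored layers. The object names are made up for this example, and reading the layers element of a plot object is shown only for inspection.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width))

# Points first, regression line drawn on top of them
p_points_first <- base_plot + geom_point() + geom_smooth(method = "lm")

# Regression line first, points drawn on top of the line and its band
p_line_first <- base_plot + geom_smooth(method = "lm") + geom_point()

# The layers are stored in the order in which they were added
order_points_first <- sapply(p_points_first$layers, function(layer) class(layer$geom)[1])
order_line_first   <- sapply(p_line_first$layers, function(layer) class(layer$geom)[1])
order_points_first
order_line_first
```

Printing p_points_first and p_line_first shows the visual difference where points and line overlap.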

There are many other geoms, and you can find most of them in this cheatsheet.

2.1.2 Gene counts data

Let’s also try to use ggplot for a “more common” gene counts dataset. Let’s use the merged_data_long or the gc_long object we created in the earlier session.

ggplot(data = gc_long) +
  geom_boxplot(mapping = aes(x = Sample_Name, y = log10(count + 1)))

Note   Notice that ggplot sorts the levels of a categorical variable alphanumerically, as with Sample_Name above.

Tip   You can set the order manually by converting the variable to a factor with explicit levels. An example is shown below:

gc_long$Sample_Name <- factor(gc_long$Sample_Name,
  levels = c("t0_A","t0_B","t0_C","t2_A","t2_B","t2_C",
             "t6_A","t6_B","t6_C","t24_A","t24_B","t24_C"))
ggplot(data = gc_long) +
  geom_boxplot(mapping = aes(x = Sample_Name, y = log10(count + 1)))
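
The sorting behaviour comes from how factor() chooses its levels, which can be seen without any plotting. The sketch below uses made-up time-point labels, not the course data, and also shows one way to derive the order from the numeric part of the label instead of typing it out.

```r
# Hypothetical time-point labels, not taken from the course data
time_points <- c("t0", "t2", "t6", "t24")

# Default: levels are sorted as text, so "t24" comes before "t6"
default_levels <- levels(factor(time_points))
default_levels

# Manual: levels follow the order given in the levels argument
manual_levels <- levels(factor(time_points, levels = time_points))
manual_levels

# Alternative: derive the order from the numeric part of each label
hours <- as.numeric(sub("^t", "", time_points))
ordered_levels <- time_points[order(hours)]
ordered_levels
```

ggplot2 places categories on the axis in the order of the factor levels, which is why setting the levels changes the plot.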

2.2 Colors

2.2.1 Iris data

First, in the iris data, we can use the categorical column Species to color the points. Because the color aesthetic is set in the common mapping, it is used by both geom_point and geom_smooth, so three separate regression lines are drawn, one per species. Notice that a legend is created automatically.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width,color=Species))+
  geom_point()+
  geom_smooth(method="lm")

If we wanted to keep a common regression line while keeping the colors for the points, we could specify color aesthetic only for geom_point.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Species))+
  geom_smooth(method="lm")

2.2.2 GC data

Similarly, we can do the same with the gene counts data.

ggplot(data = gc_long) +
  geom_boxplot(mapping = aes(x = Sample_Name, y = log10(count + 1), color = Time))

Tip   We can also use the fill aesthetic to give it a better look.

ggplot(data = gc_long) +
  geom_boxplot(mapping = aes(x = Sample_Name, y = log10(count + 1), fill = Time))

2.2.3 Discrete colors

We can change the default colors by specifying new values inside a scale.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Species))+
  geom_smooth(method="lm")+
  scale_color_manual(values=c("red","blue","green"))

Tip   Manual colors can be specified by name or by hexadecimal code. For example, you can pick color names from an online source like this cheatsheet, or pick hexadecimal codes from a source like the one linked here. I personally prefer hexadecimal codes for manual colors.
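
As a sketch of the hexadecimal option, the colors can be given as a named vector so that each color is tied to a category by name rather than by position. The hex codes and the names species_cols and p_named are chosen only for illustration.

```r
library(ggplot2)

# Hex codes chosen only for illustration; names must match the Species levels
species_cols <- c(setosa = "#1B9E77", versicolor = "#D95F02", virginica = "#7570B3")

p_named <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point(aes(color = Species)) +
  scale_color_manual(values = species_cols)
p_named
```

With a named vector, reordering the factor levels later does not silently swap the colors.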

2.2.4 Continuous colors

We can also map the colors to a continuous variable. This creates a color bar legend item.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Sepal.Width))+
  geom_smooth(method="lm")

Tip   You can also pick a ready-made palette for a continuous color scale. Two commonly used palette packages are RColorBrewer and wesanderson, the latter if you are a fan of Wes Anderson’s choice of colors ;)

library(wesanderson)
ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Sepal.Width))+
  geom_smooth(method="lm") +
  scale_color_gradientn(colours = wes_palette("Moonrise3"))

Tip   You can also use simple base R color palettes like rainbow() or terrain.colors(). Use ? to look up how these functions work.
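
A minimal sketch of the base R route: terrain.colors(n) returns n hexadecimal color codes, which can be passed to scale_color_gradientn() in the same way as a palette from a package. The choice of five colors and the names pal and p_terrain are arbitrary.

```r
library(ggplot2)

pal <- terrain.colors(5)   # five hex colour codes from a base R palette

p_terrain <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point(aes(color = Sepal.Width)) +
  scale_color_gradientn(colours = pal)
p_terrain
```

The colors in between the five anchor colors are interpolated by the scale.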

2.3 Aesthetics

2.3.1 Aesthetic parameter

We can change the size of all points by a fixed amount by specifying size outside the aesthetic parameter.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Species),size=3)+
  geom_smooth(method="lm")

2.3.2 Aesthetic mapping

We can map another variable to the size of the points. This is done by specifying size inside the aesthetic mapping. Now the size of each point denotes Sepal.Width. A new legend group is created to show this new aesthetic.

ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Species,size=Sepal.Width))+
  geom_smooth(method="lm")
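
A common stumbling block follows from the difference between a parameter and a mapping: a fixed value placed inside aes() is treated as data. The sketch below contrasts the two for color; the object names are made up for this example.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width))

# Parameter: every point is drawn in blue and no legend is created
p_fixed <- base_plot + geom_point(color = "blue")

# Mapping: "blue" is treated as a data value with a single category,
# so the points get the first default colour and a legend entry "blue"
p_mapped <- base_plot + geom_point(aes(color = "blue"))

fixed_colours  <- unique(layer_data(p_fixed, 1)$colour)
mapped_colours <- unique(layer_data(p_mapped, 1)$colour)
```

Printing p_fixed and p_mapped side by side makes the difference visible.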

3 Histogram

Here, as a quick example, we will try to make use of the different combinations of geoms, aes and color in simple plots.

Let’s take a quick look at two widely used plot types, histograms and density plots. These are drawn with geom_histogram() and geom_density(). The histogram can be customized with the bins (number of bins) and binwidth (width of each bin) arguments of geom_histogram().

ggplot(data=iris,mapping=aes(x=Sepal.Length))+
  geom_histogram()
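
Without either argument, geom_histogram() falls back to 30 bins and prints a message suggesting a better value. A short sketch of both options follows; the values 15 and 0.25 are arbitrary choices for illustration.

```r
library(ggplot2)

base_hist <- ggplot(iris, aes(x = Sepal.Length))

# Fixed number of bins
p_bins <- base_hist + geom_histogram(bins = 15)

# Fixed bin width, in the units of the variable (cm for Sepal.Length)
p_binwidth <- base_hist + geom_histogram(binwidth = 0.25)

p_bins
p_binwidth
```

Only one of the two is needed; binwidth is usually easier to interpret because it is in the units of the data.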

3.1 Density

Let’s look at the same data as a density plot.

ggplot(data=iris,mapping=aes(x=Sepal.Length))+
  geom_density()

The above plot is not very informative. Let’s see how the different species contribute:

ggplot(data=iris,mapping=aes(x=Sepal.Length))+
  geom_density(aes(fill = Species), alpha = 0.8)

Note   The alpha argument of geom_density() controls the transparency of the filled areas, from 0 (fully transparent) to 1 (opaque).

4 Exercise

Task   Make boxplots like the one above for the other three count files (counts_filtered.txt, counts_vst.txt and counts_deseq2.txt).

Tip   You can save the plots themselves as R objects. You will get the plot by just calling those objects. You can then add layers to those objects. An example is shown below:

plot_obj_1 <- ggplot(data=iris,mapping=aes(x=Petal.Length,y=Petal.Width))+
  geom_point(aes(color=Sepal.Width))+
  geom_smooth(method="lm") 
plot_obj_1

plot_obj_2 <- plot_obj_1 +
  scale_color_gradientn(colours = wes_palette("Moonrise3"))
plot_obj_2

This way, you can create different plot objects for the different counts. We will use them in the later exercises.
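
One possible way to avoid repeating the same code for each counts table is a small helper function that returns a plot object. This is only a sketch: make_count_boxplot is a hypothetical name, it assumes a long table with the columns Sample_Name and count, and the log10 transform is copied from the earlier boxplot and may not be what you want for values that are already transformed.

```r
library(ggplot2)

# Hypothetical helper; assumes a long table with columns Sample_Name and count
make_count_boxplot <- function(counts_long, title = NULL) {
  ggplot(data = counts_long) +
    geom_boxplot(mapping = aes(x = Sample_Name, y = log10(count + 1))) +
    labs(title = title)
}

# Toy data standing in for a real counts table
set.seed(1)
toy_counts <- data.frame(
  Sample_Name = rep(c("t0_A", "t0_B"), each = 20),
  count = rpois(40, lambda = 50)
)

plot_toy <- make_count_boxplot(toy_counts, title = "Toy counts")
plot_toy
```

Each call returns a separate plot object, so layers can still be added to it afterwards.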

5 Session info

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] wesanderson_0.3.6      forcats_0.5.1          stringr_1.4.0         
##  [4] purrr_0.3.4            readr_2.1.1            tidyr_1.1.4           
##  [7] tibble_3.1.6           tidyverse_1.3.1        reshape2_1.4.4        
## [10] ggplot2_3.3.5          formattable_0.2.1      kableExtra_1.3.4      
## [13] dplyr_1.0.7            lubridate_1.8.0        yaml_2.2.1            
## [16] fontawesome_0.2.2.9000 captioner_2.2.3        bookdown_0.24         
## [19] knitr_1.37            
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2        sass_0.4.0        jsonlite_1.7.3    viridisLite_0.4.0
##  [5] splines_4.1.2     modelr_0.1.8      bslib_0.3.1       assertthat_0.2.1 
##  [9] highr_0.9         cellranger_1.1.0  pillar_1.6.4      backports_1.4.1  
## [13] lattice_0.20-45   glue_1.6.0        digest_0.6.29     rvest_1.0.2      
## [17] colorspace_2.0-2  htmltools_0.5.2   Matrix_1.3-4      plyr_1.8.6       
## [21] pkgconfig_2.0.3   broom_0.7.11      haven_2.4.3       scales_1.1.1     
## [25] webshot_0.5.2     svglite_2.0.0     tzdb_0.2.0        mgcv_1.8-38      
## [29] generics_0.1.1    farver_2.1.0      ellipsis_0.3.2    withr_2.4.3      
## [33] cli_3.1.0         magrittr_2.0.1    crayon_1.4.2      readxl_1.3.1     
## [37] evaluate_0.14     fs_1.5.2          fansi_1.0.2       nlme_3.1-153     
## [41] xml2_1.3.3        tools_4.1.2       hms_1.1.1         lifecycle_1.0.1  
## [45] munsell_0.5.0     reprex_2.0.1      compiler_4.1.2    jquerylib_0.1.4  
## [49] systemfonts_1.0.3 rlang_0.4.12      grid_4.1.2        rstudioapi_0.13  
## [53] htmlwidgets_1.5.4 labeling_0.4.2    rmarkdown_2.11    gtable_0.3.0     
## [57] DBI_1.1.2         R6_2.5.1          fastmap_1.1.0     utf8_1.2.2       
## [61] stringi_1.7.6     Rcpp_1.0.8        vctrs_0.3.8       dbplyr_2.1.1     
## [65] tidyselect_1.1.1  xfun_0.29