class: center, middle, inverse, title-slide .title[ # Visualisation with
ggplot2
] .subtitle[ ## R Foundations for Life Scientists ] .author[ ### Lokesh Mano (Roy Francis) ] --- exclude: true count: false <link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet"> <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous"> <!-- ----------------- Only edit title & author above this ----------------- --> # Contents * [Why `ggplot2`?](#intro) * [Grammar of Graphics](#gog) * [Data](#data-iris) * [Geoms](#geom) * [Aesthetics](#aes) * [Scales](#scales-discrete-color) * [Facets](#facet-wrap) * [Coordinates](#coordinate) * [Theme](#theme) * [Position](#position) * [Saving Plots](#save) * [Combining Plots](#comb) * [Interactive Plots](#interactive) * [Extensions](#extension) --- name: intro class: spaced # Why `ggplot2`? * Consistent code * Flexible * Automatic legends, colors etc * Save plot objects * Themes for reusing styles * Numerous add-ons/extensions * Nearly complete graphing solution -- Not suitable for: * 3D graphics ??? Why can't we just do everything in base plots? Of course we could, but it is easier, more consistent and more structured with `ggplot2`. There is a bit of a learning curve, but once the code syntax and the graph-building logic are clear, it becomes easy to plot a large variety of graphs. --- name: gvb1 # `ggplot2` vs Base Graphics .pull-left-50[ ``` r hist(iris$Sepal.Length) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-3-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] .pull-right-50[ ``` r library(ggplot2) ggplot(iris,aes(x=Sepal.Length))+ geom_histogram(bins=8) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-4-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] ??? 
For simple graphs, base plots seem to take minimal coding effort compared to a ggplot graph. --- name: gvb2 # `ggplot2` vs Base Graphics .pull-left-50[ ``` r plot(iris$Petal.Length,iris$Petal.Width, col=c("red","green","blue")[iris$Species], pch=c(0,1,2)[iris$Species]) legend(x=1,y=2.5, legend=c("setosa","versicolor","virginica"), pch=c(0,1,2),col=c("red","green","blue")) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-5-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] .pull-right-50[ ``` r ggplot(iris,aes(Petal.Length,Sepal.Length,color=Species))+ geom_point() ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-6-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] ??? For anything beyond extremely basic plots, base plotting quickly becomes complex. More importantly, base plots lack consistency in their functions and plotting strategy. --- name: gog class: spaced # Grammar Of Graphics .pull-left-30[ ![](data/slide_ggplot2/gog.jpg) ![](data/slide_ggplot2/gog.png) ] -- .pull-right-70[ * **Data**: Input data * **Geom**: A geometry representing data. Points, Lines etc * **Aesthetic**: Visual characteristics of the geometry. Size, Color, Shape etc * **Scale**: How visual characteristics are converted to display values * **Statistics**: Statistical transformations. Counts, Means etc * **Coordinates**: Numeric system to determine position of geometry. Cartesian, Polar etc * **Facets**: Split data into subsets ] ??? `ggplot` was created by Hadley Wickham in 2005 as an implementation of Leland Wilkinson's book Grammar of Graphics. Different graphs have always been considered as independent entities and also labelled differently such as barplots, scatterplots, boxplots etc. Each graph has its own function and plotting strategy. Grammar of graphics (GOG) tries to unify all graphs under a common umbrella. 
GOG brings the idea that graphs are made up of discrete components which can be mixed and matched to create any plot. This creates a consistent underlying framework to graphing. --- name: syntax # Building A Graph: Syntax ![](data/slide_ggplot2/syntax.png) --- name: build-1 # Building A Graph .pull-left-40[ ``` r ggplot(iris) ``` ] .pull-right-50[ <img src="slide_ggplot2_files/figure-html/unnamed-chunk-8-1.png" width="252" style="display: block; margin: auto auto auto 0;" /> ] --- name: build-2 # Building A Graph .pull-left-40[ ``` r ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width)) ``` ] .pull-right-60[ <img src="slide_ggplot2_files/figure-html/unnamed-chunk-10-1.png" width="252" style="display: block; margin: auto auto auto 0;" /> ] --- name: build-3 # Building A Graph .pull-left-40[ ``` r ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width))+ geom_point() ``` ] .pull-right-60[ <img src="slide_ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="252" style="display: block; margin: auto auto auto 0;" /> ] --- name: build-4 # Building A Graph .pull-left-40[ ``` r ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width, colour=Species))+ geom_point() ``` ] .pull-right-60[ <img src="slide_ggplot2_files/figure-html/unnamed-chunk-14-1.png" width="252" style="display: block; margin: auto auto auto 0;" /> ] --- name: data-iris # Data • `iris` * Input data is always an R `data.frame` object <table> <thead> <tr> <th style="text-align:center;"> Sepal.Length </th> <th style="text-align:center;"> Sepal.Width </th> <th style="text-align:center;"> Petal.Length </th> <th style="text-align:center;"> Petal.Width </th> <th style="text-align:center;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5.1 </td> <td style="text-align:center;"> 3.5 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 3.0 </td> 
<td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.7 </td> <td style="text-align:center;"> 3.2 </td> <td style="text-align:center;"> 1.3 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> </tbody> </table> ``` r str(iris) ``` ``` ## 'data.frame': 150 obs. of 5 variables: ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... ## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ... ``` ??? It's a good idea to use `str()` to check the input dataframe to make sure that numbers are actually numbers and not characters, for example. Verify that factors are correctly assigned. --- name: data-diamonds # Data • `diamonds` <table> <thead> <tr> <th style="text-align:center;"> carat </th> <th style="text-align:center;"> cut </th> <th style="text-align:center;"> color </th> <th style="text-align:center;"> clarity </th> <th style="text-align:center;"> depth </th> <th style="text-align:center;"> table </th> <th style="text-align:center;"> price </th> <th style="text-align:center;"> x </th> <th style="text-align:center;"> y </th> <th style="text-align:center;"> z </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 0.23 </td> <td style="text-align:center;"> Ideal </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> SI2 </td> <td style="text-align:center;"> 61.5 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> 326 </td> <td style="text-align:center;"> 3.95 </td> <td style="text-align:center;"> 3.98 </td> <td style="text-align:center;"> 2.43 </td> </tr> <tr> <td style="text-align:center;"> 0.21 </td> <td 
style="text-align:center;"> Premium </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> SI1 </td> <td style="text-align:center;"> 59.8 </td> <td style="text-align:center;"> 61 </td> <td style="text-align:center;"> 326 </td> <td style="text-align:center;"> 3.89 </td> <td style="text-align:center;"> 3.84 </td> <td style="text-align:center;"> 2.31 </td> </tr> <tr> <td style="text-align:center;"> 0.23 </td> <td style="text-align:center;"> Good </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> VS1 </td> <td style="text-align:center;"> 56.9 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> 327 </td> <td style="text-align:center;"> 4.05 </td> <td style="text-align:center;"> 4.07 </td> <td style="text-align:center;"> 2.31 </td> </tr> </tbody> </table> ``` r str(diamonds) ``` ``` ## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame) ## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ... ## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ... ## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ... ## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ... ## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ... ## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ... ## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ... ## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ... ## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ... ## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ... ``` ??? R `data.frame` is a tabular format with rows and columns just like a spreadsheet. All items in a row or a column must be available or missing values filled in as NAs. 
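
The check described in these notes can be sketched on a small, made-up data frame. The object `df` and its columns are hypothetical; they only illustrate a numeric column that was read in as text and a missing value stored as `NA`.

``` r
# Hypothetical toy data: 'count' arrived as text and one value is missing
df <- data.frame(
  sample = c("s1", "s2", "s3"),
  count  = c("10", "25", NA),
  group  = c("ctrl", "treat", "treat"),
  stringsAsFactors = FALSE
)
str(df)                            # 'count' and 'group' are both chr here

df$count <- as.numeric(df$count)   # text to numbers; the missing value stays NA
df$group <- factor(df$group)       # declare the grouping variable as a factor
str(df)                            # 'count' is now num, 'group' is a Factor
```

Running `str()` before plotting catches this kind of problem early, since a character column mapped to an axis is treated as discrete rather than continuous.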
--- name: data-format # Data • Format -- - Wide format <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;font-weight: bold;color: blue !important;"> </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_1 </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_2 </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_3 </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_4 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> <td style="text-align:right;color: orange !important;"> 303 </td> <td style="text-align:right;color: orange !important;"> 204 </td> <td style="text-align:right;color: orange !important;"> 492 </td> </tr> <tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> <tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> <td style="text-align:right;color: orange !important;"> 660 </td> <td style="text-align:right;color: orange !important;"> 472 </td> <td style="text-align:right;color: orange !important;"> 951 </td> </tr> <tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> <td style="text-align:right;color: orange !important;"> 54 </td> <td 
style="text-align:right;color: orange !important;"> 44 </td> <td style="text-align:right;color: orange !important;"> 109 </td> </tr> <tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> <td style="text-align:right;color: orange !important;"> 405 </td> <td style="text-align:right;color: orange !important;"> 236 </td> <td style="text-align:right;color: orange !important;"> 445 </td> </tr> <tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table> -- * familiarity * convenience * you see more data --- name: data-format-2 # Data • Format - Long format -- <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Sample_ID </th> <th style="text-align:left;"> Gene </th> <th style="text-align:right;"> count </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> 
<td style="text-align:left;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table> -- <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Sample_ID </th> <th style="text-align:left;"> Sample_Name </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Replicate </th> <th style="text-align:left;"> Cell </th> <th style="text-align:left;"> Gene </th> <th style="text-align:right;"> count </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> <tr> <td 
style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table> --- name: data-format-3 # Data • Format - Long format <table class="table table-striped" style="width: auto 
!important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Sample_ID </th> <th style="text-align:left;"> Sample_Name </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Replicate </th> <th style="text-align:left;"> Cell </th> <th style="text-align:left;"> Gene </th> <th style="text-align:right;"> count </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 
</td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table> -- * Easier to add data to an existing table * Most databases store and maintain data in long format because it is efficient * R tools **like ggplot** require data in long format. * Functions are available to convert between data formats * `melt()` from **reshape2** * `gather()` from **tidyr** (part of the **tidyverse**; superseded by `pivot_longer()`) --- name: geom # Geoms ![geoms](data/slide_ggplot2/geoms.png) -- ``` r p <- ggplot(iris) # scatterplot p+geom_point(aes(x=Sepal.Length,y=Sepal.Width)) # barplot p+geom_bar(aes(x=Sepal.Length)) # boxplot p+geom_boxplot(aes(x=Species,y=Sepal.Width)) # search help.search("^geom_",package="ggplot2") ``` ??? Geoms are the geometric components of a graph such as points, lines etc used to represent data. The same data can be visually represented in different geoms. 
For example, points or bars. Mandatory input requirements change depending on geoms. --- name: aes # Aesthetics * Aesthetic mapping vs aesthetic parameter .pull-left-50[ ``` r ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, size=Petal.Length, alpha=Petal.Width, shape=Species, color=Species)) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-24-1.png" width="396" style="display: block; margin: auto auto auto 0;" /> ] .pull-left-50[ ``` r ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width), size=2, alpha=0.8, shape=15, color="steelblue") ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-25-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] ??? Aesthetics are used to assign values to geometries. For example, a set of points can have a fixed size, or can vary in color or size to denote a variable. Variables are mapped inside `aes()`; fixed values are set outside it. A fixed value placed inside `aes()` is treated as a variable to map, so this would be an incorrect way to set the point size. ``` ggplot(iris)+ geom_point(aes(x=Sepal.Length,y=Sepal.Width,size=2)) ``` --- name: aes-2 # Aesthetics ``` r x1 <- ggplot(iris) + geom_point(aes(x=Sepal.Length,y=Sepal.Width))+ stat_smooth(aes(x=Sepal.Length,y=Sepal.Width)) x2 <- ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+ geom_point() + geom_smooth() x1|x2 ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-26-1.png" width="576" style="display: block; margin: auto auto auto 0;" /> ??? If the same aesthetics are used in multiple geoms, they can be moved to `ggplot()`. --- name: multiple-geom # Multiple Geoms ``` r ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+ geom_point()+ geom_line()+ geom_smooth()+ geom_rug()+ geom_step()+ geom_text(data=subset(iris,iris$Species=="setosa"),aes(label=Species)) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-27-1.png" width="648" style="display: block; margin: auto auto auto 0;" /> ??? Multiple geoms can be plotted one after the other. The order in which items are specified in the command dictates the plotting order on the actual plot. 
Layers added later are drawn on top of earlier ones. In this case, the lines are drawn over the points. ``` ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+ geom_point()+ geom_line() ``` while here the points are drawn over the lines. ``` ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+ geom_line()+ geom_point() ``` Each geom inherits the data and aesthetics given to `ggplot()`. If a geom needs additional aesthetics, they can be specified inside its own `aes()`. `data` can also be changed for specific geoms if needed. --- name: scales-discrete-color # Scales • Discrete Colors * scales: position, color, fill, size, shape, alpha, linetype * syntax: `scale_<aesthetic>_<type>` <img src="data/slide_ggplot2/scales.png" alt="scales-syntax" style="width:50%;"> -- .pull-left-50[ ``` r p <- ggplot(iris)+geom_point(aes(x=Sepal.Length, y=Sepal.Width,color=Species)) p ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-28-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ``` r p + scale_color_manual( name="Manual", values=c("#5BC0EB","#FDE74C","#9BC53D")) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-29-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] ??? Scales are used to control the aesthetics. For example, suppose the aesthetic color is mapped to a variable `x`. The palette of colors used, the mapping of which color goes with which value, the upper and lower limits of the data and colors etc are all controlled by scales. 
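
A minimal sketch of how a scale decides which color goes with which value, assuming a named vector is passed to `scale_color_manual()`. The name `species_cols` is introduced only for this example.

``` r
library(ggplot2)

# A named vector pins each colour to a specific level of Species
species_cols <- c(setosa="#5BC0EB", versicolor="#FDE74C", virginica="#9BC53D")

p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species))+
  scale_color_manual(name="Species", values=species_cols)
p
```

With a named vector, the assignment no longer depends on the order of the factor levels.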
--- name: scales-continuous-color # Scales • Continuous Colors * In RStudio, type `scale_`, then press **TAB** -- .pull-left-50[ ``` r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, shape=Species,color=Petal.Length)) p ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-30-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ``` r p + scale_color_gradient(name="Pet Len", breaks=range(iris$Petal.Length), labels=c("Min","Max"), low="black",high="red") ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-31-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] ??? Continuous colours can be changed using `scale_color_gradient()` for a two-colour gradient. Any number of breaks and colours can be specified using `scale_color_gradientn()`. --- name: scales-shape # Scales • Shape .pull-left-50[ ``` r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, shape=Species,color=Species)) p ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-32-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ``` r p + scale_color_manual(name="New", values=c("blue","green","red"))+ scale_shape_manual(name="Bla",values=c(0,1,2)) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-33-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] ??? The shape scale can be adjusted using `scale_shape_manual()`. When several aesthetics are mapped to the same variable, their legends are merged into one, provided the scales share the same name and labels. 
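
A minimal sketch of how legends combine when color and shape are mapped to the same variable. The object names `merged` and `separate` are made up for this example.

``` r
library(ggplot2)

p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length, y=Sepal.Width, shape=Species, color=Species))

# Same name on both scales: a single combined legend
merged <- p +
  scale_color_manual(name="Species", values=c("blue","green","red"))+
  scale_shape_manual(name="Species", values=c(0,1,2))

# Different names: one legend per aesthetic
separate <- p +
  scale_color_manual(name="Colour", values=c("blue","green","red"))+
  scale_shape_manual(name="Shape", values=c(0,1,2))

merged
separate
```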
--- name: scales-axis # Scales • Axes * scales: x, y * syntax: `scale_<axis>_<type>` * arguments: name, limits, breaks, labels -- .pull-left-50[ ``` r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width)) p ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-34-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ``` r p + scale_color_manual(name="New", values=c("blue","green","red"))+ scale_x_continuous(name="Sepal Length", breaks=seq(1,8),limits=c(3,5)) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-35-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] ??? The x and y axes are also controlled by scales. The axis break points, the break point labels and the limits are controlled through scales. When setting limits using `scale_`, the data outside the limits are dropped. Limits can also be set using `lims(x=c(3,5))` or `xlim(c(3,5))`. To zoom in without dropping data, `coord_cartesian()` is recommended for setting limits; for maps, use `coord_map()`. --- name: facet-wrap # Facets • `facet_wrap` * Split into subplots based on variable(s) * Facetting in one dimension -- .pull-left-50[ ``` r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species)) p ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-36-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ``` r p + facet_wrap(~Species) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-37-1.png" width="324" style="display: block; margin: auto auto auto 0;" /> ``` r p + facet_wrap(~Species,nrow=3) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-38-1.png" width="324" style="display: block; margin: auto auto auto 0;" /> ] ??? `facet_wrap` is used to split a plot into subplots based on the categories in one or more variables. 
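
A minimal sketch of `facet_wrap` on more than one variable. The copy `iris2` and its column `Petal.Size` are hypothetical, added only so that there is a second categorical variable to facet by.

``` r
library(ggplot2)

# Hypothetical second grouping variable, created only for illustration
iris2 <- iris
iris2$Petal.Size <- ifelse(iris2$Petal.Length > median(iris2$Petal.Length),
                           "long", "short")

p <- ggplot(iris2)+
  geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species))+
  facet_wrap(~Species+Petal.Size, ncol=2)
p
```

`facet_wrap` draws one panel per combination that actually occurs in the data, so combinations with no rows do not get a panel.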
--- name: facet-grid # Facets • `facet_grid` * Facetting in two dimensions .pull-left-50[ ``` r p <- diamonds %>% ggplot(aes(carat,price))+ geom_point() p + facet_grid(~cut+clarity) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-39-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-left-50[ ``` r p + facet_grid(cut~clarity) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-40-1.png" width="374.4" style="display: block; margin: auto auto auto 0;" /> ] ??? `facet_grid` is also used to split a plot into subplots based on the categories in one or more variables. `facet_grid` can be used to create a matrix-like grid of two variables. --- name: coordinate # Coordinate Systems ![](data/slide_ggplot2/coordinate.png) * `coord_cartesian(xlim=c(2,8))` for zooming in * `coord_map` for controlling limits on maps * `coord_polar` .pull-left-50[ ``` r p <- ggplot(iris,aes(x="",y=Petal.Length,fill=Species))+ geom_bar(stat="identity") p ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-41-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] ??? The coordinate system defines the surface used to represent numbers. Most plots use the cartesian coordinate system. A pie chart, for example, is a polar coordinate projection of a cartesian barplot. Maps can use numerous coordinate systems, called map projections; UTM coordinates are one example. 
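
The difference between limiting a scale and zooming with `coord_cartesian()` can be sketched as follows. The object names `dropped` and `zoomed` are made up for this example.

``` r
library(ggplot2)

p <- ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_point()

# Scale limits: points outside 5-6 are discarded (ggplot2 warns when drawing)
dropped <- p + scale_x_continuous(limits=c(5,6))

# Coordinate limits: all points are kept, only the visible window changes
zoomed <- p + coord_cartesian(xlim=c(5,6))

dropped
zoomed
```

The distinction matters most when a statistic such as a smoother or boxplot is computed, since with scale limits it only sees the data that remain.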
-- .pull-right-50[ ``` r p+coord_polar("y",start=0) ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-42-1.png" width="316.8" style="display: block; margin: auto auto auto 0;" /> ] --- name: theme # Theme * Modify non-data plot elements/appearance * Axis labels, panel colors, legend appearance etc * Save a particular appearance for reuse * `?theme` -- .pull-left-50[ ``` r ggplot(iris,aes(Petal.Length))+ geom_histogram()+ facet_wrap(~Species,nrow=2)+ theme_grey() ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-43-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ``` r ggplot(iris,aes(Petal.Length))+ geom_histogram()+ facet_wrap(~Species,nrow=2)+ theme_bw() ``` <img src="slide_ggplot2_files/figure-html/unnamed-chunk-44-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] ??? Themes allow you to modify all non-data components of the plot, that is, its visual appearance. Examples include the axis line thickness, the background color and the font family. 
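
A minimal sketch of changing the kinds of elements mentioned in these notes. The name `my_theme` and the chosen colors are arbitrary; line thickness is left out because its argument name differs between ggplot2 versions.

``` r
library(ggplot2)

# A reusable theme object: start from theme_bw() and override a few elements
my_theme <- theme_bw()+
  theme(
    axis.line=element_line(color="grey30"),          # axis lines
    panel.background=element_rect(fill="grey97"),    # background colour
    text=element_text(family="serif")                # font family
  )

p <- ggplot(iris,aes(Petal.Length))+
  geom_histogram(bins=20)+
  my_theme
p
```

Because the theme is stored as an object, the same appearance can be added to any other plot with `+ my_theme`.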
---
name: theme-legend

# Theme • Legend

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 color=Species))
```

.pull-left-50[

``` r
p + theme(legend.position="top")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-46-1.png" width="309.6" style="display: block; margin: auto auto auto 0;" />

]

.pull-right-50[

``` r
p + theme(legend.position="bottom")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-47-1.png" width="309.6" style="display: block; margin: auto auto auto 0;" />

]

---
name: theme-text

# Theme • Text

``` r
element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL,
             vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL)
```

``` r
p <- p + theme(
  axis.title=element_text(color="#e41a1c"),
  axis.text=element_text(color="#377eb8"),
  plot.title=element_text(color="#4daf4a"),
  plot.subtitle=element_text(color="#984ea3"),
  legend.text=element_text(color="#ff7f00"),
  legend.title=element_text(color="#ffff33"),
  strip.text=element_text(color="#a65628")
)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-51-1.png" width="720" style="display: block; margin: auto auto auto 0;" />

---
name: theme-rect

# Theme • Rect

``` r
# linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
element_rect(fill=NULL,color=NULL,linewidth=NULL,linetype=NULL)
```

``` r
p <- p + theme(
  plot.background=element_rect(fill="#b3e2cd"),
  panel.background=element_rect(fill="#fdcdac"),
  panel.border=element_rect(fill=NA,color="#cbd5e8",linewidth=3),
  legend.background=element_rect(fill="#f4cae4"),
  legend.box.background=element_rect(fill="#e6f5c9"),
  strip.background=element_rect(fill="#fff2ae")
)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-55-1.png" width="720" style="display: block; margin: auto auto auto 0;" />

---
name: theme-save

# Theme • Reuse

``` r
# Store a set of theme settings in an object so it can be added to any plot
newtheme <- theme_bw() + theme(
  axis.ticks=element_blank(),
  panel.background=element_rect(fill="white"),
  panel.grid.minor=element_blank(),
  panel.grid.major.x=element_blank(),
  panel.grid.major.y=element_line(linewidth=0.3,color="grey90"),
  panel.border=element_blank(),
  legend.position="top",
  legend.justification="right"
)
```

.pull-left-50[

``` r
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-58-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

.pull-right-50[

``` r
p + newtheme
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-59-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

---
name: position

# Position

```
##         Murder Assault UrbanPop Rape
## Alabama   13.2     236       58 21.2
## Alaska    10.0     263       48 44.5
## Arizona    8.1     294       80 31.0
```

``` r
library(dplyr)  # %>%, mutate(), slice()
library(tidyr)  # gather()

# Keep the first four states and reshape to long format (one row per state and type)
us <- USArrests %>%
  mutate(state=rownames(.)) %>%
  slice(1:4) %>%
  gather(key=type,value=value,-state)
p <- ggplot(us,aes(x=state,y=value,fill=type))
```

--

.pull-left-50[

``` r
p + geom_bar(stat="identity",position="stack")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-62-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p + geom_bar(stat="identity",position="dodge")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-63-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

---
name: save

# Saving plots

``` r
p <- ggplot(iris,aes(Petal.Length,Sepal.Length,color=Species))+
  geom_point()
```

* `ggplot2` plots can be saved just like base plots

``` r
png("plot.png",height=5,width=7,units="cm",res=200)
print(p)
dev.off()
```

* The `ggplot2` package offers a convenient function, `ggsave()`

``` r
ggsave("plot.png",p,height=5,width=7,units="cm",dpi=200,type="cairo")
```

* Use `type="cairo"` for nicer anti-aliasing
* Note that the default units are pixels in `png()` and inches in `ggsave()`

---
name: extension
class: spaced

# Extensions

* [**gridExtra**](https://cran.r-project.org/web/packages/gridExtra/index.html): Extends grid graphics functionality
* [**ggpubr**](http://www.sthda.com/english/rpkgs/ggpubr/): Useful functions to prepare plots for publication
* [**cowplot**](https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html): Combining plots
* [**ggthemes**](https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html): Set of extra themes
* [**ggthemr**](https://github.com/cttobin/ggthemr): More themes
* [**ggsci**](https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html): Color palettes for scales
* [**ggrepel**](https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html): Advanced text labels including overlap control
* [**ggmap**](https://github.com/dkahle/ggmap): Dedicated to mapping
* [**ggraph**](https://github.com/thomasp85/ggraph): Network graphs
* [**ggiraph**](http://davidgohel.github.io/ggiraph/): Converting ggplot2 to interactive graphics

---
name: help
class: spaced

# Help

* [**ggplot2 official reference**](http://ggplot2.tidyverse.org/reference/)
* [**The R cookbook**](http://www.cookbook-r.com/)
* [**StackOverflow**](https://stackoverflow.com/)
* [**RStudio Cheatsheet**](https://www.rstudio.com/resources/cheatsheets/)
* [**r-statistics Cheatsheet**](http://r-statistics.co/ggplot2-cheatsheet.html)
* [**ggplot2 GUI**](https://site.shinyserver.dck.gmw.rug.nl/ggplotgui/)
* Numerous personal blogs, r-bloggers.com etc.

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end-slide
class: end-slide, middle
count: false

# Thank you. Questions?

.end-text[
<p class="smaller">
<span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br>
Created: 31-Oct-2024 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a>
</p>
]