class: center, middle, inverse, title-slide # Working with
ggplot2
## RaukR 2022 • Advanced R for Bioinformatics ###
Roy Francis
### NBIS, SciLifeLab

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ----------------- Only edit title & author above this ----------------- -->

## Contents

* [Why `ggplot2`?](#intro)
* [Grammar of Graphics](#gog)
* [Data](#data-iris)
* [Geoms](#geom)
* [Stats](#stat-1)
* [Aesthetics](#aes)
* [Scales](#scales-discrete-color)
* [Facets](#facet-wrap)
* [Coordinates](#coordinate)
* [Theme](#theme)
* [Position](#position)
* [Saving Plots](#save)
* [Combining Plots](#comb)
* [Interactive Plots](#interactive)
* [Extensions](#extension)

---

# Graphs

Graphing is an essential part of data analysis. Data with the same summary statistics can look very different when plotted.

--

.pull-left-40[
![](ggplot_presentation_assets/anscombe.jpg)
]

--

.pull-right-60[
![](ggplot_presentation_assets/datasaurus.jpg)
]

.small[
[Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet)
[Datasaurus](https://www.autodesk.com/research/publications/same-stats-different-graphs)
]

---

# Graph quality

<img src="ggplot_presentation_assets/scientific_paper_graph_quality.jpg" width="500px">

--

<img src="ggplot_presentation_assets/excel.png" width="300px">

.small[[xkcd](https://xkcd.com/1945/)]

---

# R graphics

.pull-left-50[
![](ggplot_presentation_assets/base.png)
]

--

.pull-left-50[
![](ggplot_presentation_assets/ggplot-example.png)
]

---

# R graphics

![](ggplot_presentation_assets/bbc.png)

.small[[How BBC works with R graphics](https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535)]

---
name: gvb1

## `ggplot2` vs Base Graphics

.pull-left-50[

```r
hist(iris$Sepal.Length)
```

<img
src="ggplot_presentation_files/figure-html/unnamed-chunk-5-1.png" width="288" style="display: block; margin: auto auto auto 0;" />
]

.pull-right-50[

```r
library(ggplot2)
ggplot(iris,aes(x=Sepal.Length))+
  geom_histogram(bins=8)
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-6-1.png" width="288" style="display: block; margin: auto auto auto 0;" />
]

???

For simple graphs, base plots seem to take less code than the equivalent `ggplot2` graph.

---

![](ggplot_presentation_assets/meme-confused.jpg)

---
name: gvb2

## `ggplot2` vs Base Graphics

.pull-left-50[

```r
plot(iris$Petal.Length,iris$Petal.Width,
     col=c("red","green","blue")[iris$Species],
     pch=c(0,1,2)[iris$Species])
legend(x=1,y=2.5,
       legend=c("setosa","versicolor","virginica"),
       pch=c(0,1,2),col=c("red","green","blue"))
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-7-1.png" width="288" style="display: block; margin: auto auto auto 0;" />
]

.pull-right-50[

```r
ggplot(iris,aes(Petal.Length,Sepal.Length,color=Species))+
  geom_point()
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-8-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

???

For anything beyond extremely basic plots, base plotting quickly becomes complex. More importantly, base plots are not consistent in their functions or plotting strategy.

---
name: intro
class: spaced

## Why `ggplot2`?

- Consistent code for any type of plot (almost!)
- Flexible and modular (add/remove components)
- Automatic legends, colors etc
- Save plot objects
- Themes for reusing styles
- Numerous add-ons/extensions
- Nearly complete structured graphing solution
- Adapted to other programming languages
  - [Gadfly](http://gadflyjl.org/stable/) in Julia
  - [gramm](https://se.mathworks.com/matlabcentral/fileexchange/54465-gramm-complete-data-visualization-toolbox-ggplot2-r-like) in MATLAB
  - [GGPlot](https://metacpan.org/pod/Chart::GGPlot) in Perl
  - [Vega](https://vega.github.io/vega/) in JavaScript
  - [PlotNine](https://plotnine.readthedocs.io/en/stable/), [ggpy](https://github.com/yhat/ggpy), [lets-plot](https://lets-plot.org/) in Python

???

Why can't we just do everything in base plots? Of course we could, but `ggplot2` is easier, more consistent and more structured. There is a bit of a learning curve, but once the code syntax and the graph-building logic are clear, it becomes easy to plot a large variety of graphs.

---
name: gog
class: spaced

## Grammar Of Graphics

.pull-left-30[
![](ggplot_presentation_assets/gog.jpg)
![](ggplot_presentation_assets/gog.png)
]

--

.pull-right-70[
- `ggplot` was created by Hadley Wickham in 2005, implementing Leland Wilkinson's Grammar of Graphics
- **Data**: Input data
- **Geom**: A geometry representing data. Points, Lines etc
- **Aesthetic**: Visual characteristics of the geometry. Size, Color, Shape etc
- **Scale**: How visual characteristics are converted to display values
- **Statistics**: Statistical transformations. Counts, Means etc
- **Coordinates**: Numeric system to determine position of geometry. Cartesian, Polar etc
- **Facets**: Split data into subsets
]

???

`ggplot` was created by Hadley Wickham in 2005 as an implementation of Leland Wilkinson's book Grammar of Graphics. Different graphs have traditionally been treated as independent entities with their own names, such as barplots, scatterplots and boxplots. Each graph has its own function and plotting strategy. Grammar of graphics (GOG) tries to unify all graphs under a common umbrella.
GOG brings the idea that graphs are made up of discrete components which can be mixed and matched to create any plot. This creates a consistent underlying framework to graphing. --- name: syntax ## Building A Graph: Syntax ![](ggplot_presentation_assets/syntax.png) --- name: build-1 ## Building A Graph .pull-left-40[ ```r data(iris) ggplot(iris) ``` ] .pull-right-50[ <img src="ggplot_presentation_files/figure-html/bag-1-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] --- name: build-2 ## Building A Graph .pull-left-40[ ```r ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width)) ``` ] .pull-right-50[ <img src="ggplot_presentation_files/figure-html/bag-2-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] --- name: build-3 ## Building A Graph .pull-left-40[ ```r ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width))+ geom_point() ``` ] .pull-right-50[ ] --- name: build-4 ## Building A Graph .pull-left-40[ ```r ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width, colour=Species))+ geom_point() ``` ![](ggplot_presentation_assets/syntax.png) ] .pull-right-50[ ] --- name: data-iris ## Data • `iris` * Input data is always an R `data.frame` object <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> Sepal.Length </th> <th style="text-align:center;"> Sepal.Width </th> <th style="text-align:center;"> Petal.Length </th> <th style="text-align:center;"> Petal.Width </th> <th style="text-align:center;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5.1 </td> <td style="text-align:center;"> 3.5 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 3.0 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td 
style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.7 </td> <td style="text-align:center;"> 3.2 </td> <td style="text-align:center;"> 1.3 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.6 </td> <td style="text-align:center;"> 3.1 </td> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 5.0 </td> <td style="text-align:center;"> 3.6 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 5.4 </td> <td style="text-align:center;"> 3.9 </td> <td style="text-align:center;"> 1.7 </td> <td style="text-align:center;"> 0.4 </td> <td style="text-align:center;"> setosa </td> </tr> </tbody> </table> ```r str(iris) ``` ``` ## 'data.frame': 150 obs. of 5 variables: ## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... ## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... ## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... ## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ... ## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ... ``` ??? It's a good idea to use `str()` to check the input dataframe to make sure that numbers are actually numbers and not characters, for example. Verify that factors are correctly assigned. 
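
A minimal sketch of the check described above. The data frame `df` is hypothetical (it is not part of the slides) and stands in for input where a numeric column was read in as text:

```r
# Hypothetical example data: `length` was read in as character by mistake
df <- data.frame(
  length  = c("5.1", "4.9", "4.7"),
  species = c("setosa", "setosa", "versicolor"),
  stringsAsFactors = FALSE
)
str(df)  # both columns are reported as chr

# Convert to the intended types before plotting
df$length  <- as.numeric(df$length)
df$species <- factor(df$species)
str(df)  # length is now num, species is a Factor
```

Running `str()` before and after the conversion makes it easy to confirm that the columns have the types a plot expects.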
--- name: data-diamonds ## Data • `diamonds` <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> carat </th> <th style="text-align:center;"> cut </th> <th style="text-align:center;"> color </th> <th style="text-align:center;"> clarity </th> <th style="text-align:center;"> depth </th> <th style="text-align:center;"> table </th> <th style="text-align:center;"> price </th> <th style="text-align:center;"> x </th> <th style="text-align:center;"> y </th> <th style="text-align:center;"> z </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 0.23 </td> <td style="text-align:center;"> Ideal </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> SI2 </td> <td style="text-align:center;"> 61.5 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> 326 </td> <td style="text-align:center;"> 3.95 </td> <td style="text-align:center;"> 3.98 </td> <td style="text-align:center;"> 2.43 </td> </tr> <tr> <td style="text-align:center;"> 0.21 </td> <td style="text-align:center;"> Premium </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> SI1 </td> <td style="text-align:center;"> 59.8 </td> <td style="text-align:center;"> 61 </td> <td style="text-align:center;"> 326 </td> <td style="text-align:center;"> 3.89 </td> <td style="text-align:center;"> 3.84 </td> <td style="text-align:center;"> 2.31 </td> </tr> <tr> <td style="text-align:center;"> 0.23 </td> <td style="text-align:center;"> Good </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> VS1 </td> <td style="text-align:center;"> 56.9 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> 327 </td> <td style="text-align:center;"> 4.05 </td> <td style="text-align:center;"> 4.07 </td> <td style="text-align:center;"> 2.31 </td> </tr> <tr> <td style="text-align:center;"> 0.29 </td> <td 
style="text-align:center;"> Premium </td> <td style="text-align:center;"> I </td> <td style="text-align:center;"> VS2 </td> <td style="text-align:center;"> 62.4 </td> <td style="text-align:center;"> 58 </td> <td style="text-align:center;"> 334 </td> <td style="text-align:center;"> 4.20 </td> <td style="text-align:center;"> 4.23 </td> <td style="text-align:center;"> 2.63 </td> </tr> <tr> <td style="text-align:center;"> 0.31 </td> <td style="text-align:center;"> Good </td> <td style="text-align:center;"> J </td> <td style="text-align:center;"> SI2 </td> <td style="text-align:center;"> 63.3 </td> <td style="text-align:center;"> 58 </td> <td style="text-align:center;"> 335 </td> <td style="text-align:center;"> 4.34 </td> <td style="text-align:center;"> 4.35 </td> <td style="text-align:center;"> 2.75 </td> </tr> <tr> <td style="text-align:center;"> 0.24 </td> <td style="text-align:center;"> Very Good </td> <td style="text-align:center;"> J </td> <td style="text-align:center;"> VVS2 </td> <td style="text-align:center;"> 62.8 </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 336 </td> <td style="text-align:center;"> 3.94 </td> <td style="text-align:center;"> 3.96 </td> <td style="text-align:center;"> 2.48 </td> </tr> </tbody> </table> ```r str(diamonds) ``` ``` ## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame) ## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ... ## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ... ## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ... ## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ... ## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ... ## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ... ## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ... ## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ... 
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ... ## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ... ``` ??? R `data.frame` is a tabular format with rows and columns just like a spreadsheet. All items in a row or a column must be available or missing values filled in as NAs. --- name: data-format ## Data • Format .size-80[![](ggplot_presentation_assets/tidy.png)] **Wide** <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> Sepal.Length </th> <th style="text-align:center;"> Sepal.Width </th> <th style="text-align:center;"> Petal.Length </th> <th style="text-align:center;"> Petal.Width </th> <th style="text-align:center;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5.1 </td> <td style="text-align:center;"> 3.5 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 3.0 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.7 </td> <td style="text-align:center;"> 3.2 </td> <td style="text-align:center;"> 1.3 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> </tbody> </table> **Long** <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> Species </th> <th style="text-align:center;"> variable </th> <th style="text-align:center;"> value </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> Sepal.Length </td> <td style="text-align:center;"> 5.1 </td> </tr> <tr> <td style="text-align:center;"> 
setosa </td> <td style="text-align:center;"> Sepal.Length </td> <td style="text-align:center;"> 4.9 </td> </tr> <tr> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> Sepal.Length </td> <td style="text-align:center;"> 4.7 </td> </tr> </tbody> </table> ??? The data must be cleaned up and prepared for plotting. The data must be 'tidy'. Columns must be variables and rows must be observations. The data can then be in wide or long format depending on the variables to be plotted. --- name: geom ## Geoms ![geoms](ggplot_presentation_assets/geoms.png) -- ```r p <- ggplot(iris) # scatterplot p+geom_point(aes(x=Sepal.Length,y=Sepal.Width)) # barplot p+geom_bar(aes(x=Sepal.Length)) # boxplot p+geom_boxplot(aes(x=Species,y=Sepal.Width)) # search help.search("^geom_",package="ggplot2") ``` ??? Geoms are the geometric components of a graph such as points, lines etc used to represent data. The same data can be visually represented in different geoms. For example, points or bars. Mandatory input requirements change depending on geoms. --- name: stat-1 ## Stats * Stats compute new variables from input data. -- * Geoms have default stats. ```r x <- ggplot(iris) + geom_bar(aes(x=Sepal.Length),stat="bin") y <- ggplot(iris) + geom_bar(aes(x=Species),stat="count") z <- ggplot(iris) + geom_bar(aes(x=Species,y=Sepal.Length),stat="identity") wrap_plots(x,y,z,nrow=1) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-18-1.png" width="576" style="display: block; margin: auto auto auto 0;" /> -- * Plots can be built with stats. ```r x <- ggplot(iris) + stat_bin(aes(x=Sepal.Length),geom="bar") y <- ggplot(iris) + stat_count(aes(x=Species),geom="bar") z <- ggplot(iris) + stat_identity(aes(x=Species,y=Sepal.Length),geom="bar") wrap_plots(x,y,z,nrow=1) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-19-1.png" width="576" style="display: block; margin: auto auto auto 0;" /> ??? * Normally the data is plotted directly from input as it is. 
* Some plots require the data to be computed or transformed, e.g. boxplots, histograms, smoothing, predictions, regression etc.

---
name: stat-2

## Stats

* Stats have default geoms.

<table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; ">
<thead>
<tr> <th style="text-align:left;"> plot </th> <th style="text-align:left;"> stat </th> <th style="text-align:left;"> geom </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> histogram </td> <td style="text-align:left;"> bin </td> <td style="text-align:left;"> bar </td> </tr>
<tr> <td style="text-align:left;"> smooth </td> <td style="text-align:left;"> smooth </td> <td style="text-align:left;"> line </td> </tr>
<tr> <td style="text-align:left;"> boxplot </td> <td style="text-align:left;"> boxplot </td> <td style="text-align:left;"> boxplot </td> </tr>
<tr> <td style="text-align:left;"> density </td> <td style="text-align:left;"> density </td> <td style="text-align:left;"> line </td> </tr>
<tr> <td style="text-align:left;"> freqpoly </td> <td style="text-align:left;"> bin </td> <td style="text-align:left;"> line </td> </tr>
</tbody>
</table>

Use `args(geom_bar)` to check arguments.

---
name: aes

## Aesthetics

* Aesthetic mapping vs aesthetic parameter

.pull-left-50[

```r
ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 size=Petal.Length,
                 alpha=Petal.Width,
                 shape=Species,
                 color=Species))
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-22-1.png" width="396" style="display: block; margin: auto auto auto 0;" />
]

.pull-left-50[

```r
ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
*                y=Sepal.Width),
             size=2,
             alpha=0.8,
             shape=15,
             color="steelblue")
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-23-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

???

Aesthetics are used to assign values to geometries.
For example, a set of points can have a fixed size, or can vary in color or size to denote a variable.

Setting a constant inside `aes()` is incorrect, because it is treated as a mapping rather than a fixed parameter:

```
ggplot(iris)+
  geom_point(aes(x=Sepal.Length,y=Sepal.Width,size=2))
```

---
name: aes-2

## Aesthetics

```r
x1 <- ggplot(iris) +
  geom_point(aes(x=Sepal.Length,y=Sepal.Width))+
  stat_smooth(aes(x=Sepal.Length,y=Sepal.Width))

x2 <- ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_point() + geom_smooth()

wrap_plots(x1,x2,nrow=1,ncol=2)
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-24-1.png" width="720" style="display: block; margin: auto auto auto 0;" />

???

If the same aesthetics are used in multiple geoms, they can be moved to `ggplot()`.

---
name: multiple-geom

## Multiple Geoms

```r
ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_point()+
  geom_line()+
  geom_smooth()+
  geom_rug()+
  geom_step()+
  geom_text(data=subset(iris,iris$Species=="setosa"),aes(label=Species))
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-25-1.png" width="864" style="display: block; margin: auto auto auto 0;" />

???

Multiple geoms can be plotted one after the other. Layers are drawn in the order in which they are added, so later layers appear on top of earlier ones. In this case, the lines appear over the points.

```
ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_point()+
  geom_line()
```

while here the points appear over the lines.

```
ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_line()+
  geom_point()
```

Each geom inherits the data and aesthetics given to `ggplot()`. If a geom needs extra aesthetics, they can be specified inside its own `aes()`. `data` can also be changed for specific geoms if needed.

---

![](ggplot_presentation_assets/complicated-graphs.jpg)

Just because you can doesn't mean you should.
--- name: scales-intro # Scales ![](ggplot_presentation_assets/meme-scales.jpg) --- name: scales-discrete-color ## Scales • Discrete Colors * scales: position, color, fill, size, shape, alpha, linetype * syntax: `scale_<aesthetic>_<type>` <img src="ggplot_presentation_assets/scales.png" alt="scales-syntax" style="width:50%;"> -- .pull-left-50[ ```r p <- ggplot(iris)+geom_point(aes(x=Sepal.Length, y=Sepal.Width,color=Species)) p ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-26-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ```r p + scale_color_manual( name="Manual", values=c("#5BC0EB","#FDE74C","#9BC53D")) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-27-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] ??? Scales are used to control the aesthetics. For example the aesthetic color is mapped to a variable `x`. The palette of colors used, the mapping of which color to which value, the upper and lower limit of the data and colors etc is controlled by scales. --- name: scales-continuous-color ## Scales • Continuous Colors * In RStudio, type `scale_`, then press **TAB** -- .pull-left-50[ ```r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, shape=Species,color=Petal.Length)) p ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-28-1.png" width="396" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ```r p + scale_color_gradient(name="Pet Len", breaks=range(iris$Petal.Length), labels=c("Min","Max"), low="black",high="red") ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-29-1.png" width="396" style="display: block; margin: auto auto auto 0;" /> ] ??? Continuous colours can be changed using `scale_color_gradient()` for two colour gradient. Any number of breaks and colours can be specified using `scale_color_gradientn()`. 
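
As a rough sketch of the `scale_color_gradientn()` variant mentioned above: the three colours below are arbitrary choices for illustration and are not taken from the slides.

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Petal.Length))

# Gradient through three colours instead of two; colour choice is arbitrary
p3 <- p + scale_color_gradientn(
  name    = "Pet Len",
  colours = c("black", "orange", "red")
)
p3
```

By default the colours are spaced evenly along the data range; the `values` argument can be used to position them differently.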
---
name: scales-shape

## Scales • Shape

.pull-left-50[

```r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 shape=Species,color=Species))
p
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-30-1.png" width="396" style="display: block; margin: auto auto auto 0;" />
]

--

.pull-right-50[

```r
p +
  scale_color_manual(name="New",
    values=c("blue","green","red"))+
  scale_shape_manual(name="Bla",values=c(0,1,2))
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-31-1.png" width="396" style="display: block; margin: auto auto auto 0;" />
]

???

The shape scale can be adjusted using `scale_shape_manual()`. When several aesthetics are mapped to the same variable, their legends are merged as long as the scales share the same name and labels.

---
name: scales-axis

## Scales • Axes

* scales: x, y
* syntax: `scale_<axis>_<type>`
* arguments: name, limits, breaks, labels

--

.pull-left-50[

```r
p <- ggplot(iris)+geom_point(
  aes(x=Sepal.Length,y=Sepal.Width))
p
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-32-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

--

.pull-right-50[

```r
p + scale_x_continuous(name="Sepal Length",
  breaks=seq(1,8),limits=c(3,5))
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-33-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

???

The x and y axes are also controlled by scales. The axis break points, the break point labels and the limits are all set through scales. When limits are set using `scale_`, data outside the limits are dropped. Limits can also be set using `lims(x=c(3,5))` or `xlim(c(3,5))`. For maps, `coord_map()` is recommended for setting limits; otherwise `coord_cartesian()` zooms in without dropping data.
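
The difference between scale limits and coordinate limits can be sketched as follows. The window `c(5, 6)` and the object names `p_scale`, `p_zoom` and `n_outside` are chosen here for illustration only:

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Scale limits: values outside 5..6 are replaced by NA and the
# affected points are dropped with a warning when the plot is drawn
p_scale <- p + scale_x_continuous(limits = c(5, 6))

# Coordinate limits: all rows are kept, only the visible window changes
p_zoom <- p + coord_cartesian(xlim = c(5, 6))

# Number of observations that fall outside the window
n_outside <- sum(iris$Sepal.Length < 5 | iris$Sepal.Length > 6)
n_outside
```

The distinction matters most when a stat is involved, for example `geom_smooth()`: with scale limits the fit is computed on the remaining data only, while with `coord_cartesian()` it is computed on all data and then cropped.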
--- name: facet-intro # Facets ![](ggplot_presentation_assets/meme-facets.jpg) --- name: facet-wrap ## Facets • `facet_wrap` * Split to subplots based on variable(s) * Faceting in one dimension -- .pull-left-50[ ```r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species)) p ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-34-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ```r p + facet_wrap(~Species) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-35-1.png" width="324" style="display: block; margin: auto auto auto 0;" /> ```r p + facet_wrap(~Species,nrow=3) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-36-1.png" width="324" style="display: block; margin: auto auto auto 0;" /> ] ??? `facet_wrap` is used to split a plot into subplots based on the categories in one or more variables. --- name: facet-grid ## Facets • `facet_grid` * Faceting in two dimensions .pull-left-50[ ```r p <- diamonds %>% ggplot(aes(carat,price))+ geom_point() p + facet_grid(~cut+clarity) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-37-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-left-50[ ```r p + facet_grid(cut~clarity) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-38-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] ??? `facet_grid` is also used to split a plot into subplots based on the categories in one or more variables. `facet_grid` can be used to create a matrix-like grid of two variables. 
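
A short sketch of the two-variable grid using the `vars()` interface, which is equivalent to the formula `cut~clarity` used on the slide. Adding `labeller=label_both` is an optional extra not shown in the slides; it prints the variable name next to each level in the strip labels.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price)) +
  geom_point()

# One variable defines the rows, the other the columns
pg <- p + facet_grid(
  rows     = vars(cut),
  cols     = vars(clarity),
  labeller = label_both
)
pg
```

With 5 levels of `cut` and 8 levels of `clarity`, this produces a 5 by 8 matrix of panels.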
---
name: coordinates-intro

# Coordinates

![](ggplot_presentation_assets/meme-coordinates.jpg)

---
name: coordinate

## Coordinate Systems

![](ggplot_presentation_assets/coordinate.png)

* `coord_cartesian(xlim=c(2,8))` for zooming in
* `coord_map` for controlling limits on maps
* `coord_polar`

.pull-left-50[

```r
p <- ggplot(iris,aes(x="",y=Petal.Length,fill=Species))+
  geom_bar(stat="identity")
p
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-39-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

???

The coordinate system defines the surface used to represent numbers. Most plots use the cartesian coordinate system. A pie chart, for example, is a polar coordinate projection of a cartesian barplot. Maps can use numerous coordinate systems called map projections, for example UTM coordinates.

--

.pull-right-50[

```r
p+coord_polar("y",start=0)
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-40-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

---
name: theme-intro

# Theme

![](ggplot_presentation_assets/meme-theme.jpg)

---
name: theme

## Theme

* Modify non-data plot elements/appearance
* Axis labels, panel colors, legend appearance etc
* Save a particular appearance for reuse
* `?theme`

--

.pull-left-50[

```r
ggplot(iris,aes(Petal.Length))+
  geom_histogram()+
  facet_wrap(~Species,nrow=2)+
  theme_grey()
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-41-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

--

.pull-right-50[

```r
ggplot(iris,aes(Petal.Length))+
  geom_histogram()+
  facet_wrap(~Species,nrow=2)+
  theme_bw()
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-42-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

???

Themes allow you to modify all non-data components of the plot. This is the visual appearance of the plot.
Examples include the axes line thickness, the background color or font family. --- name: theme-legend ## Theme • Legend ```r p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species)) ``` .pull-left-50[ ```r p + theme(legend.position="top") ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-44-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] .pull-right-50[ ```r p + theme(legend.position="bottom") ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-45-1.png" width="288" style="display: block; margin: auto auto auto 0;" /> ] --- name: theme-text ## Theme • Text ```r element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL, vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL) ``` ```r p <- p + theme( axis.title=element_text(color="#e41a1c"), axis.text=element_text(color="#377eb8"), plot.title=element_text(color="#4daf4a"), plot.subtitle=element_text(color="#984ea3"), legend.text=element_text(color="#ff7f00"), legend.title=element_text(color="#ffff33"), strip.text=element_text(color="#a65628") ) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-49-1.png" width="648" style="display: block; margin: auto auto auto 0;" /> --- name: theme-rect ## Theme • Rect ```r element_rect(fill=NULL,color=NULL,size=NULL,linetype=NULL) ``` ```r p <- p + theme( plot.background=element_rect(fill="#b3e2cd"), panel.background=element_rect(fill="#fdcdac"), panel.border=element_rect(fill=NA,color="#cbd5e8",size=3), legend.background=element_rect(fill="#f4cae4"), legend.box.background=element_rect(fill="#e6f5c9"), strip.background=element_rect(fill="#fff2ae") ) ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-53-1.png" width="648" style="display: block; margin: auto auto auto 0;" /> --- name: theme-save ## Theme • Reuse ```r newtheme <- theme_bw() + theme( axis.ticks=element_blank(), panel.background=element_rect(fill="white"), panel.grid.minor=element_blank(), 
panel.grid.major.x=element_blank(), panel.grid.major.y=element_line(size=0.3,color="grey90"), panel.border=element_blank(), legend.position="top", legend.justification="right" ) ``` .pull-left-50[ ```r p ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-56-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] .pull-right-50[ ```r p + newtheme ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-57-1.png" width="324" style="display: block; margin: auto auto auto 0;" /> ] --- name: position ## Position <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Murder </th> <th style="text-align:center;"> Assault </th> <th style="text-align:center;"> UrbanPop </th> <th style="text-align:center;"> Rape </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Alabama </td> <td style="text-align:center;"> 13.2 </td> <td style="text-align:center;"> 236 </td> <td style="text-align:center;"> 58 </td> <td style="text-align:center;"> 21.2 </td> </tr> <tr> <td style="text-align:left;"> Alaska </td> <td style="text-align:center;"> 10.0 </td> <td style="text-align:center;"> 263 </td> <td style="text-align:center;"> 48 </td> <td style="text-align:center;"> 44.5 </td> </tr> <tr> <td style="text-align:left;"> Arizona </td> <td style="text-align:center;"> 8.1 </td> <td style="text-align:center;"> 294 </td> <td style="text-align:center;"> 80 </td> <td style="text-align:center;"> 31.0 </td> </tr> </tbody> </table> ```r us <- USArrests %>% mutate(state=rownames(.)) %>% slice(1:4) %>% gather(key=type,value=value,-state) p <- ggplot(us,aes(x=state,y=value,fill=type)) ``` -- .pull-left-50[ ```r p + geom_bar(stat="identity",position="stack") ``` <img src="ggplot_presentation_files/figure-html/unnamed-chunk-60-1.png" width="360" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ 
```r
p + geom_bar(stat="identity",position="dodge")
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-61-1.png" width="360" style="display: block; margin: auto auto auto 0;" />
]

---
name: save

## Saving plots

```r
p <- ggplot(iris,aes(Petal.Length,Sepal.Length,color=Species))+
  geom_point()
```

* `ggplot2` plots can be saved just like base plots

```r
png("plot.png",height=5,width=7,units="cm",res=200)
print(p)
dev.off()
```

* The `ggplot2` package offers a convenient function

```r
ggsave("plot.png",p,height=5,width=7,units="cm",dpi=200,type="cairo")
```

* Use `type="cairo"` for nicer anti-aliasing
* Note that the default units in `png` are pixels, while in `ggsave` they are inches

---
name: comb

## Combining Plots

```r
p <- ggplot(us,aes(x=state,y=value,color=type))+geom_point()
q <- ggplot(us,aes(x=state,y=value,fill=type))+geom_bar(stat="identity")
```

```r
patchwork::wrap_plots(p,q)
```

<img src="ggplot_presentation_files/figure-html/unnamed-chunk-67-1.png" width="864" style="display: block; margin: auto auto auto 0;" />

???

Combining two or more `ggplot2` plots is often required, and several packages exist to help with this. Some functions place plots side by side, optionally with different heights or widths for each plot. Some functions allow one plot to be placed inside another as an inset.

Here are alternative options.

```r
gridExtra::grid.arrange(p,q,ncol=2)
ggpubr::ggarrange(p,q,ncol=2,widths=c(1.5,1),common.legend=TRUE)
cowplot::plot_grid(p,q,ncol=2)
```

---
name: interactive

## Interactive

* Convert `ggplot2` object to interactive HTML

```r
p <- ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width,col=Species))
```

.pull-left-50[

```r
p1 <- p+geom_point()
plotly::ggplotly(p1,width=400,height=300)
```
]

.pull-right-50[

```r
p2 <- p+ggiraph::geom_point_interactive(
  aes(tooltip=paste0("<b>Species: </b>",Species)))+
  theme_bw(base_size=12)
# girafe() replaces the deprecated ggiraph() function
ggiraph::girafe(ggobj=p2)
```
]

???

Most interactive plotting libraries are not as complete as `ggplot2`. Therefore, some packages explore ways of converting `ggplot2` objects into interactive graphics.

---
name: extension
class: spaced

## Extensions

- [**patchwork**](https://patchwork.data-imaginist.com/): Combining plots
- [**ggrepel**](https://ggrepel.slowkow.com/index.html): Text labels including overlap control
- [**ggforce**](https://ggforce.data-imaginist.com/): Additional features
- [**ggpmisc**](https://github.com/aphalo/ggpmisc): Miscellaneous features
- [**ggthemes**](https://jrnold.github.io/ggthemes/): Set of extra themes
- [**ggthemr**](https://github.com/cttobin/ggthemr): More themes
- [**ggsci**](https://nanx.me/ggsci/): Color palettes for scales
- [**ggmap**](https://github.com/dkahle/ggmap): Dedicated to mapping
- [**ggraph**](https://ggraph.data-imaginist.com/): Network graphs
- [**ggiraph**](http://davidgohel.github.io/ggiraph/): Converting ggplot2 to interactive graphics

A collection of `ggplot2` extension packages: [https://exts.ggplot2.tidyverse.org/](https://exts.ggplot2.tidyverse.org/).

---
name: help
class: spaced

## Help

- [**ggplot2 book**](https://ggplot2-book.org/introduction.html)
- [**ggplot2 official reference**](http://ggplot2.tidyverse.org/reference/)
- [**The R cookbook**](http://www.cookbook-r.com/Graphs/)
- [**RStudio Cheatsheet**](https://www.rstudio.com/resources/cheatsheets/)
- [**r-statistics Cheatsheet**](http://r-statistics.co/ggplot2-cheatsheet.html)
- [**StackOverflow**](https://stackoverflow.com/)
- Blogs, for example [R-Bloggers](https://www.r-bloggers.com/) and [Cedric Scherer](https://www.cedricscherer.com/tags/ggplot2/)

<!-- --------------------- Do not edit this and below --------------------- -->

---

![](ggplot_presentation_assets/meme-end-of-pres.jpg)

---
name: end-slide
class: end-slide, middle
count: false

# Thank you. Questions?

<p>R version 4.1.0 (2021-05-18)<br><p>Platform: x86_64-conda-linux-gnu (64-bit)</p><p>OS: Ubuntu 20.04.4 LTS</p><br> Built on : <i class='fa fa-calendar' aria-hidden='true'></i> 15-Jun-2022 at <i class='fa fa-clock-o' aria-hidden='true'></i> 23:24:21 <b>2022</b> • [SciLifeLab](https://www.scilifelab.se/) • [NBIS](https://nbis.se/) • [RaukR](https://nbisweden.github.io/workshop-RaukR-2206/)