class: center, middle, inverse, title-slide .title[ # Visualisation with
ggplot2
]
.subtitle[
## R Foundations for Life Scientists
]
.author[
### Nima Rafati, Roy Francis
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ----------------- Only edit title & author above this ----------------- -->

# Contents

* [Why `ggplot2`?](#intro)
* [Grammar of Graphics](#gog)
* [Data](#data-iris)
* [Geoms](#geom)
* [Aesthetics](#aes)
* [Scales](#scales-discrete-color)
* [Facets](#facet-wrap)
* [Coordinates](#coordinate)
* [Theme](#theme)
* [Position](#position)
* [Saving Plots](#save)
* [Combining Plots](#comb)
* [Interactive Plots](#interactive)
* [Extensions](#extension)

---
name: intro
class: spaced

# Why `ggplot2`?

* Consistent code
* Flexible
* Automatic legends, colors etc
* Save plot objects
* Themes for reusing styles
* Numerous add-ons/extensions
* Nearly complete graphing solution

--

Not suitable for:

* 3D graphics

???

Why can't we just do everything in base plot? Of course we could, but `ggplot2` makes it easier, more consistent and more structured. There is a bit of a learning curve, but once the code syntax and the graph-building logic are clear, it becomes easy to plot a large variety of graphs.

---
name: gog
class: spaced

# Grammar Of Graphics

.pull-left-30[

]

--

.pull-right-70[
* **Data**: Input data
* **Aesthetic**: Visual characteristics of the geometry. Size, Color, Shape etc
* **Geom**: A geometry representing data. Points, Lines etc
* **Scale**: How visual characteristics are converted to display values
* **Statistics**: Statistical transformations. Counts, Means etc
* **Coordinates**: Numeric system to determine position of geometry. Cartesian, Polar etc
* **Facets**: Split data into subsets
]

???

`ggplot` was created by Hadley Wickham in 2005 as an implementation of Leland Wilkinson's book *The Grammar of Graphics*. Graphs have traditionally been treated as independent entities with their own names, such as barplots, scatterplots and boxplots, each with its own function and plotting strategy. The grammar of graphics (GOG) tries to unify all graphs under a common umbrella. Its central idea is that graphs are made up of discrete components which can be mixed and matched to create any plot. This gives a consistent underlying framework for graphing.

**Note:** Aesthetics are sometimes also referred to as mappings. They can be passed to the main `ggplot()` call or to individual geoms. Aesthetics defined in the main `ggplot()` call are inherited automatically by all subsequent layers.

---
name: syntax

# Building A Graph: Syntax

---
name: gvb1

# `ggplot2` vs Base Graphics

.pull-left-50[

``` r
hist(iris$Sepal.Length)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-3-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

.pull-right-50[

``` r
library(ggplot2)
ggplot(data = iris, aes(x=Sepal.Length)) +
  geom_histogram(bins=8)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-4-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

???

For simple graphs, base plotting seems to take less code than the equivalent ggplot graph.

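As a rough check that the two calls draw the same histogram, the sketch below reuses the break points chosen by `hist()` in `geom_histogram()` and compares the bin counts. The object names `h`, `p` and `gg_counts` are illustrative only and do not appear elsewhere in these slides.

``` r
library(ggplot2)

# Base R: compute the histogram without drawing it
h <- hist(iris$Sepal.Length, plot = FALSE)

# ggplot2: reuse the same break points instead of bins = 8
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(breaks = h$breaks)

# layer_data() returns the values computed by the stat for a layer
gg_counts <- layer_data(p, 1)$count

# TRUE when both systems put the same number of flowers in every bin
all(gg_counts == h$counts)
```

Passing explicit `breaks` removes any difference in how the two systems choose their bins, so what remains is purely a difference in syntax.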
---
name: gvb2

# `ggplot2` vs Base Graphics

.pull-left-50[

``` r
# Plot
plot(x = iris$Petal.Length, y = iris$Petal.Width,
     col=c("red","green","blue")[iris$Species],
     pch=c(0,1,2)[iris$Species])
# Legend
legend(x=1,y=2.5,
       legend=c("setosa","versicolor","virginica"),
       pch=c(0,1,2),col=c("red","green","blue"))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-5-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

.pull-right-50[

``` r
ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width, color=Species))+
  geom_point( aes(shape = Species))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-6-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

???

For anything beyond extremely basic plots, base plotting quickly becomes complex. More importantly, base plots are not consistent in their functions or plotting strategy.

---
name: build-1

# Building A Graph

.pull-left-40[

``` r
ggplot(iris)
```

]

.pull-right-50[

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-8-1.png" width="252" style="display: block; margin: auto auto auto 0;" />

]

---
name: build-2

# Building A Graph

.pull-left-40[

``` r
ggplot(iris,aes(x=Sepal.Length,
                y=Sepal.Width))
```

]

.pull-right-60[

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-10-1.png" width="252" style="display: block; margin: auto auto auto 0;" />

]

---
name: build-3

# Building A Graph

.pull-left-40[

``` r
ggplot(iris,aes(x=Sepal.Length,
                y=Sepal.Width))+
  geom_point()
```

]

.pull-right-80[

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="252" style="display: block; margin: auto auto auto 0;" />

]

---
name: build-4

# Building A Graph

.pull-left-40[

``` r
ggplot(iris,aes(x=Sepal.Length,
                y=Sepal.Width,
                colour=Species))+
  geom_point()
```

]

.pull-right-60[

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-14-1.png" width="252" style="display: block; margin: auto auto auto 0;" />

]

---
name: geom

# Geoms

``` r
help.search("^geom_",package="ggplot2")
```

???

Geoms are the geometric components of a graph, such as points and lines, used to represent data. The same data can be represented by different geoms, for example points or bars. The required inputs depend on the geom.

---
name: geom1

# Geoms

``` r
p <- ggplot(iris)
```

---
name: geom2

# Geoms

``` r
p <- ggplot(iris)
# scatterplot
p+geom_point(aes(x=Sepal.Length,y=Sepal.Width))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-17-1.png" width="504" style="display: block; margin: auto auto auto 0;" />

---
name: geom3

# Geoms

``` r
p <- ggplot(iris)
# barplot
p+geom_bar(aes(x=Sepal.Length))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-18-1.png" width="504" style="display: block; margin: auto auto auto 0;" />

---
name: geom4

# Geoms

``` r
p <- ggplot(iris)
# boxplot
p+geom_boxplot(aes(x=Species,y=Sepal.Width))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-19-1.png" width="504" style="display: block; margin: auto auto auto 0;" />

---
name: aes

# Aesthetics

* Aesthetic mapping vs aesthetic parameter

.pull-left-50[

``` r
ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 size=Petal.Length,
                 alpha=Petal.Width,
                 shape=Species,
                 color=Species))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-20-1.png" width="396" style="display: block; margin: auto auto auto 0;" />

]

.pull-left-50[

``` r
ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width),
             size=2,
             alpha=0.8,
             shape=15,
             color="steelblue")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-21-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

???

Aesthetics assign visual properties to geometries. A property can be mapped to a variable inside `aes()`, so that it varies with the data, or set to a fixed value outside `aes()`, so that it is the same for every point. A fixed value placed inside `aes()` is treated as data and gets its own legend. This would be an incorrect way to do it:

```
ggplot(iris)+
  geom_point(aes(x=Sepal.Length,y=Sepal.Width,size=2))
```

---
name: aes-2

# Aesthetics

``` r
library(patchwork) # provides "|" for placing plots side by side
x1 <- ggplot(iris) +
  geom_point(aes(x=Sepal.Length, y=Sepal.Width))+
  stat_smooth(aes(x=Sepal.Length, y=Sepal.Width))
x2 <- ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width))+
  geom_point() +
  geom_smooth(level = 0.5)
x1|x2
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-22-1.png" width="576" style="display: block; margin: auto auto auto 0;" />

???

If the same aesthetics are used in multiple geoms, they can be moved to `ggplot()`.

---
name: multiple-geom

# Multiple Geoms

``` r
ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_point()+
  geom_line()+
  geom_smooth()+
  geom_rug()+
  geom_step()+
  geom_text(data=subset(iris,iris$Species=="setosa"),aes(label=Species))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-23-1.png" width="648" style="display: block; margin: auto auto auto 0;" />

???

Multiple geoms can be plotted one after the other. Layers are drawn in the order in which they are added, so later layers sit on top of earlier ones. In this case, the lines appear over the points.

```
ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_point()+
  geom_line()
```

while here the points appear over the lines.

```
ggplot(iris,aes(x=Sepal.Length,y=Sepal.Width))+
  geom_line()+
  geom_point()
```

Each geom inherits its data and aesthetics from `ggplot()`. If a geom needs an extra aesthetic, it can be added inside that geom's own `aes()`. The `data` argument can also be changed for a specific geom.

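A minimal sketch of the last point, assuming only `ggplot2` and the built-in `iris` data: the text layer is given its own `data`, while the line and point layers inherit everything from `ggplot()`. `layer_data()` is used here just to count the rows each layer ends up with.

``` r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line() +                                  # layer 1, drawn first (bottom)
  geom_point() +                                 # layer 2, drawn over the lines
  geom_text(data = subset(iris, Species == "setosa"),
            aes(label = Species))                # layer 3, uses setosa rows only

# Rows seen by the point layer (inherited data) and the text layer (own data)
n_point <- nrow(layer_data(p, 2))
n_text  <- nrow(layer_data(p, 3))
c(point = n_point, text = n_text)
```

The `x` and `y` mappings are still inherited by the text layer; only `label` had to be added.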
---
name: scales-overview

# Scales • Overview

* Control mapping of data to aesthetics
* Scales: position, color, fill, size, shape, alpha, linetype
* syntax: **`scale_<aesthetic>_<type>`**

<div style="font-size:80%">

| **Aesthetic (What it controls)** | **Scale Type (How values are mapped)** | **Common Function Examples** | **When / Why to Use** |
| --- | --- | --- | --- |
| 🎨 **Color / Fill** | `continuous`, `discrete`, `manual`, `identity` | `scale_color_continuous()`<br>`scale_fill_manual()`<br>`scale_color_brewer()` | To control color mapping, e.g. use `manual` for custom palettes or `brewer` for ready-made color sets |
| ⚪ **Size** | `continuous`, `discrete`, `manual` | `scale_size_continuous()`<br>`scale_size_manual()` | Adjust how numeric or categorical values affect point sizes |
| 🔷 **Shape** | `discrete`, `manual` | `scale_shape_discrete()`<br>`scale_shape_manual()` | Map categories to different point shapes |
| ➖ **Linetype** | `identity`, `manual`, `discrete` | `scale_linetype_manual()` | Customize line styles for groups in line plots |
| 📊 **X / Y axes** | `continuous`, `discrete`, `log`, `reverse`, `date`, `datetime` | `scale_x_log10()`<br>`scale_y_reverse()`<br>`scale_x_date()` | Transform axis scales (e.g. log scale, reverse direction, or show dates) |

</div>

---
name: scales-discrete-color

# Scales • Discrete Colors

* scales: position, color, fill, size, shape, alpha, linetype
* syntax: **`scale_<aesthetic>_<type>`**

<br>

``` r
p + scale_fill_manual(
  name = "My legend",                    # legend title
  labels=c("Min","Max"),                 # legend labels
  breaks = c(min(data$x), max(data$x)),  # legend breaks
  values = c("skyblue", "navy")          # colors to use
)
```

--

.pull-left-50[

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 color=Species))
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-25-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p <- p + scale_color_manual(
  name="Manual",
  values= c("#5BC0EB","#FDE74C","#9BC53D"))
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-26-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

???

Scales control how an aesthetic is displayed. Suppose the color aesthetic is mapped to a variable `x`: the palette used, which color is assigned to which value, and the upper and lower limits of the data and colors are all controlled by the scale.

---
name: scales-continuous-color

# Scales • Continuous Colors

* In RStudio, type `scale_`, then press **TAB**

--

.pull-left-50[

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 shape=Species,
                 color=Petal.Length))
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-27-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p + scale_color_gradient(name="Pet Len",
  breaks=range(iris$Petal.Length),
  labels=c("Min","Max"),
  low="black",high="red")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-28-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

???

A two-colour gradient for a continuous variable is set with `scale_color_gradient()`.
Any number of breaks and colours can be specified using `scale_color_gradientn()`.

---
name: scales-shape

# Scales • Shape

.pull-left-50[

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 shape=Species,
                 color=Species))
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-29-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p + scale_color_manual(name="New",
  values=c("blue","green","red"))+
  scale_shape_manual(name="Bla",
  values=c(0,1,2))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-30-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

???

The shape scale can be adjusted using `scale_shape_manual()`. When several aesthetics are mapped to the same variable, their legends are merged as long as the scales share the same name and labels. Giving the scales different names, as here, splits them into separate legends.

---
name: scales-axis

# Scales • Axes

* scales: x, y
* syntax: `scale_<axis>_<type>`
* arguments: name, limits, breaks, labels

--

.pull-left-50[

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width))
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-31-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p + scale_color_manual(name="New",
  values=c("blue","green","red"))+
  scale_x_continuous(name="Sepal Length",
  breaks=seq(1,8),limits=c(3,5))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-32-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

???

The x and y axes are also controlled by scales. The axis break points, the break point labels and the limits are all set through scales. When limits are set using `scale_`, data outside the limits are dropped. Limits can also be set using `lims(x=c(3,5))` or `xlim(c(3,5))`, which drop data in the same way. To zoom in without dropping data, set the limits in `coord_cartesian()` (or `coord_map()` for maps) instead.

---
name: exercise

# Exercise

Exercise 1-2.10 **Axes modification**.

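???

A minimal sketch, useful to have at hand for the axes exercise, of the difference between setting limits on a scale and on the coordinate system. The limits `c(5, 6)` and the object names are illustrative choices and are not taken from the exercise itself.

``` r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Scale limits: x values outside 5-6 are turned into NA and later dropped
p_scale <- p + scale_x_continuous(limits = c(5, 6))
# Coordinate limits: every observation is kept, only the view is zoomed
p_coord <- p + coord_cartesian(xlim = c(5, 6))

# Count the observations that still have a usable x value in each plot
n_scale <- sum(!is.na(suppressWarnings(layer_data(p_scale))$x))
n_coord <- sum(!is.na(layer_data(p_coord)$x))
c(scale = n_scale, coord = n_coord)
```

The distinction matters most for layers that compute statistics, such as `geom_smooth()` or `geom_boxplot()`, because with scale limits the statistic is computed on the reduced data.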
---
name: facet-wrap

# Facets • `facet_wrap`

* Split to subplots based on variable(s)
* Faceting in one dimension

--

.pull-left-50[

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 color=Species))
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-33-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p + facet_wrap(~Species)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-34-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

``` r
p + facet_wrap(~Species,nrow=3)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-35-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

???

`facet_wrap` splits a plot into subplots based on the categories of one or more variables and wraps the resulting panels into rows and columns.

---
name: facet-grid

# Facets • `facet_grid`

* Faceting in two dimensions

.pull-left-50[

``` r
library(dplyr) # provides the %>% pipe
p <- diamonds %>%
  ggplot(aes(carat,price))+
  geom_point()
p + facet_grid(~cut+clarity)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-36-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-left-50[

``` r
p + facet_grid(cut~clarity)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-37-1.png" width="374.4" style="display: block; margin: auto auto auto 0;" />

]

???

`facet_grid` also splits a plot into subplots based on the categories of one or more variables. Unlike `facet_wrap`, it can lay out two variables as a matrix-like grid, with one variable in rows and the other in columns.

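To make the difference concrete, the sketch below counts the panels each facet function creates for `diamonds`. It relies on the `PANEL` column that `layer_data()` returns, and the object names are illustrative.

``` r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price)) +
  geom_point()

# One panel per level of cut, wrapped into rows and columns
p_wrap <- p + facet_wrap(~cut)
# cut in rows and clarity in columns: one panel per combination
p_grid <- p + facet_grid(cut ~ clarity)

# PANEL is a factor with one level per panel in the layout
n_wrap <- nlevels(layer_data(p_wrap)$PANEL)
n_grid <- nlevels(layer_data(p_grid)$PANEL)
c(wrap = n_wrap, grid = n_grid)
```

`facet_wrap` gives one panel per level of `cut`, while `facet_grid` gives one panel for every `cut` and `clarity` combination.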
---
name: coordinate

# Coordinate Systems

| **Function** | **What it does** | **Common use / Example** |
| --- | --- | --- |
| `coord_cartesian()` | Adjusts the visible region without removing data | `coord_cartesian(xlim = c(2, 8))` → zoom into x-range 2–8 |
| `coord_fixed()` | Keeps equal aspect ratio for x and y | Useful for maps or shapes where scale should be equal |
| `coord_flip()` | Swaps x and y axes | Great for horizontal bar plots |
| `coord_trans()` | Transforms axes (e.g., log, sqrt) | `coord_trans(y = "log10")` |
| `coord_polar()` | Converts to polar coordinates | Create pie charts or circular plots |
| `coord_map()` | Uses map projection (scales adjusted for geography) | Plot geographic data with correct proportions |

---
name: coordinate-polar

# Coordinate Systems • Polar

.pull-left-50[

``` r
p <- ggplot(iris,aes(x="",y=Petal.Length,fill=Species))+
  geom_bar(stat="identity")
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-38-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p+coord_polar("y",start=0)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-39-1.png" width="316.8" style="display: block; margin: auto auto auto 0;" />

]

???

The coordinate system defines the surface used to represent numbers. Most plots use the cartesian coordinate system. A pie chart, for example, is a polar coordinate projection of a stacked cartesian barplot. Maps can use numerous coordinate systems, called map projections, for example UTM.

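A small variation on the pie chart above, offered as a sketch rather than as the slide's own method: the data are first summed per species with base R's `aggregate()`, so that each species becomes exactly one bar segment and therefore one slice. The names `totals`, `bar`, `pie` and `share` are illustrative.

``` r
library(ggplot2)

# Total petal length per species: one bar segment (pie slice) each
totals <- aggregate(Petal.Length ~ Species, data = iris, FUN = sum)

# A single stacked bar in cartesian coordinates
bar <- ggplot(totals, aes(x = "", y = Petal.Length, fill = Species)) +
  geom_col()   # geom_col() is shorthand for geom_bar(stat = "identity")

# The same bar projected onto polar coordinates becomes a pie chart
pie <- bar + coord_polar(theta = "y", start = 0)

# Fraction of the circle taken up by each species
share <- totals$Petal.Length / sum(totals$Petal.Length)
round(share, 3)
```

Only the coordinate system differs between `bar` and `pie`; the data, aesthetics and geom are identical.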
<div style="font-size:80%">

| **Stat name** | **Used with** | **What it does** |
| --- | --- | --- |
| `"count"` | `geom_bar()` | Counts rows per x value |
| `"identity"` | any geom | Plots given y-values directly |
| `"summary"` | `geom_bar()`, `geom_pointrange()` | Computes summaries like mean, median |
| `"bin"` | `geom_histogram()` | Groups numeric data into bins |
| `"density"` | `geom_density()` | Computes kernel density estimate |
| `"smooth"` | `geom_smooth()` | Fits a smoothing line (e.g., regression) |

</div>

---
name: theme

# Theme

* Modify non-data plot elements/appearance
* Axis labels, panel colors, legend appearance etc
* Save a particular appearance for reuse
* `?theme`
* Two of the built-in themes: `theme_grey()`, `theme_bw()`

--

.pull-left-50[

``` r
ggplot(iris,aes(Petal.Length))+
  geom_histogram()+
  facet_wrap(~Species,nrow=2)+
  theme_grey()
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-40-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
ggplot(iris,aes(Petal.Length))+
  geom_histogram()+
  facet_wrap(~Species,nrow=2)+
  theme_bw()
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-41-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

]

???

Themes let you modify all non-data components of the plot, that is, its visual appearance. Examples include the axis line thickness, the background color and the font family.

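A minimal sketch of that idea: start from a complete theme, override a few individual elements with `theme()`, and confirm that the computed data are unchanged. The particular elements and values chosen here are arbitrary examples.

``` r
library(ggplot2)

p <- ggplot(iris, aes(Petal.Length)) +
  geom_histogram(bins = 20)

# Start from a complete theme, then override individual elements
p2 <- p + theme_bw() +
  theme(panel.background = element_rect(fill = "grey95"),  # background color
        axis.line = element_line(colour = "black"),        # axis lines
        text = element_text(family = "serif"))             # font family

# The histogram counts are the same: only non-data elements were changed
same_counts <- isTRUE(all.equal(layer_data(p)$count, layer_data(p2)$count))
same_counts
```

Because a theme is added with `+` like any other component, the same `theme()` call can be stored in a variable and reused across plots.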
---
name: theme-legend

# Theme • Legend

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 color=Species))
```

.pull-left-50[

``` r
p + theme(legend.position="top")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-43-1.png" width="309.6" style="display: block; margin: auto auto auto 0;" />

]

.pull-right-50[

``` r
p + theme(legend.position="bottom")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-44-1.png" width="309.6" style="display: block; margin: auto auto auto 0;" />

]

---
name: theme-title

# Theme • Title

By default the title is left-aligned. It can be centred or right-aligned through `theme()`. The horizontal justification, `hjust`, ranges from 0 (left) to 1 (right).

``` r
p <- ggplot(iris)+
  geom_point(aes(x=Sepal.Length,
                 y=Sepal.Width,
                 color=Species))+
  labs(title="My Title",subtitle="My Subtitle")
p + theme(plot.title=element_text(hjust=0.5),
          plot.subtitle=element_text(hjust=1.0))
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-45-1.png" width="504" style="display: block; margin: auto auto auto 0;" />

---
name: theme-text

# Theme • Text

``` r
element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL,
             vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL)
```

``` r
p <- p + theme(
  axis.title=element_text(color="#e41a1c"),
  axis.text=element_text(color="#377eb8"),
  plot.title=element_text(color="#4daf4a"),
  plot.subtitle=element_text(color="#984ea3"),
  legend.text=element_text(color="#ff7f00"),
  legend.title=element_text(color="#ffff33"),
  strip.text=element_text(color="#a65628")
)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-49-1.png" width="720" style="display: block; margin: auto auto auto 0;" />

---
name: theme-rect

# Theme • Rect

``` r
element_rect(fill=NULL,color=NULL,size=NULL,linetype=NULL)
```

``` r
p <- p + theme(
  plot.background=element_rect(fill="#b3e2cd"),
  panel.background=element_rect(fill="#fdcdac"),
  panel.border=element_rect(fill=NA,color="#cbd5e8",size=3),
  legend.background=element_rect(fill="#f4cae4"),
  legend.box.background=element_rect(fill="#e6f5c9"),
  strip.background=element_rect(fill="#fff2ae")
)
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-53-1.png" width="720" style="display: block; margin: auto auto auto 0;" />

---
name: theme-save

# Theme • Reuse

``` r
newtheme <- theme_bw() + theme(
  axis.ticks=element_blank(),
  panel.background=element_rect(fill="white"),
  panel.grid.minor=element_blank(),
  panel.grid.major.x=element_blank(),
  panel.grid.major.y=element_line(size=0.3,color="grey90"),
  panel.border=element_blank(),
  legend.position="top",
  legend.justification="right"
)
```

.pull-left-50[

``` r
p
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-56-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

]

.pull-right-50[

``` r
p + newtheme
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-57-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

---
name: position

# Position

```
##         Murder Assault UrbanPop Rape
## Alabama   13.2     236       58 21.2
## Alaska    10.0     263       48 44.5
## Arizona    8.1     294       80 31.0
```

``` r
library(dplyr) # mutate(), slice() and the %>% pipe
library(tidyr) # gather()
us <- USArrests %>% mutate(state=rownames(.)) %>% slice(1:4) %>%
  gather(key=type,value=value,-state)
p <- ggplot(us,aes(x=state,y=value,fill=type))
```

--

.pull-left-50[

``` r
p + geom_bar(stat="identity",position="stack")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-60-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

--

.pull-right-50[

``` r
p + geom_bar(stat="identity",position="dodge")
```

<img src="slide_ggplot2_files/figure-html/unnamed-chunk-61-1.png" width="324" style="display: block; margin: auto auto auto 0;" />

]

???

| **Position** | **Description** | **Typical Use** |
| --- | --- | --- |
| `"stack"` *(default)* | Stacks bars on top of each other | Show totals and part-to-whole relationships |
| `"dodge"` | Places bars side-by-side | Compare groups within the same category |
| `"fill"` | Stacks bars to equal height (100%) | Show proportions within each category |

---
name: save

# Saving plots

``` r
p <- ggplot(iris,aes(Petal.Length,Sepal.Length,color=Species))+
  geom_point()
```

* `ggplot2` plots can be saved just like base plots

``` r
png("plot.png",height=5,width=7,units="cm",res=200)
print(p)
dev.off()
```

* The `ggplot2` package offers a convenient function

``` r
ggsave("plot.png",p,height=5,width=7,units="cm",dpi=200,type="cairo")
```

* Use `type="cairo"` for nicer anti-aliasing
* Note that the default units in `png` are pixels, while in `ggsave` they are inches

---
name: data-iris

# Data • `iris`

* Input data is always an R `data.frame` object

<table>
<thead>
<tr> <th style="text-align:center;"> Sepal.Length </th> <th style="text-align:center;"> Sepal.Width </th> <th style="text-align:center;"> Petal.Length </th> <th style="text-align:center;"> Petal.Width </th> <th style="text-align:center;"> Species </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 5.1 </td> <td style="text-align:center;"> 3.5 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr>
<tr> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 3.0 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr>
<tr> <td style="text-align:center;"> 4.7 </td> <td style="text-align:center;"> 3.2 </td> <td style="text-align:center;"> 1.3 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa
</td> </tr>
</tbody>
</table>

``` r
str(iris)
```

```
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```

???

It's a good idea to use `str()` to check the input data frame, for example to make sure that numbers are stored as numbers and not as characters. Also verify that factors are correctly assigned.

---
name: data-diamonds

# Data • `diamonds`

<table>
<thead>
<tr> <th style="text-align:center;"> carat </th> <th style="text-align:center;"> cut </th> <th style="text-align:center;"> color </th> <th style="text-align:center;"> clarity </th> <th style="text-align:center;"> depth </th> <th style="text-align:center;"> table </th> <th style="text-align:center;"> price </th> <th style="text-align:center;"> x </th> <th style="text-align:center;"> y </th> <th style="text-align:center;"> z </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:center;"> 0.23 </td> <td style="text-align:center;"> Ideal </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> SI2 </td> <td style="text-align:center;"> 61.5 </td> <td style="text-align:center;"> 55 </td> <td style="text-align:center;"> 326 </td> <td style="text-align:center;"> 3.95 </td> <td style="text-align:center;"> 3.98 </td> <td style="text-align:center;"> 2.43 </td> </tr>
<tr> <td style="text-align:center;"> 0.21 </td> <td style="text-align:center;"> Premium </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> SI1 </td> <td style="text-align:center;"> 59.8 </td> <td style="text-align:center;"> 61 </td> <td style="text-align:center;"> 326 </td> <td style="text-align:center;"> 3.89 </td> <td style="text-align:center;"> 3.84 </td> <td style="text-align:center;"> 2.31 </td> </tr>
<tr> <td style="text-align:center;"> 0.23 </td> <td style="text-align:center;"> Good </td> <td style="text-align:center;"> E </td> <td style="text-align:center;"> VS1 </td> <td style="text-align:center;"> 56.9 </td> <td style="text-align:center;"> 65 </td> <td style="text-align:center;"> 327 </td> <td style="text-align:center;"> 4.05 </td> <td style="text-align:center;"> 4.07 </td> <td style="text-align:center;"> 2.31 </td> </tr>
</tbody>
</table>

``` r
str(diamonds)
```

```
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
```

???

An R `data.frame` is a tabular format with rows and columns, just like a spreadsheet. Every row must have an entry in every column; where a value is unavailable, it is filled in as `NA`.

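The `str()` check recommended for `iris` applies here too. As a small illustration, the made-up data frame below (`df` and its columns are hypothetical and not part of the course data) has a numeric column that was read in as text and one missing value.

``` r
# A small data.frame where a numeric column was read in as text
df <- data.frame(sample = c("s1", "s2", "s3"),
                 count  = c("12", "7", NA),
                 stringsAsFactors = FALSE)
str(df)   # count is shown as chr, not num

# Convert the columns to their intended types before plotting
df$count  <- as.numeric(df$count)   # the missing value stays NA
df$sample <- factor(df$sample)
str(df)   # count is now num, sample is a Factor
```

A column left as character would be treated as a discrete variable by `ggplot2`, which is rarely what is wanted for counts or measurements.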
---
name: data-format

# Data • Format

--

- Wide format

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;font-weight: bold;color: blue !important;"> </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_1 </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_2 </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_3 </th> <th style="text-align:right;font-weight: bold;color: blue !important;"> Sample_4 </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> <td style="text-align:right;color: orange !important;"> 303 </td> <td style="text-align:right;color: orange !important;"> 204 </td> <td style="text-align:right;color: orange !important;"> 492 </td> </tr>
<tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr>
<tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> <td style="text-align:right;color: orange !important;"> 660 </td> <td style="text-align:right;color: orange !important;"> 472 </td> <td style="text-align:right;color: orange !important;"> 951 </td> </tr>
<tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> <td style="text-align:right;color: orange !important;"> 54 </td> <td style="text-align:right;color: orange !important;"> 44 </td> <td style="text-align:right;color: orange !important;"> 109 </td> </tr>
<tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> <td style="text-align:right;color: orange !important;"> 405 </td> <td style="text-align:right;color: orange !important;"> 236 </td> <td style="text-align:right;color: orange !important;"> 445 </td> </tr>
<tr> <td style="text-align:left;color: orange !important;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr>
</tbody>
</table>

--

* familiarity
* convenience
* you see more data

---
name: data-format-2

# Data • Format

- Long format

--

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Sample_ID </th> <th style="text-align:left;"> Gene </th> <th style="text-align:right;"> count </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> </tr>
<tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr>
<tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> </tr>
<tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td>
<td style="text-align:left;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table>

--

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Sample_ID </th> <th style="text-align:left;"> Sample_Name </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Replicate </th> <th style="text-align:left;"> Cell </th> <th style="text-align:left;"> Gene </th> <th style="text-align:right;"> count </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> <tr> <td
style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table>

---
name: data-format-3

# Data • Format - Long format

<table class="table table-striped" style="width: auto
!important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Sample_ID </th> <th style="text-align:left;"> Sample_Name </th> <th style="text-align:left;"> Time </th> <th style="text-align:left;"> Replicate </th> <th style="text-align:left;"> Cell </th> <th style="text-align:left;"> Gene </th> <th style="text-align:right;"> count </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000003 </td> <td style="text-align:right;color: orange !important;"> 321 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000005 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000419 </td> <td style="text-align:right;color: orange !important;"> 696 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 
</td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000457 </td> <td style="text-align:right;color: orange !important;"> 59 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000460 </td> <td style="text-align:right;color: orange !important;"> 399 </td> </tr> <tr> <td style="text-align:left;color: blue !important;"> Sample_1 </td> <td style="text-align:left;color: blue !important;"> t0_A </td> <td style="text-align:left;color: blue !important;"> t0 </td> <td style="text-align:left;color: blue !important;"> A </td> <td style="text-align:left;color: blue !important;"> A431 </td> <td style="text-align:left;color: red !important;"> ENSG00000000938 </td> <td style="text-align:right;color: orange !important;"> 0 </td> </tr> </tbody> </table>

--

* Easier to add new data to an existing dataset
* Most databases store and maintain data in long format because it is efficient
* R tools such as **ggplot2** require data in long format
* Functions available to change between data formats
  * `melt()` from **reshape2**
  * `gather()` from **tidyr** (part of the **tidyverse**; superseded by `pivot_longer()`)

---
name: extension
class: spaced

# Extensions

* [**gridExtra**](https://cran.r-project.org/web/packages/gridExtra/index.html): Extends grid graphics functionality
* [**ggpubr**](http://www.sthda.com/english/rpkgs/ggpubr/): Useful functions to prepare plots for publication
* [**cowplot**](https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html): Combining plots
* [**ggthemes**](https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html): Set of extra themes
* [**ggthemr**](https://github.com/cttobin/ggthemr): More themes
* [**ggsci**](https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html): Color palettes for scales
* [**ggrepel**](https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html): Advanced text labels including overlap control
* [**ggmap**](https://github.com/dkahle/ggmap): Dedicated to mapping
* [**ggraph**](https://github.com/thomasp85/ggraph): Network graphs
* [**ggiraph**](http://davidgohel.github.io/ggiraph/): Converting ggplot2 to interactive graphics
* [**Shortlisted ggplot2 extensions by category**](https://mtbioinformatics.wordpress.com/blog-2/)
* [**Official ggplot2 extensions**](https://ggplot2.tidyverse.org/extensions/)

---
name: help
class: spaced

# Help

* [**ggplot2 official reference**](http://ggplot2.tidyverse.org/reference/)
* [**The R cookbook**](http://www.cookbook-r.com/)
* [**StackOverflow**](https://stackoverflow.com/)
* [**RStudio Cheatsheet**](https://www.rstudio.com/resources/cheatsheets/)
* [**r-statistics Cheatsheet**](http://r-statistics.co/ggplot2-cheatsheet.html)
* [**ggplot2 GUI**](https://site.shinyserver.dck.gmw.rug.nl/ggplotgui/)
* Numerous personal blogs, r-bloggers.com etc.

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end-slide
class: end-slide, middle
count: false

# Thank you. Questions?
.end-text[ <p class="smaller"> <span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br> Created: 27-Oct-2025 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a> </p> ]