ggplot2
Not suitable for:
Why can't we just do everything in base plot? Of course we could, but it is easier, more consistent and more structured using ggplot2. There is a bit of a learning curve, but once the code syntax and the graph-building logic are clear, it becomes easy to plot a large variety of graphs.
ggplot2 vs Base Graphics
hist(iris$Sepal.Length)
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 8)
For simple graphs, base plots seem to take less coding effort than a ggplot graph.
ggplot2 vs Base Graphics
plot(iris$Petal.Length, iris$Petal.Width,
     col = c("red", "green", "blue")[iris$Species],
     pch = c(0, 1, 2)[iris$Species])
legend(x = 1, y = 2.5,
       legend = c("setosa", "versicolor", "virginica"),
       pch = c(0, 1, 2), col = c("red", "green", "blue"))
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species, shape = Species)) +
  geom_point()
For anything beyond extremely basic plots, base plotting quickly becomes complex. More importantly, base plots lack consistency in their functions and plotting strategy.
ggplot was created by Hadley Wickham in 2005 as an implementation of Leland Wilkinson's book Grammar of Graphics.
Different graphs have traditionally been treated as independent entities and labelled differently, such as barplots, scatterplots and boxplots. Each graph has its own function and plotting strategy.
Grammar of graphics (GOG) tries to unify all graphs under a common umbrella. GOG brings the idea that graphs are made up of discrete components which can be mixed and matched to create any plot. This creates a consistent underlying framework for graphing.
ggplot(iris)
ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width))
ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width))+ geom_point()
ggplot(iris,aes(x=Sepal.Length, y=Sepal.Width, colour=Species))+ geom_point()
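The grammar idea can be made explicit: a `geom_*()` call is shorthand for a layer that combines a geom, a stat and a position adjustment. The sketch below, assuming a recent ggplot2, builds the last scatterplot above with the lower-level `layer()` function; the `params = list(na.rm = FALSE)` argument follows the pattern in the ggplot2 documentation for `layer()`.

```r
library(ggplot2)

# Data + aesthetic mapping form the base of the plot
p <- ggplot(data = iris,
            mapping = aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  # One layer = one geom + one stat + one position adjustment.
  # geom_point() is a shortcut for this same combination.
  layer(geom = "point", stat = "identity", position = "identity",
        params = list(na.rm = FALSE))
p
```

In everyday code `geom_point()` is preferred; the explicit form only shows which components the shortcut fills in.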
iris
data.frame object
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
It's a good idea to use str() to check the input data frame, for example to make sure that numbers are actually stored as numbers and not as characters, and that factors are correctly assigned.
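A minimal sketch of the kind of problem `str()` helps catch, using a small hypothetical data frame (`df`, `id` and `value` are made-up names) in which numbers were read in as text:

```r
# Hypothetical data: numbers stored as character, groups stored as plain text
df <- data.frame(id = c("a", "b", "c"),
                 value = c("1.5", "2.0", "3.25"),
                 stringsAsFactors = FALSE)
str(df)  # both columns are reported as chr

# Convert to the types ggplot2 should see
df$value <- as.numeric(df$value)                    # continuous variable
df$id <- factor(df$id, levels = c("c", "b", "a"))   # categorical, explicit level order
str(df)
```

Setting the factor levels explicitly also fixes the order in which the categories appear on a discrete axis or in a legend.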
diamonds
carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|
0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
An R data.frame is a tabular format with rows and columns, just like a spreadsheet. Every row must have a value in every column; missing values are filled in as NA.
Wide
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
Long
Species | variable | value |
---|---|---|
setosa | Sepal.Length | 5.1 |
setosa | Sepal.Length | 4.9 |
setosa | Sepal.Length | 4.7 |
The data must be cleaned up and prepared for plotting. The data must be 'tidy'. Columns must be variables and rows must be observations. The data can then be in wide or long format depending on the variables to be plotted.
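One way to go from the wide layout to the long layout shown above is `tidyr::pivot_longer()`. This is a sketch assuming tidyr 1.0 or later is installed; the column names `variable` and `value` simply mirror the long table above.

```r
library(ggplot2)
library(tidyr)

# Keep Species as the identifier and stack the four measurement columns
iris_long <- pivot_longer(iris,
                          cols = -Species,
                          names_to = "variable",
                          values_to = "value")
head(iris_long)

# In long format, the measurement name itself becomes a variable to map
ggplot(iris_long, aes(x = variable, y = value, fill = Species)) +
  geom_boxplot()
```

The rows come out grouped by original flower (all four measurements of the first flower, then the second, and so on) rather than by variable as in the table above; the content is the same.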
p <- ggplot(iris)
# scatterplot
p + geom_point(aes(x = Sepal.Length, y = Sepal.Width))
# barplot
p + geom_bar(aes(x = Sepal.Length))
# boxplot
p + geom_boxplot(aes(x = Species, y = Sepal.Width))
# search
help.search("^geom_", package = "ggplot2")
Geoms are the geometric components of a graph, such as points and lines, used to represent data. The same data can be represented with different geoms, for example points or bars. The mandatory inputs change depending on the geom: geom_point() needs both x and y, while geom_bar() needs only x.
library(gridExtra)  # provides grid.arrange()
x <- ggplot(iris) + geom_bar(aes(x = Sepal.Length), stat = "bin")
y <- ggplot(iris) + geom_bar(aes(x = Species), stat = "count")
z <- ggplot(iris) + geom_bar(aes(x = Species, y = Sepal.Length), stat = "identity")
grid.arrange(x, y, z, nrow = 1)
x <- ggplot(iris) + stat_bin(aes(x = Sepal.Length), geom = "bar")
y <- ggplot(iris) + stat_count(aes(x = Species), geom = "bar")
z <- ggplot(iris) + stat_identity(aes(x = Species, y = Sepal.Length), geom = "bar")
grid.arrange(x, y, z, nrow = 1)
plot | stat | geom |
---|---|---|
histogram | bin | bar |
smooth | smooth | line |
boxplot | boxplot | boxplot |
density | density | line |
freqpoly | freqpoly | line |
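The histogram row of the table can be checked directly. A sketch, assuming a recent ggplot2: the same histogram is written once from the geom side and once from the stat side, and the layer objects and computed counts are compared.

```r
library(ggplot2)

# The same histogram, written from the geom side and from the stat side
h1 <- ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(bins = 8)
h2 <- ggplot(iris, aes(x = Sepal.Length)) + stat_bin(bins = 8, geom = "bar")

# Which stat and which geom does geom_histogram() actually use?
class(h1$layers[[1]]$stat)[1]  # "StatBin"
class(h1$layers[[1]]$geom)[1]  # "GeomBar"

# The computed bin counts are the same for both layers
all.equal(layer_data(h1)$count, layer_data(h2)$count)
```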
Use args(geom_bar) to check arguments.
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = Petal.Length,
                 alpha = Petal.Width, shape = Species, color = Species))
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width),
             size = 2, alpha = 0.8, shape = 15, color = "steelblue")
Aesthetics are used to assign values to geometries. An aesthetic can be set to a fixed value outside aes(), for example all points the same size, or mapped to a variable inside aes(), for example different colors or sizes denoting a variable.
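This would be an incorrect way to do it: size = 2 inside aes() is treated as a data variable with the constant value 2, so ggplot2 applies a size scale and adds a legend instead of simply drawing points of size 2.
ggplot(iris) + geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = 2))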
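The difference between setting and mapping can be seen inside the layer object. A sketch, assuming a recent ggplot2; `aes_params` and `mapping` are fields of ggplot2's internal layer object, inspected here only for illustration.

```r
library(ggplot2)

# Setting: size = 2 outside aes() gives every point the same fixed size
p_set <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width), size = 2)

# Mapping: size = 2 inside aes() is treated like a data variable,
# so a size scale (and a legend) is created for it
p_map <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = 2))

p_set$layers[[1]]$aes_params$size      # stored as a fixed parameter
names(p_map$layers[[1]]$mapping)       # "size" appears among the mapped aesthetics
```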
library(gridExtra)  # provides grid.arrange()
x1 <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width)) +
  stat_smooth(aes(x = Sepal.Length, y = Sepal.Width))
x2 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth()
grid.arrange(x1, x2, nrow = 1, ncol = 2)
If the same aesthetics are used in multiple geoms, they can be moved to ggplot().
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_line() +
  geom_smooth() +
  geom_rug() +
  geom_step() +
  geom_text(data = subset(iris, Species == "setosa"), aes(label = Species))
Multiple geoms can be plotted one after the other. The order in which items are specified in the command dictates the plotting order on the actual plot.
In this case, the lines appear over the points, because geom_line() is added after geom_point().
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_line()
while here the points appear above the lines.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line() +
  geom_point()
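The drawing order is simply the order of the plot's layer list. A small sketch that lists the geom of each layer, first (bottom) to last (top):

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_line() +
  geom_point()

# Layers are stored, and drawn, in the order they were added:
# the first element is drawn first (bottom), the last one on top
vapply(p$layers, function(l) class(l$geom)[1], character(1))
```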
Each geom inherits the data and aesthetics specified in ggplot(). If a geom requires extra input, it can be specified additionally inside that geom's own aes(). The data can also be changed for specific geoms.
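A sketch of both ideas in one plot: the first layer inherits everything from `ggplot()`, while the second layer swaps in a subset of the data (`setosa`, a name introduced here for illustration) and adds its own colour mapping.

```r
library(ggplot2)

setosa <- subset(iris, Species == "setosa")

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  # Inherits data and x/y from ggplot(): all 150 flowers in grey
  geom_point(colour = "grey70") +
  # Overrides the data and adds an extra aesthetic for this layer only
  geom_point(data = setosa, aes(colour = Species))
p
```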
scale_<aesthetic>_<type>
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species))
p
p + scale_color_manual( name="Manual", values=c("#5BC0EB","#FDE74C","#9BC53D"))
Scales control how data values are translated into aesthetics. For example, if the aesthetic color is mapped to a variable x, the palette of colors used, which color maps to which value, and the upper and lower limits of the data and colors are all controlled by scales.
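With `scale_color_manual()`, the mapping from values to colours can be made explicit by naming the `values` vector. A sketch, assuming a recent ggplot2, reusing the colours from the example above; `layer_data()` shows the colour the scale assigned to each row.

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  # Named values pin each colour to a level, independent of the level order
  scale_color_manual(name = "Manual",
                     values = c(setosa = "#5BC0EB",
                                versicolor = "#FDE74C",
                                virginica = "#9BC53D"))

# The scale has translated every Species value into its colour
ld <- layer_data(p)
unique(ld$colour[iris$Species == "setosa"])
```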
Tip: type scale_, then press TAB to list the available scales.
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, shape = Species, color = Petal.Length))
p
p +scale_color_gradient(name="Pet Len", breaks=range(iris$Petal.Length), labels=c("Min","Max"), low="black",high="red")
Continuous colours can be changed using scale_color_gradient() for a two-colour gradient. Any number of breaks and colours can be specified using scale_color_gradientn().
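A sketch of `scale_color_gradientn()` with three colours; the colour choice is arbitrary, and `values` (positions rescaled to 0-1) places the middle colour at the midpoint of the Petal.Length range.

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Petal.Length))

# Three colours instead of two; values are positions on a 0-1 rescaled range
p3 <- p + scale_color_gradientn(name = "Pet Len",
                                colours = c("navy", "white", "firebrick"),
                                values = c(0, 0.5, 1))
p3
```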
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, shape = Species, color = Species))
p
p + scale_color_manual(name = "Species", values = c("blue", "green", "red")) +
  scale_shape_manual(name = "Species", values = c(0, 1, 2))
The shape scale can be adjusted using scale_shape_manual(). When several aesthetics are mapped to the same variable, their legends are merged into one, as long as the scales share the same name.
scale_<axis>_<type>
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width))
p
p + scale_x_continuous(name = "Sepal Length", breaks = seq(1, 8), limits = c(3, 5))
The x and y axes are also controlled by scales: the axis breaks, the break labels and the limits are all set through scales.
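When setting limits using scale_, the data outside the limits are dropped (with a warning), which also affects anything computed from them. Limits can also be set using lims(x = c(3, 5)) or xlim(c(3, 5)), which behave the same way. To zoom in without dropping data, set the limits in coord_cartesian() instead, or in coord_map() for maps.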
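A sketch contrasting the two approaches, assuming a recent ggplot2. It counts how many points still have a usable x position after building the plot; the limits 5 to 6 are arbitrary.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# Scale limits: values outside 5-6 are set to NA before any stat is computed,
# and those points are dropped (with a warning) when the plot is drawn
p_scale <- p + scale_x_continuous(limits = c(5, 6))

# Coordinate limits: all data is kept; the view is simply zoomed in
p_coord <- p + coord_cartesian(xlim = c(5, 6))

sum(!is.na(layer_data(p_scale)$x))  # fewer than 150 usable points
sum(!is.na(layer_data(p_coord)$x))  # all 150 points kept
```

The difference matters most for layers that compute something from the data, such as geom_smooth() or geom_boxplot(), whose results change when scale limits remove points.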
facet_wrap
p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species))
p
p + facet_wrap(~Species)
p + facet_wrap(~Species,nrow=3)
facet_wrap is used to split a plot into subplots based on the categories in one or more variables.
facet_grid
library(dplyr)  # provides the %>% pipe
p <- diamonds %>%
  ggplot(aes(carat, price)) +
  geom_point()
p + facet_grid(~cut + clarity)
p + facet_grid(cut~clarity)
facet_grid is also used to split a plot into subplots based on the categories in one or more variables. Unlike facet_wrap, it lays the panels out as a matrix-like grid of two variables, with one variable defining the rows and the other the columns (rows ~ columns).
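The `rows ~ columns` formula has an equivalent argument-based form using `vars()`, available in ggplot2 3.0.0 and later. A sketch that builds both versions and counts the panels; facet_grid() always creates the full cross of levels, here 5 cuts times 8 clarities.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price)) + geom_point()

# Formula interface and vars() interface describe the same grid
g1 <- p + facet_grid(cut ~ clarity)
g2 <- p + facet_grid(rows = vars(cut), cols = vars(clarity))

# One row per panel in the built layout: 5 x 8 = 40
nrow(ggplot_build(g1)$layout$layout)
```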
coord_cartesian(xlim=c(2,8)) for zooming in
coord_map for controlling limits on maps
coord_polar
p <- ggplot(iris, aes(x = "", y = Petal.Length, fill = Species)) +
  geom_bar(stat = "identity")
p
The coordinate system defines the surface used to represent numbers. Most plots use the cartesian coordinate system. A pie chart, for example, is a polar coordinate projection of a stacked cartesian barplot. Maps can use numerous coordinate systems called map projections, for example UTM coordinates.
p+coord_polar("y",start=0)
?theme
ggplot(iris, aes(Petal.Length)) +
  geom_histogram() +
  facet_wrap(~Species, nrow = 2) +
  theme_grey()
ggplot(iris, aes(Petal.Length)) +
  geom_histogram() +
  facet_wrap(~Species, nrow = 2) +
  theme_bw()
Themes control all non-data components of the plot, that is, its visual appearance. Examples include the thickness of axis lines, the background color and the font family.
p <- ggplot(iris)+ geom_point(aes(x=Sepal.Length, y=Sepal.Width, color=Species))
p + theme(legend.position="top")
p + theme(legend.position="bottom")
element_text(family=NULL,face=NULL,color=NULL,size=NULL,hjust=NULL, vjust=NULL, angle=NULL,lineheight=NULL,margin = NULL)
p <- p + theme(
  axis.title = element_text(color = "#e41a1c"),
  axis.text = element_text(color = "#377eb8"),
  plot.title = element_text(color = "#4daf4a"),
  plot.subtitle = element_text(color = "#984ea3"),
  legend.text = element_text(color = "#ff7f00"),
  legend.title = element_text(color = "#ffff33"),
  strip.text = element_text(color = "#a65628"))
element_rect(fill=NULL,color=NULL,size=NULL,linetype=NULL)
p <- p + theme(
  plot.background = element_rect(fill = "#b3e2cd"),
  panel.background = element_rect(fill = "#fdcdac"),
  panel.border = element_rect(fill = NA, color = "#cbd5e8", size = 3),
  legend.background = element_rect(fill = "#f4cae4"),
  legend.box.background = element_rect(fill = "#e6f5c9"),
  strip.background = element_rect(fill = "#fff2ae"))
newtheme <- theme_bw() + theme(
  axis.ticks = element_blank(),
  panel.background = element_rect(fill = "white"),
  panel.grid.minor = element_blank(),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(size = 0.3, color = "grey90"),
  panel.border = element_blank(),
  legend.position = "top",
  legend.justification = "right")
p
p + newtheme
State | Murder | Assault | UrbanPop | Rape |
---|---|---|---|---|
Alabama | 13.2 | 236 | 58 | 21.2 |
Alaska | 10.0 | 263 | 48 | 44.5 |
Arizona | 8.1 | 294 | 80 | 31.0 |
library(dplyr)  # mutate(), slice() and the %>% pipe
library(tidyr)  # gather()
us <- USArrests %>%
  mutate(state = rownames(.)) %>%
  slice(1:4) %>%
  gather(key = type, value = value, -state)
p <- ggplot(us, aes(x = state, y = value, fill = type))
p + geom_bar(stat="identity",position="stack")
p + geom_bar(stat="identity",position="dodge")
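What the two position adjustments do to the bars can be checked numerically. A sketch assuming ggplot2, dplyr and tidyr are installed; it swaps in `pivot_longer()`, tidyr's newer replacement for `gather()`, and `geom_col()`, which is shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Same long-format data as above, built with pivot_longer() instead of gather()
us <- USArrests %>%
  mutate(state = rownames(.)) %>%
  slice(1:4) %>%
  pivot_longer(-state, names_to = "type", values_to = "value")

p <- ggplot(us, aes(x = state, y = value, fill = type))
stacked <- p + geom_col(position = position_stack())
dodged  <- p + geom_col(position = position_dodge(width = 0.9))

# Stacked: the top of each state's stack equals that state's row total
ld_s <- layer_data(stacked)
stack_tops <- tapply(ld_s$ymax, as.numeric(ld_s$x), max)

# Dodged: bars sit side by side and each keeps its own height
ld_d <- layer_data(dodged)
```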
p <- ggplot(iris,aes(Petal.Length,Sepal.Length,color=Species))+ geom_point()
ggplot2 plots can be saved just like base plots:
png("plot.png", height = 5, width = 7, units = "cm", res = 200)
print(p)
dev.off()
The ggplot2 package offers a convenient function:
ggsave("plot.png", p, height = 5, width = 7, units = "cm", dpi = 200, type = "cairo")
type = "cairo" gives nicer anti-aliasing.
Note that the default unit in png() is pixels, while in ggsave() it is inches.
p <- ggplot(us, aes(x = state, y = value, color = type)) + geom_point()
q <- ggplot(us, aes(x = state, y = value, fill = type)) + geom_bar(stat = "identity")
gridExtra::grid.arrange(p,q,ncol=2)
cowplot::plot_grid()
Combining two or more ggplot2 plots is often required, and several packages exist to help with this. Some functions place plots side by side, optionally with different heights or widths for each plot. Others draw one plot on top of another, as an inset.
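Both layouts can be sketched without extra packages beyond gridExtra, which the slides already use. This assumes recent ggplot2 and gridExtra; `p_main` and `p_hist` are illustrative names, and the inset position (in data coordinates of the main plot) was chosen by hand.

```r
library(ggplot2)
library(gridExtra)
library(grid)

p_main <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point()
p_hist <- ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins = 15)

# Side by side, with the first plot twice as wide as the second
side_by_side <- arrangeGrob(p_main, p_hist, ncol = 2, widths = c(2, 1))
grid.newpage()
grid.draw(side_by_side)

# Inset: convert p_hist to a grob and place it inside p_main's data area
inset <- p_main +
  annotation_custom(ggplotGrob(p_hist),
                    xmin = 6.4, xmax = 7.9, ymin = 3.6, ymax = 4.4)
inset
```

Packages such as cowplot (plot_grid()) offer the same two layouts with more alignment options.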
ggplot2 object to interactive HTML
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species))
p2 <- p +
  ggiraph::geom_point_interactive(
    aes(tooltip = paste0("<b>Species: </b>", Species)), size = 2) +
  theme_bw(base_size = 12)
# girafe() replaces the older ggiraph::ggiraph(code = print(p2)) interface
ggiraph::girafe(ggobj = p2)
Most interactive plotting libraries are not as complete as ggplot2. Therefore, some packages explore ways of converting ggplot2 objects into interactive graphics.
A collection of ggplot extension packages: https://exts.ggplot2.tidyverse.org/.
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
OS: Ubuntu 20.04.2 LTS
Built on : 14-Jun-2021 at 12:58:36
2019 • SciLifeLab • NBIS