In this lab, we will go step-by-step through manually building a scientific plot using base graphics in R.
First, we will produce some random data that we will later plot. Make a data frame with
First, look at the defaults:
#20 random datapoints
x <- sample(c(1:25), size=20, replace=T)
y <- rnorm(n=20, mean=0, sd=1) # sample from normal
r <- rnorm(n=20, mean=0, sd=1) # radius from normal
names <- paste("ind", 1:20, sep="") # assign some names
data <- data.frame(cbind(X=x,Y=y, R=r), row.names=names)
plot(data[,1:2])
As you see, the points are displayed in a simple way, axes are set automatically, the radius is not reflected on the plot in any way (3rd dimension).
Build the plot from scratch, begin by displaying no points. You can do this by setting type = 'n'
.
type
s:
plot(data[,1:2], type='n')
Remove the default box around the plot and axes.
plot(data[,1:2], type='n',xaxt='n', yaxt='n',xlab="", ylab="", frame.plot=F)
Create X and Y axis so that they cover the whole range of x and y. For the Y axis, set 10 equidistant tickmarks and set labels to their values rounded to two decimals. Turn the labels, so that they are parallel to the OX axis.
#Create X axis
coords.x <- seq(min(data$X),max(data$X), by=1)
axis(side=1, # 1-left, 2-top, 3-right, 4-bottom
at=coords.x # coordinates for tickmarks
)
#Create Y axis
#we want 10 tickmarks along the data range
coords.y <- seq(min(data$Y), max(data$Y), length.out=10)
#and our labels will be the rounded values of y
labels.y <- round(coords.y, digits=2)
axis(side=2,
at=coords.y,
labels=labels.y, # we want specific labels
las=2 # turn the text so it is parallel to OX
)
Plot gridlines so that it is easier to read the plot. There should be a grey dashed line from each tickmark on both axes.
abline(v=coords.x, col="darkgrey", lty=3)
abline(h=coords.y, col="darkgrey", lty=3)
#you could also use grid()
Define a new mycol function that takes a color name and a transparency value as two arguments and returns the corresponding rgb color value.
#Function for adding transparency to a given color.
mycol <- function(colname="olivedrab", transparency=.5) {
#convert color name to its RGB value and add the desired
#transparency
color <- c(as.vector(col2rgb(colname))/255, transparency)
# and make a new color from the above
color <- rgb(color[1], color[2], color[3], color[4])
return(color)
}
Plot datapoints so that their size is proportional to \(e^r\) where \(r\) is the radius, points at even X should be round and blue and points at odd X square and grey.
#Plot radii
points(data[data$X%%2 == 0,], pch=19, cex=exp(r), col=mycol("slateblue", .5))
points(data[data$X%%2 != 0,], pch=15, cex=exp(r), col=mycol("grey", .5))
Plot centers of the points as a cross: grey for blue/even points and red for grey/odd points.
points(data[data$X%%2 == 0,], pch=3, cex=1, col="darkgrey")
points(data[data$X%%2 != 0,], pch=3, cex=1, col="red")
Add grey text ‘Center’ at the center of the plot.
center.x <- mean(range(data[,1]))
center.y <- mean(range(data[,2]))
text(x=center.x, y=center.y, "Center", col="lightgrey")
Add title ‘Odds and Ends’ and text ‘X’ and ‘Y’ on the margins of the appropriate axes.
title("Odds and Ends")
mtext("Y", side=2, line=3, cex.lab=1,las=2, col="blue")
mtext("X", side=1, line=3, cex.lab=1,las=1, col="blue")
Add a legend for ‘odd’ and ‘even’ points. Place it in the top-right corner.
legend('topright',
legend=c("odd", "even"),
col=c(mycol("slateblue", .5), mycol("grey", .5)),
pch=c(19,15),
cex=1,
pt.cex=1.2,
title="Legend",
bty='n'
)
A female child was measured at the following dates:
‘30-09-2015’, ‘12-10-2015’, ‘19-10-2015’, ‘26-10-2015’, ‘07-11-2015’, ‘16-11-2015’, ‘30-11-2015’, ‘11-01-2016’, ‘08-02-2016’, ‘14-03-2016’, ‘05-04-2016’, ‘14-04-2016’, ‘31-05-2016’, ‘14-07-2016’,
the measured weights in grams were: 3300, 3540, 3895, 4070, 4230, 4385, 4855, 5865, not taken, 6736, 7065, 7080, 7530, 7640 and
the measured lengths: 43, no measurement taken, 53, 54, 55, 56, 58, 62.5, 65, 67, 67.5, 67.5, 70.5, 71.5.
The headcircumference for the same datapoints was (in cm): 34, 35.5, 36.1, 36.8, 36.8, 37.3, 38, 40.2, 41.4, 42.1, not taken, 43, 44, 45.
Your task is to plot these data on the WHO centile grids. Choose weight/length/circumference depending on the month you was born:
We save the timepoints and measurements in vectors.
library(lubridate)
timepoints <- dmy(c('30-09-2015', '12-10-2015','19-10-2015', '26-10-2015', '07-11-2015', '16-11-2015','30-11-2015', '11-01-2016', '08-02-2016', '14-03-2016', '05-04-2016', '14-04-2016', '31-05-2016', '14-07-2016'))
weight <- c(3300, 3540, 3895, 4070, 4230, 4385, 4855, 5865, NA, 6736, 7065, 7080, 7530, 7640)
length <- c(43,NA,53,54,55,56,58,62.5,65,67,67.5,67.5,70.5,71.5)
head <- c(34,35.5,36.1,36.8,36.8,37.3,38,40.2,41.4,42.1,NA,43,44,45)
We can calculate the interval between the dates of measurements by two approaches:
Approach1:
Simply calculate the interval by subtracting each date from the the first date of data collection (e.g. ‘2016-05-31’ - ‘2015-09-30’ ) and then convert it to months.
xpoints <- (as.Date(timepoints) - as.Date(timepoints[1]) )/ 30
Approach2:
Calculate the intervals by seconds and then use WHO standard months day length which is 30.4375 to calculate by month.
Use function dmy()
from the lubridate package to create a vector of timepoints.
HINTS:
- We can define an Interval using the %--%
operator.
- check as.duration()
and ddays()
functions.
who.month <- 30.4375 #days
xpoints <- as.duration(timepoints[1] %--% timepoints) / ddays(1) / who.month
Download manually and then load:
Go to WHO website (http://www.who.int/childgrowth/standards/en/) and find out the link to the dataset of your concern, e.g. Weight for age, percentiles for girls have the following address: http://www.who.int/entity/childgrowth/standards/wfa_girls_p_0_5.xlsx
Or using direct link to the excel file:
library(readxl)
uri <- "https://cdn.who.int/media/docs/default-source/child-growth/child-growth-standards/indicators/weight-for-age/tab_wfa_girls_p_0_5.xlsx?sfvrsn=666fe445_7"
local_file_path <- "wfa_girls_p_0_5.xlsx" ## give the local path of the downloaded file
download.file(url = uri, destfile = local_file_path, mode = "wb")
myData <-read_excel(local_file_path)
Create an empty plot to show your and WHO data,
plot(1, xlim=c(0, max(myData$Month)), type='n', bty='n', ylim=c(0, max(myData[,c(5:19)])), las=1, xlab='Month', ylab='kg')
grid()
Plot WHO mean and percentiles: P25, P75, P0.1 and P99.9, use different colors and line types to make the plot pretty.
lines(myData$M, col='grey', lty=1)
lines(myData$P25, col='blue', lty=2)
lines(myData$P75, col='blue', lty=2)
lines(myData$P01, col='tomato', lty=2)
lines(myData$P999, col='tomato', lty=2)
Plot your data on top of the percentiles, mind the units so that they match with the WHO ones
points(xpoints, weight/1000, pch=3, type='l', cex=.5)
points(xpoints, weight/1000, pch=3, type='p', cex=.5)
Add descriptions of the confidence lines in the margins
mtext(text = c('P0.1','P25','P75','P99.9'), side = 4, at=myData[dim(myData)[1], c('P01','P25','P75','P999')], las=1, cex=.8)
You task here is to use the already acquired R knowledge to plot an interesting relationship between two freely selected variables available at Hans Rosling’s Gapminder Foundation page.