Introduction
In this lab, we will go step-by-step through points that are necessary to create some nice-looking plots.
Generating data
First, we will produce some random data that we will later plot. Make a data frame with
- 20 random coordinates (x,y): y coming from N(0,1) – normal distribution with mean=0 and sd=1
- radius r for each data point, r coming from N(0,1).
- The x coord takes random values from 1 to 25 and
- Each point (row of the data frame) has a name ind1 … ind25,
First, look at the defaults:
- plot the data in the simplest possible way.
Click to see how
#20 random datapoints x <- sample(c(1:25), size=20, replace=T) y <- rnorm(n=20, mean=0, sd=1) # sample from normal r <- rnorm(n=20, mean=0, sd=1) # radius from normal names <- paste("ind", 1:20, sep="") # assign some names data <- data.frame(cbind(X=x,Y=y, R=r), row.names=names) plot(data[,1:2], cex=data$R)
Generating plot step-by-step
As you see, the points are displayed in a simple way, axes are set automatically, the radius is not reflected on the plot in any way (3rd dimension).
- build the plot from scratch, begin by displaying no points.
Click to see how
plot(data[,1:2], type='n')
- we still got a box around the plot and axes, we do not want these
either, remove these elements from the previous plot.
Click to see how
plot(data[,1:2], type='n',xaxt='n', yaxt='n', xlab="", ylab="", frame.plot=F)
- create X and Y axis so that they cover the whole range of x and
y. Make axis text slightly smaller, 70% of the default. For the Y
axis, set 10 equidistant tickmarks and set labels to their values
rounded to two decimals. Turn the labels, so that they are parallel
to the OX axis.
Click to see how
#Create X axis coords.x <- seq(min(data$X),max(data$X), by=1) axis(side=1, # 1-bottom, 2-left, 3-top, 4-right at=coords.x, # coordinates for tickmarks cex.axis=.7 # make labels smaller ) #Create Y axis #we want 10 tickmarks along the data range coords.y <- seq(min(data$Y), max(data$Y), length.out=10) #and our labels will be the rounded values of y labels.y <- round(coords.y, digits=2) axis(side=2, at=coords.y, labels=labels.y, # we want specific labels las=2 # turn the text so it is parallel to OX )
- plot auxiliary lines (a grid) so that it is easier to read the
plot. There should be a grey dashed line from each tickmark on both
axes.
Click to see how
abline(v=coords.x, col="darkgrey", lty=3) abline(h=coords.y, col="darkgrey", lty=3) #you could also use grid()
- define a new mycol function that takes a color name and a
transparency value as two arguments and returns the corresponding
rgb color value. OPTIONAL – if it seems to difficult, look up the
answer.
Click to see how
#Function for adding transparency to a given color. mycol <- function(colname="olivedrab", transparency=.5) { #convert color name to its RGB value and add the desired #transparency color <- c(as.vector(col2rgb(colname))/255, transparency) # and make a new color from the above color <- rgb(color[1], color[2], color[3], color[4]) return(color) }
- plot datapoints so that their size is proportional to e^r (e to the power of r , e is the base of the natural logarithm and e=2.71…) where
$r$ is the radius, points at even X should be round and blue and
points at odd X square and grey.
Click to see how
#Plot radii points(data[data$X%%2 == 0,], pch=19, cex=exp(r), col=mycol("slateblue", .5)) points(data[data$X%%2 != 0,], pch=15, cex=exp(r), col=mycol("grey", .5))
- plot centers of the points as a cross: grey for blue/even points and
red for grey/odd points.
Click to see how
points(data[data$X%%2 == 0,], pch=3, cex=1, col="darkgrey") points(data[data$X%%2 != 0,], pch=3, cex=1, col="red")
- add grey text ‘Center’ at the center of the plot.
Click to see how
center.x <- mean(range(data[,1])) center.y <- mean(range(data[,2])) text(x=center.x, y=center.y, "Center", col="lightgrey")
- add title ‘Odds and Ends’ and text ‘X’ and ‘Y’ on the margins of the
appropriate axes.
Click to see how
title("Odds and Ends") mtext("Y", side=2, line=3, cex.lab=1,las=2, col="blue") mtext("X", side=1, line=3, cex.lab=1,las=1, col="blue")
- add a legend for ‘odd’ and ‘even’ points. Place it in the top-right
corner.
Click to see how
legend('topright', legend=c("odd", "even"), col=c(mycol("slateblue", .5), mycol("grey", .5)), pch=c(19,15), cex=1, pt.cex=1.2, title="Legend", bty='n' )
Visualizing baby growth data on a WHO centile grid
A female child was measured at the following dates:
-
‘30-09-2015’, ‘12-10-2015’, ‘19-10-2015’, ‘26-10-2015’, ‘07-11-2015’, ‘16-11-2015’, ‘30-11-2015’, ‘11-01-2016’, ‘08-02-2016’, ‘14-03-2016’, ‘05-04-2016’, ‘14-04-2016’, ‘31-05-2016’, ‘14-07-2016’,
- the measured weights in grams were: 3300, 3540, 3895, 4070, 4230, 4385, 4855, 5865, not taken, 6736, 7065, 7080, 7530, 7640 and
- the measured lengths: 43, no measurement taken, 53, 54, 55, 56, 58, 62.5, 65, 67, 67.5, 67.5, 70.5, 71.5.
- The headcircumference for the same datapoints was (in cm): 34, 35.5, 36.1, 36.8, 36.8, 37.3, 38, 40.2, 41.4, 42.1, not taken, 43, 44, 45.
Your task is to plot these data on the WHO centile grids. Choose weight/length/circumference depending on the month you was born:
- weight: Jan, Apr, Jul, Oct
- length: Feb, May, Aug, Nov
- circumference: Mar, Jun, Sep, Dec
Good luck!
- use function dmy from the lubridate package to create a vector of timepoints.
Click to see how
library(lubridate) timepoints <- dmy(c('30-09-2015', '12-10-2015', '19-10-2015', '26-10-2015', '07-11-2015', '16-11-2015', '30-11-2015', '11-01-2016', '08-02-2016', '14-03-2016', '05-04-2016', '14-04-2016', '31-05-2016', '14-07-2016'))
- enter the measurement of choice as a vector
Click to see how
weight <- c(3300, 3540, 3895, 4070, 4230, 4385, 4855, 5865, NA, 6736, 7065, 7080, 7530, 7640) length <- c(43,NA,53,54,55,56,58,62.5,65,67,67.5,67.5,70.5,71.5) head <- c(34,35.5,36.1,36.8,36.8,37.3,38,40.2,41.4,42.1,NA,43,44,45)
- WHO months is 30.4375 days long. Transform timepoints into OX
coordinates so that the distance between them corresponds to the
days between the two measurements. HINT: check as.duration and
ddays functions. Do not feel bad if you have to click on the
key. Working with dates is not an easy piece. The point is to know
the lubridate exists…
Click to see how
who.month <- 30.4375 #days xpoints <- as.duration(timepoints[1] %--% timepoints) / ddays(1) / who.month
-
go to WHO website (http://www.who.int/childgrowth/standards/en/) and find out the link to the dataset of your concern, e.g. Weight for age, percentiles for girls have the following address: http://www.who.int/entity/childgrowth/standards/tab_wfa_girls_p_0_5.txt
- load the data using URL from the previous point and the
read.table function.
Click to see how
uri <- "http://www.who.int/entity/childgrowth/standards/tab_wfa_girls_p_0_5.txt" #uri <- "http://www.who.int/entity/childgrowth/standards/second_set/tab_hcfa_girls_p_0_5.txt" #uri <- "http://www.who.int/entity/childgrowth/standards/tab_lhfa_girls_p_0_2.txt" myData <-read.table(uri, header=T, sep='\t')
- create an empty plot to show your and WHO data,
Click to see how
plot(1, xlim=c(0, max(myData$Month)), type='n', bty='n', ylim=c(0, max(myData[,c(5:19)])), las=1, xlab='Month', ylab='kg', cex.axis=.7) grid()
- plot WHO mean and percentiles: P25, P75, P0.1 and P99.9, use
different colors and line types to make the plot pretty.
Click to see how
lines(myData$M, col='grey', lty=1) lines(myData$P25, col='blue', lty=2) lines(myData$P75, col='blue', lty=2) lines(myData$P01, col='tomato', lty=2) lines(myData$P999, col='tomato', lty=2)
- plot your data on top of the percentiles, mind the units so that
they match with the WHO ones,
Click to see how
points(xpoints, weight/1000, pch=3, type='l', cex=.5) points(xpoints, weight/1000, pch=3, type='p', cex=.5)
- add some descriptions on the margins
Click to see how
mtext(text = c('P0.1','P25','P75','P99.9'), side = 4, at=myData[dim(myData)[1], c('P01','P25','P75','P999')], las=1, cex=.5)
Visualizing Gapminder data
You task here is to use the already acquired R knowledge to plot an interesting relationship between two freely selected variables available at Hans Rosling’s Gapminder Foundation page.
- go to http://www.gapminder.org/data/
- select a dataset of interest,
- load data to R, take care of missing values etc.,
- find a nice way of visualizing the relationship between some selected variables,
- think of scales (linear, logarythmic), axes labels etc.,
- be creative,
- visualize a selected variables using boxplot and histogram on one plot (HINT: parameter mfrow),
- discuss the result with your colleagues and TAs.