This is the hands-on material for Reproducible Research. It is a series of exercises to help you get started with reproducible research using R. You can consult the RMarkdown cheatsheet for quick reference.
Create a new project in RStudio by going to `File > New Project > New Directory`. Select `New Project` if required. Then set the project name and the directory. An empty project is created and the R session is refreshed: all variables are removed and the environment is cleared.
Create a new RMarkdown file by going to `File > New File > RMarkdown...`. Use the default options.
The environment within R is going to be managed by `renv`, while lower-level components, such as the version of R itself, can be managed by the Conda management system.
If you want, you can skip Conda for the moment: install the `renv` package using `install.packages("renv")` in R and continue to the `renv` section below instead.
Regarding Conda, please refer to the link for questions about the installation.
If your project includes programs written in other programming languages, e.g. Python, the overall environment can be managed by Conda.
NBIS has a dedicated workshop titled Tools for reproducible research which covers more on Conda and other programs for reproducible research.
For R, we can start with a simple environment file by creating a file named `environment.yml` containing the following text.
It describes which versions of R, `renv` and `tidyverse` are to be installed.
The file should be created in the folder of your project, e.g. as `fancy_project/environment.yml`.
channels:
- conda-forge
dependencies:
- conda-forge::r=4.0
- conda-forge::r-renv=0.12.1
- conda-forge::r-tidyverse=1.3.0
Using the Terminal (Mac/Linux) or the command line (Windows), run the following commands to create an environment and activate it.
conda env create -n fancy_env -f environment.yml
conda activate fancy_env
The environment within R, e.g. installed packages, is managed by `renv`.
A good reference about the package is available at “Introduction to renv”.
The first step is the initialization of a local environment using the command in R below.
renv::init()
As an alternative, you could choose to initialize `renv` when an R project is created as shown below.
After the command, you should find that a `renv` folder and a `renv.lock` file have been created.
Check the file `renv.lock` using a text viewer or the `less` command on Mac/Linux.
Whenever a package needs to be installed, please use `renv::install` instead of `install.packages`, as below.
renv::install("dplyr") # from CRAN
renv::install("bioc::Biobase") # from Bioconductor
renv::install("StoreyLab/qvalue") # from GitHub
# from GitLab/Bitbucket
# renv::install("[gitlab|bitbucket]::`*user name*`/`*repository*`")
The current state of the R environment of your project can be stored by calling `renv::snapshot()`.
Please note that the function checks all R scripts under your R project folder.
It updates `renv.lock` only for the packages loaded in those scripts.
Check the `renv.lock` file again after calling `renv::snapshot()`.
If no R script has been created under the project folder yet, perhaps surprisingly, `renv.lock` will not have changed.
Now, create a simple R script that loads one of the packages you installed, for example, a file named `test.R` that has just the one line shown below.
library(dplyr)
Call `renv::snapshot()` and check the `renv.lock` file.
You will see that it has now become much longer.
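To get a feel for why `renv.lock` only changes once a script loads a package, the idea of scanning scripts for loaded packages can be sketched in base R. This is only a rough approximation and not how `renv` itself discovers dependencies; the helper name `find_loaded_packages` and the demo folder are hypothetical.

```r
# Hypothetical helper: list packages referenced through library() or require()
# in the R scripts under a folder. A rough approximation only; it would also
# pick up calls that sit inside comments.
find_loaded_packages <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE, full.names = TRUE)
  call_pattern <- "(library|require)\\([[:space:]]*[\"']?[A-Za-z][A-Za-z0-9.]*"
  pkgs <- character(0)
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    # Keep the first library()/require() call found on each matching line
    hits <- regmatches(lines, regexpr(call_pattern, lines))
    # Strip the function name, bracket and optional quote to leave the package name
    pkgs <- c(pkgs, sub("^(library|require)\\([[:space:]]*[\"']?", "", hits))
  }
  sort(unique(pkgs))
}

# Demo in a temporary folder that mimics a project containing only test.R
demo_dir <- file.path(tempdir(), "fancy_project_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
writeLines("library(dplyr)", file.path(demo_dir, "test.R"))

loaded <- find_loaded_packages(demo_dir)
print(loaded)
```

With no scripts in the folder the helper returns an empty vector, which mirrors the unchanged lockfile seen earlier.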
The `renv::snapshot()` command doesn’t have to be called every time a package is installed.
Just make sure it is called whenever the environment is to be stored, e.g. before sharing the code and the environment with colleagues.
The environment can be restored in a new location with the commands below.
renv::init()
renv::restore()
We can test it by creating a new R project.
Copy your scripts together with `renv.lock` to the folder of the new R project, run the commands above in R and see what R installs.
Please note that none of the contents of the `renv` folder are needed to restore the R environment.
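The copy step can be sketched in base R as follows. This is a minimal illustration under assumptions: both project folders are hypothetical temporary directories, and the script and lockfile are empty stand-ins rather than real project files.

```r
# Hypothetical locations of the original and the new project
old_project <- file.path(tempdir(), "fancy_project")
new_project <- file.path(tempdir(), "fancy_project_copy")
unlink(c(old_project, new_project), recursive = TRUE)
dir.create(old_project)
dir.create(new_project)

# Stand-ins for a real script, a real lockfile and the renv folder
writeLines("library(dplyr)", file.path(old_project, "test.R"))
writeLines("{}", file.path(old_project, "renv.lock"))
dir.create(file.path(old_project, "renv"))

# Copy only the R scripts and renv.lock; the renv folder is deliberately left behind
to_copy <- list.files(old_project, pattern = "(\\.R$|^renv\\.lock$)", full.names = TRUE)
copied <- file.copy(to_copy, new_project, overwrite = TRUE)
print(list.files(new_project))

# In the new project, renv::init() and renv::restore() would then be run
# as described above (not executed in this sketch).
```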
The content at the top of the RMarkdown document between the lines of three dashes is the YAML matter. The YAML matter for this page looks something like this:
---
title: "Reproducible research"
subtitle: "RaukR 2021 • Advanced R for Bioinformatics"
author: "<b>Roy Francis and Mun-Gwan Hong</b>"
output:
  bookdown::html_document2:
    toc: true
    toc_float: true
    toc_depth: 3
    number_sections: true
    theme: united
    highlight: textmate
    df_print: paged
    code_folding: none
    self_contained: false
    keep_md: false
    encoding: "UTF-8"
    css: ["assets/lab.css"]
---
The title, subtitle, author and date are displayed at the top of the rendered document. The argument `output` is used to specify the output document type and related arguments. `rmarkdown::html_document` is commonly used to specify the standard HTML output. `rmarkdown::pdf_document` is used to specify the standard PDF output. The output type then takes further arguments. Sub-arguments differ depending on the output document type.
Above are some of the arguments that can be supplied to the HTML document type. `theme` is used to specify the document style such as the font and layout. `highlight` is used to specify the code highlighting style. `toc` specifies that a table of contents must be included. `toc_float` specifies that the TOC must float on the left of the page while scrolling. `toc_depth` specifies the maximum level/depth to be displayed in the TOC. `number_sections` specifies if the headings/sections must be automatically numbered. Use `?rmarkdown::html_document` for a description of all the various options.
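Because the YAML sub-arguments are passed on as arguments to the output format function, one way to catch a misspelled option is to compare the keys against the function's arguments. The sketch below assumes the `rmarkdown` package may or may not be installed and skips the check if it is missing; the vector `yaml_keys` is just an illustrative selection of the options discussed above.

```r
# Option names as they would be written under the output format in the YAML matter
yaml_keys <- c("toc", "toc_float", "toc_depth", "number_sections",
               "theme", "highlight", "df_print", "code_folding")

if (requireNamespace("rmarkdown", quietly = TRUE)) {
  # Argument names accepted by the html_document output format function
  known_args <- names(formals(rmarkdown::html_document))
  # Any key listed here is not a named argument and may be a typo
  unknown_keys <- setdiff(yaml_keys, known_args)
  print(unknown_keys)
} else {
  message("rmarkdown is not installed; skipping the check.")
  unknown_keys <- character(0)
}
```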
The above level 2 heading was created by specifying `## Text`. Other headings can be specified similarly.
## Level 2 heading
### Level 3 heading
#### Level 4 heading
##### Level 5 heading
###### Level 6 heading
Italic text like this *This is italic text* can be specified using `*This is italic text*` or `_This is italic text_`. Bold text like this **This is bold text** can be specified using `**This is bold text**` or `__This is bold text__`. Subscript written like this `H~2~O` renders as H₂O. Superscript written like this `2^10^` renders as 2¹⁰.
Bullet points are usually specified using `*`, `+` or `-`.
+ Point one
+ Point two
Block quotes can be specified using `>`.
> This is a block quote. This
> paragraph has two lines.
This is a block quote. This paragraph has two lines.
Lists can also be created inside block quotes.
> 1. This is a list inside a block quote.
> 2. Second item.
1. This is a list inside a block quote.
2. Second item.
Links can be created using `[this](https://rmarkdown.rstudio.com/)`, which renders like [this](https://rmarkdown.rstudio.com/).
Images can be displayed from a relative local location using `![This is a caption](rr_lab_assets/gotland.jpg)`.
By default, the image is displayed at full scale or until it fills the display width. The image dimensions can be adjusted using `![This is a caption](rr_lab_assets/gotland.jpg){width=40%}`.
For finer control, raw HTML can be used. For example:
<img src="rr_lab_assets/gotland.jpg" width="150px">
Images can also be displayed using R code. The chunk option `out.width` in RMarkdown can be used to control the image display size.
This image is displayed at a size of 200 pixels.
```{r,out.width=200}
knitr::include_graphics('rr_lab_assets/gotland.jpg')
```
This image is displayed at a size of 75 pixels.
```{r,out.width=75}
knitr::include_graphics('rr_lab_assets/gotland.jpg')
```
Text can be formatted as code. Code is displayed using a monospaced font. Code formatting that stands by itself as a paragraph is called block code. Block code is specified using three backticks (`` ``` ``) followed by the code and then three more backticks.
This text below
```
This is generic block code.
```
renders like this
This is generic block code.
Code formatting can also be included in the middle of a sentence. This is called inline code formatting. Writing `` `This is an inline formatted code.` `` renders like this: `This is an inline formatted code.`
The above codes are not actually executed. They are just text formatted in a different font. Code can be executed by specifying the language along with the backticks. Block code formatted as such:
```{r}
str(iris)
```
renders like this:
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Code blocks are called chunks. The chunk is executed when this document is rendered. In the above example, the rendered output has two parts: the input chunk and the output chunk. The rendered code is also given syntax highlighting based on the language. For example:
This code chunk
```{r,eval=FALSE}
ggplot(dfr4,aes(x=Month,y=fraction,colour=Year,group=Year))+
geom_point(size=2)+
geom_line()+
labs(x="Month",y="Fraction of support issues")+
scale_colour_manual(values=c("#000000","#E69F00","#56B4E9",
"#009E73","#F0E442","#006699","#D55E00","#CC79A7"))+
theme_bw(base_size=12,base_family="Gidole")+
theme(panel.border=element_blank(),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank(),
axis.ticks=element_blank())
```
when rendered (`echo=TRUE` by default, but not evaluated) looks like
ggplot(dfr4,aes(x=Month,y=fraction,colour=Year,group=Year))+
geom_point(size=2)+
geom_line()+
labs(x="Month",y="Fraction of support issues")+
scale_colour_manual(values=c("#000000","#E69F00","#56B4E9",
"#009E73","#F0E442","#006699","#D55E00","#CC79A7"))+
theme_bw(base_size=12,base_family="Gidole")+
theme(panel.border=element_blank(),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank(),
axis.ticks=element_blank())
The chunk has several options which can be used to control chunk properties.
Using `{r,eval=FALSE}` prevents that chunk from being executed. `{r,eval=TRUE}`, which is the default, executes the chunk. Using `{r,echo=FALSE}` prevents the code from that chunk from being displayed. Using `{r,results="hide"}` hides the output from that chunk. There are many other chunk arguments. Here are some of them:
Option | Default | Description |
---|---|---|
eval | TRUE | Evaluate the code in this chunk |
echo | TRUE | Display the code |
results | "markup" | "markup", "asis", "hold" or "hide" |
warning | TRUE | Display warnings from code execution |
error | FALSE | Display errors from code execution |
message | TRUE | Display messages from this chunk |
tidy | FALSE | Reformat the code to be tidy |
cache | FALSE | Cache results for future renders |
comment | "##" | Characters used to prefix result output |
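The defaults active in a given session can also be looked up from R. The following is a small sketch that assumes the `knitr` package may or may not be installed; it queries `knitr::opts_chunk` for a few of the options in the table and skips the lookup otherwise. The values reported can differ from the table if defaults were changed earlier in the session or by the output format.

```r
# Chunk options to look up (a selection from the table above)
option_names <- c("eval", "echo", "results", "warning", "message", "cache", "comment")

if (requireNamespace("knitr", quietly = TRUE)) {
  # Returns a named list with the current default value of each option
  chunk_defaults <- knitr::opts_chunk$get(option_names)
  str(chunk_defaults)
} else {
  message("knitr is not installed; skipping the lookup.")
  chunk_defaults <- list()
}
```

Defaults for a whole document are typically changed with `knitr::opts_chunk$set()` in a setup chunk near the top, for example `knitr::opts_chunk$set(echo = FALSE)`.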
Chunk options are specified like this:
```{r,eval=FALSE,echo=FALSE,fig.height=6,fig.width=7}
```
R plots can be created as shown below:
```{r,fig.height=6,fig.width=6}
plot(x=iris$Petal.Length,y=iris$Petal.Width)
```
Below are some of the chunk options relating to plots.
Option | Default | Description |
---|---|---|
fig.height | 7 | Figure height in inches |
fig.width | 7 | Figure width in inches |
fig.cap | "" | Figure caption |
fig.align | "default" | Figure alignment: "default", "left", "center" or "right" |
dev | "png" | Graphics device, e.g. png, jpeg, pdf, svg |
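Outside of a chunk, the same idea of a figure size in inches and a graphics device can be tried directly in base R. This is a minimal sketch, not what knitr does internally: it uses the `pdf()` device because it is available in every R installation, and the output file name is a hypothetical temporary file.

```r
# Hypothetical output file in the session's temporary folder
plot_file <- file.path(tempdir(), "petal_scatter.pdf")

# Open a PDF device that is 6 x 6 inches, comparable to fig.width=6, fig.height=6
pdf(plot_file, width = 6, height = 6)
plot(x = iris$Petal.Length, y = iris$Petal.Width,
     xlab = "Petal length", ylab = "Petal width")
# Close the device so that the file is written to disk
invisible(dev.off())

print(file.exists(plot_file))
```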
The RMarkdown notebook can be exported into various formats. The most common formats are HTML and PDF.
The RMarkdown document can be previewed as an HTML inside RStudio by clicking the ‘Knit’ button.
The document can also be exported as an HTML file by running the code below:
rmarkdown::render("document.Rmd")
HTML documents can be opened and viewed in any standard browser such as Chrome, Safari, Firefox etc.
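To try the rendering step end to end without touching an existing project, a throwaway document can be written to a temporary folder and rendered from there. This is a sketch under assumptions: the file content is a made-up minimal example, and rendering is only attempted if the `rmarkdown` package and pandoc are both available.

```r
# Three backticks, built programmatically so they can be written into the file
fence <- strrep("`", 3)

# Write a minimal, made-up RMarkdown document to a temporary location
rmd_file <- file.path(tempdir(), "document.Rmd")
writeLines(c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some text followed by a chunk.",
  "",
  paste0(fence, "{r}"),
  "str(iris)",
  fence
), rmd_file)

# Render only if the required tools are present
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)  # path of the generated HTML file
} else {
  message("rmarkdown or pandoc not available; the document was written but not rendered.")
}
```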
An Rmd document can be converted to a PDF. Behind the scenes, the markdown is converted to TeX format. The conversion to PDF needs a tool that understands the TeX format and converts it to PDF. This can be software like ‘MacTeX’, ‘MikTeX’ etc., which needs to be installed on the system beforehand.
The output argument in the YAML matter must be changed to `pdf_document`, and the Rmd file can be converted as follows:
rmarkdown::render("document.Rmd")
The PDF output can also be specified in the call itself:
rmarkdown::render("document.Rmd",output_format=rmarkdown::pdf_document())
Sometimes TeX converters need additional libraries, which may have to be installed. Also, not all features of HTML are supported in TeX, which may result in errors.
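Before attempting a PDF render, it can help to check whether a TeX engine is visible to R at all. The sketch below uses only base R; the list of engine names is an assumption about what a typical TeX installation provides and is not exhaustive.

```r
# Common TeX engine executables (an illustrative, non-exhaustive list)
tex_engines <- c("pdflatex", "xelatex", "lualatex")

# Sys.which() returns an empty string for programs not found on the PATH
tex_found <- nzchar(Sys.which(tex_engines))
names(tex_found) <- tex_engines
print(tex_found)

if (!any(tex_found)) {
  message("No TeX engine found on the PATH; PDF rendering is likely to fail.")
}
```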
See here for other export formats.
This is a do-it-yourself challenge to RMarkdown Notebook/Report. Have a look at the HTML page below and try to recreate the page. Instructions and helpful tips are given below. Use the R Markdown: The Definitive Guide to find the solutions.
This is the challenge report to prepare.
date: "`r format(Sys.Date(),format='%d/%m/%Y')`"
output:
  rmarkdown::html_document:
    toc: true
    toc_float: true
`?html_document`.
output:
  rmarkdown::html_document:
    number_sections: true
output:
  rmarkdown::html_document:
    theme: united
output:
  rmarkdown::html_document:
    highlight: kate
`code_folding` option.
output:
  rmarkdown::html_document:
    code_folding: hide
`df_print` option.
output:
  html_document:
    df_print: paged
`html_document` to `pdf_document` and remove all arguments except `toc: true`. Try rendering using ‘Knit’.

ioslides is an HTML and javascript based presentation system. RMarkdown provides a way to use this framework through R purely using R code. Here is an example of an ioslides presentation.
This is a do-it-yourself challenge to ioslides Presentation using RMarkdown. Have a look at the presentation below and try to recreate it. Instructions and helpful tips are given below.
This is the final result for challenge ioslide presentation.
`output: ioslides_presentation`. See here for a guide to using ioslides presentation in R. See `?ioslides_presentation` for the options.
`#` with level 1 heading or `----` without a heading. A subtitle can be specified using a pipe symbol, like this `# Title | Subtitle`. Slides starting with `#` contain only the title and/or subtitle. `##` adds a standard content slide with a heading on the top.
# Title Slide | Subtitle
## Level 2 Heading
- Point one
- Point two
The bullets can be set to show incrementally on click:
## Level 2 Heading
> - Point one
> - Point two
```{r}
data(iris)
str(cars)
```
```{r}
head(cars)
```
```{r,fig.height=3,fig.width=5,fig.cap="This is a scatterplot."}
plot(cars$speed,cars$dist)
```
![](rr_lab_assets/gotland.jpg){width=60%}
`O` to see an overview of the slides.

revealJS is an HTML and javascript based presentation system. The `revealjs` R package provides a way to use this framework through R purely using R code. Here is an example of a revealJS presentation.
This is a do-it-yourself challenge to RevealJS 2-D Presentation using RMarkdown. Have a look at the presentation below and try to recreate it. Instructions and helpful tips are given below.
This is the challenge RevealJS presentation.
`revealjs` and load the library.
`output: revealjs_presentation`. See here for a guide to using RevealJS presentation in R. See `?revealjs_presentation` for options.
`#` with level 1 heading, `##` with level 2 heading or `----` without a heading. The best feature of RevealJS is that slides are not restricted to horizontal (linear) flow. Slides can also flow vertically from any of the horizontal slides. A level 2 heading `##` signifies that the content under it flows vertically.
## Level 2 Heading
```{r}
data(iris)
str(cars)
```
## Level 2 Heading
```{r}
head(cars)
```
## Level 2 Heading
```{r,fig.height=3,fig.width=5,fig.cap="This is a scatterplot."}
plot(cars$speed,cars$dist)
```
![](rr_lab_assets/gotland.jpg)
`O` to see an overview of the slides.

xaringan is an R package that provides bindings to the remarkjs HTML and javascript based presentation system. Here is an example of a remarkjs presentation. Here is an example that actually uses the xaringan package in R.
The remarkjs repo and wiki pages and the xaringan repo and wiki are good sources of documentation.
This is a do-it-yourself challenge to remarkjs presentation system using RMarkdown in xaringan. Have a look at the presentation below and try to recreate it. Instructions and helpful tips are given below.
This is the challenge Xaringan presentation.
`xaringan` and load the library.
`output: xaringan::moon_reader`. Use `?moon_reader` for options.
`---`. Headings as usual are `#` for level 1 heading, `##` for level 2 heading etc.
# Bullet points
- Point One
- Point Two
Note that Xaringan/remarkjs does not use the regular pandoc markdown, therefore some features may not work or may be different. For example, superscript is created using `2<sup>10</sup>` rather than `2^10^`. Subscript is created using `2<sub>10</sub>` rather than `2~10~`.
Add some code content. For example:
```{r}
data(iris)
str(cars)
```
`name: name`. Add a heading and add an R plot on this slide.
---
name: plots
# Plots
```{r,fig.height=3,fig.width=5,fig.cap="This is a scatterplot."}
plot(cars$speed,cars$dist)
```
Note that the slide name is added immediately under the new slide creator `---`.
---
name: table
# Table
```{r}
head(cars)
```
From this slide, create a link to go back to plots.
Click [here](#plots) to go to plots.
---
name: image
# Image
![](rr_lab_assets/gotland.jpg)
If this is too big, you can manually set the width using raw HTML <img src="rr_lab_assets/gotland.jpg" width="250px">
```{r}
ggplot(iris,aes(x=Petal.Length,Sepal.Width,col=Species))+
geom_point(size=2)+
labs(x="Petal Length",y="Sepal Width")+
scale_colour_manual(values=c("#000000","#E69F00","#56B4E9",
"#009E73","#F0E442","#006699","#D55E00","#CC79A7"))+
theme_bw(base_size=12)+
theme(panel.border=element_blank(),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank(),
axis.ticks=element_blank())
```
Say we want to highlight `theme_bw(base_size=12)`. We can add `{{}}` around it. Add the code below to a new slide called highlighting.
```{r}
ggplot(iris,aes(x=Petal.Length,Sepal.Width,col=Species))+
geom_point(size=2)+
labs(x="Petal Length",y="Sepal Width")+
scale_colour_manual(values=c("#000000","#E69F00","#56B4E9",
"#009E73","#F0E442","#006699","#D55E00","#CC79A7"))+
{{theme_bw(base_size=12)}}+
theme(panel.border=element_blank(),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank(),
axis.ticks=element_blank())
```
$e^{i\pi} + 1 = 0$
$$\frac{E \times X^2 \prod I}{2+7} = 432$$
$$\sum_{i=1}^n X_i$$
$$\int_0^{2\pi} \sin x~dx$$
Add a `--` between each line to display them incrementally.
$e^{i\pi} + 1 = 0$
--
$$\frac{E \times X^2 \prod I}{2+7} = 432$$
--
$$\sum_{i=1}^n X_i$$
--
$$\int_0^{2\pi} \sin x~dx$$
When viewing the presentation, press `H` to see an overview of keyboard shortcuts during the presentation.
remarkjs has a nifty feature where you can press `C` to clone the presentation in a new browser window. Now both these copies are linked. You can change one of them to the presenter mode by pressing `P`. Now changing the slide in one window changes the other. This is convenient when a presenter view and an audience view are needed.
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS/LAPACK: /home/roy/miniconda3/envs/r-4.0/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] stringr_1.4.0 dplyr_1.0.6 ggplot2_3.3.3 fontawesome_0.2.1
## [5] captioner_2.2.3 bookdown_0.22 knitr_1.33
##
## loaded via a namespace (and not attached):
## [1] highr_0.9 bslib_0.2.5.1 compiler_4.0.2 pillar_1.6.1
## [5] jquerylib_0.1.4 tools_4.0.2 digest_0.6.27 jsonlite_1.7.2
## [9] evaluate_0.14 lifecycle_1.0.0 tibble_3.1.2 gtable_0.3.0
## [13] pkgconfig_2.0.3 rlang_0.4.11 DBI_1.1.1 yaml_2.2.1
## [17] xfun_0.23 withr_2.4.2 generics_0.1.0 sass_0.4.0
## [21] vctrs_0.3.8 tidyselect_1.1.1 grid_4.0.2 glue_1.4.2
## [25] R6_2.5.0 fansi_0.5.0 rmarkdown_2.8 purrr_0.3.4
## [29] magrittr_2.0.1 scales_1.1.1 htmltools_0.5.1.1 ellipsis_0.3.2
## [33] assertthat_0.2.1 colorspace_2.0-1 utf8_1.2.1 stringi_1.6.2
## [37] munsell_0.5.0 crayon_1.4.1
Built on: 13-Jun-2021 at 23:19:58.
2021 • SciLifeLab • NBIS • RaukR