R packages
Exercise
This is the R packages exercise from RaukR 2018. It will take you through creating an R package with code, data, documentation, creating or generating a correct DESCRIPTION and NAMESPACE, importing packages, checking your package for warnings and using C++ functions through Rcpp
.
The first thing we want to do is install the dependencies required for the exercise.
If you haven’t already, install Rstudio.
First of all we will create the standard files that are required in an R package. You can do this from within Rstudio or using devtools
.
Select File > New Project > New Directory > R package, give your package a name and click Create Project. This should trigger your Rstudio to reload, showing your package name in the top right corner, your packages file structure in the file viewer and a standard hello.R R code file in the editor.
devtools::create("path/to/your/package/packagename")
I recommend using the Rstudio interface when working on your R package. To open this package in Rstudio, double click the .Rproj file or select it from Rstudio.
Now that we have a R package backbone lets fill it with stuff!
Start by editing your hello.R file, you can rename it if you wish. Feel free to rewrite the function to perform some simple task. From adding two numbers to creating a basic plot from some input. It’s up to you!
Here is an example that I threw together:
When you are happy with your function, save it and load it using devtools::load_all()
or the keyboard shortcut Ctrl/Cmd + Shift + L
Now test your function in the console!
## a
## b b
## c c c
## d d d d
## e e e e e
## f f f f f f
## g g g g g g g
## h h h h h h h h
## i i i i i i i i i
## j j j j j j j j j j
## k k k k k k k k k k k
## l l l l l l l l l l l l
Congratulations! You have just created an R package that fulfils a function!
Lets add some data to our package, create an R object with whatever information you want. A string, a vector or a data.frame, as long as you recognise it, it doesn’t matter. First we will save it to our data/ folder using devtools::use_data()
.
# A random object
your_object <- c('red','green','blue')
#Save to data
devtools::use_data(your_object)
This created the data/ folder and your_object.rda inside it.
Notice that we are not giving it our packages location. This is because it will default to using the package in the working directory, if your current R session is not in the package root it will complain.
# Change working directory
setwd("..")
# Try to save data
devtools::use_data(your_object)
# Error: Could not find package root.
Now when our package is loaded, your_object
will be made available to us. To test this, remove all objects from your working directory, and load all functions. Check that you can access the object, despite just removing it from your workspace.
# Clear workspace
rm(list=ls())
# Load all functions and data from our package
devtools::load_all()
# Check if we can access our saved object
your_object
[1] "red" "green" "blue"
The data in data/ folder is available to the user, needs to be documented, and is where you would store data if the purpose of your package is to distribute one or more datasets in an R friendly way. If there is some data that you don’t want to make easily accessible to the users, AKA not document, but that your functions use, you can put it in sysdata.rda. Lets create a second object, like the last one it does not matter what it is. Use the same command as before to save the object, except this time specify that this data is intended for internal use.
# Create a second object
second_object <- "It works!"
# Save it to R/sysdata.rda
devtools::use_data(second_object, internal = TRUE)
To check that this worked, clear your workspace and edit your packages function, the .R file, to include the object in some way. Load the package and see if it executes as expected.
# Example .R code edit
trianguletter <- function(x) {
for(i in 1:x){
cat(rep( letters[i], times = i),"\n")
}
# Check if second object can be called by our function
cat(second_object)
}
## a
## b b
## c c c
## d d d d
## e e e e e
## It works!
Great work! You have created external and internal datasets and shown that they can be used when your package is loaded in the console and in your packages own functions!
Now that we have a function and a dataset, lets use roxygen2
to create some documentation for them.
As was discussed in the presentation, the roxygen2
documentation for a function is directly before it in so called “comment blocks”, or #'
. Here is a brief refresher:
#'
@tagname
@param
: parameter@example
: examples@return
: what does the function return. The value field in R documentation@section
: create any section you wantGo ahead and create documentation for your function. add at least a title, a description, multi-section details, the parameters, the expected return value or output and an example of how to use your function.
An example functional documentation:
#' A right sided triangle of alphabetic letters
#'
#' A right sided triangle of alphabetic letters
#'
#' This function takes a number as input and outputs an increasing
#' number of alphabetic letters on top of eachother, resembling
#' a right sided triangle.
#' @section Warning:
#' Not tested for numbers over 26!
#'
#' @param x A number.
#' @return Outputs to console. NULL object returned.
#' @examples
#' trianguletter(10)
trianguletter <- function(x) {
for(i in 1:x){
cat(rep( letters[i], times = i),"\n")
}
# Check if second object can be called by our function
cat(second_object)
}
When you are satisfied with your documentation, build it using devtools::document()
, or Ctrl/Cmd + Shift + D in Rstudio. This creates the .Rd file in man/ which is parsed by R when you request the functions documentation. Additionally, the first time you build documentation it will make some edits to your DESCRIPTION and check on your NAMESPACE.
Go ahead and preview your function as you would any method and make sure that it looks the way you were expecting.
Data documentation is a bit different from functional documentation. Remember that you do not need to document datasets not intended for users, so don’t worry about sysdata.rda. What we want to document in this case is the data you created in man/your_object.rda, or whichever name you have given it.
The principle is very similar to functional documentation, but not all tags that are applicable to functions are applicable to data, and should not be used. Similarly, there are some tags that are applicable to data, but not to functions. Usually this is the information you would give a dataset:
@format
: what rows and variables are in the data?@source
: where is the data from?First of all, lets check what the output of requesting help for your dataset currently is.
## No documentation for 'your_object' in specified packages and libraries:
## you could try '??your_object'
As expected, it is undocumented. Since we cannot add this information to the dataset file, .rda, lets create an R file in R/ called data.R and add the documentation there. The name we document has to be the same name as the dataset object you created earlier. Go ahead and document your dataset now.
#' A vector with three strings
#'
#' A dataset containing three strings usually linked to the
#' colors of pixels on a screen.
#'
#' @format A vector with three strings:
#' \describe{
#' \item{red}{A string, it's red.}
#' \item{green}{A string, it's green. Street talk for money.}
#' \item{blue}{A string, really didn't see that coming. It's blue this time!}
#' }
#' @source \url{http://www.themindofsebastian.com}
"your_object"
Once you are happy with your documentation, save the file and run devtools::document()
. Now check the help page for your dataset again.
Well done! You have successfully created documentation for a function and a dataset and shown that it can be queried within R.
While we are on the topic of documentation, lets check out how to get started writing your vignette. To initialise your vignette run devtools::use_vignette("packagename_vignette")
.
What this does:
knitr
to Suggests
and as a VignetteBuilder
.Open the file vignettes/packagename-vignette.Rmd. Edit the header data, Vignette Title
and Vignett Author
, and create a minimal vignette for your function. Do run your function and show your dataset using knitr
from the vignette. It can be as short as you want, however if you do this for a real package it should be a long form manual showing how your package can use its functions or data to perform the task it was designed for.
Note: To be able to use the functions of your package in the vignette you will need to install your package, the easiest way is to execute Cmd/Ctrl + Shift + B.
To preview your vignette while working on it, press the knit button in Rstudio.
Example vignette.Rmd:
---
title: "How to write more and more letters, in a triangle"
author: "Sebastian DiLorenzo"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{How to write more and more letters, in a triangle}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
\```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
# replace this with your package name
library(newpkg)
\```
# The greatest package for a very specific purpose
If you thought that it was impossible to write letters
alphabetically from top to bottom with each number represented
the same number of times asits position in the alphabet
- think again!
This package and its sole function, trianguletter, solves just this problem!
But you do not have to take my word for it, see for yourself:
\```{r,echo=T}
# With just a simple number we specify how long into the alphabet we go.
trianguletter(10)
# Also, this package has this random dataset:
your_object
\```
# Future plans
We plan do add features so that you can give it
a letter, rather than a number, and output a
bunch of numbers instead!
Note: Because I am writing this in rmarkdown I added backslashes before the code or it would not render properly for this exercise instruction. In your rmarkdown there should not be backslashes before code chunks. See the \``` in the vignette code.
And the beautiful rendered version:
The preview you get when knitting the vignette.Rmd does not mean that the vignette has been created. To do this you can use devtools::build_vignettes()
.
Reinstall the package again by executing Cmd/Ctrl + Shift + B and view the vignette with browseVignettes("packagename")
.
Once again, good work! Now you have written a short guide to your package that will be included wherever it goes!
Now lets take a look at our DESCRIPTION file, at this point it should look similar to this:
Package: newpkg
Type: Package
Title: What the Package Does (Title Case)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Suggests: knitr,
rmarkdown
VignetteBuilder: knitr
Note: The information can look slightly different depending on how the package was created, but the information is equivalent. For example you might see
Authors@R
instead ofAuthor
andMaintainer
.
Looks pretty good but some information definitely needs to be updated if you are ever going to submit this to a repository.
Update the Title, Author, Maintainer, Description and License fields.
Just in case you don’t want to consider which license to give your package, why not choose one of the most common ones that were mentioned during the lecture?
A good place to look at the meaning of licenses is https://tldrlegal.com.
Now that your DESCRIPTION is up to shape, we can move on to the NAMESPACE.
In the NAMESPACE we want to make our function available to the users and make sure that our package works well with other packages.
First, lets import a function from a package and add its functionality to the function we created.
melt
function from reshape2
. This is in the form @importFrom pkg function
.To update the NAMESPACE run devtools::document()
.
Note: If you didn’t use
devtools
to create your package you may get an error to the effect that roxygen was not used to create this NAMESPACE and it won’t be overwritten. In this case delete your NAMESPACE and rundevtools::document()
again.
Now your NAMESPACE should look like this:
# Generated by roxygen2: do not edit by hand
importFrom(reshape2,melt)
Great! If the melt
command from reshape2
is ever used in your package, it will know which one to use. But for our package to use reshape2
we first have to import it in DESCRIPTION.
Add the Imports
field somewhere in your DESCRIPTION.
Imports:
reshape2
While Imports
means that if someone installs your package it will automatically install reshape2
as well, regrettably it does not mean that if reshape2
is missing when we reinstall it locally it will be downloaded and installed. Luckily we installed it at the start of the exercise.
When we imported reshape2
and roxygen2
recreated the NAMESPACE file, we sent a message to the package saying “We know what we are doing, you can take away the NAMESPACE kiddie-wheels”. These kiddie-wheels are specifically the export pattern that was there previously, basically exporting everything between heaven and earth.
exportPattern("^[^\\.]")
Now our package isn’t exporting anything, and if you reinstall it with Ctrl/Cmd + Shift + B you will notice that it cannot find your function anymore. This is fine, and we are about to fix this, by exporting the function!
In your .R file, directly above the function, add the @export tag, then rebuild the documentation with devtools::document()
.
Have a look at your NAMESPACE again. You should find that your function is now being exported.
Finally, rebuild and install your package with Ctrl/Cmd + Shift + B.
You do not need to export your data, wether internal or not.
Now that we have added melt
from reshape2
lets add it to our function. You can do this any way you like, or copy usage from the example function.
In this example the head
of iris
dataset before and after melt
is viewed, to see that it had an effect.
#' A right sided triangle of alphabetic letters
#'
#' A right sided triangle of alphabetic letters
#'
#' This function takes a number as input and outputs an increasing
#' number of alphabetic letters on top of eachother, resembling
#' a right sided triangle.
#' @section Warning:
#' Not tested for numbers over 26!
#'
#' @param x A number.
#' @return Outputs to console. NULL object returned.
#' @examples
#' trianguletter(10)
#' @export
trianguletter <- function(x) {
for(i in 1:x){
cat(rep( letters[i], times = i),"\n")
}
# Check if second object can be called by our function
#cat(second_object)
#what iris dataset looks like
head(iris)
#Use melt and see what it looks like
head(melt(iris))
}
Rebuild the package, Ctrl/Cmd + Shift + B. Since this loads the package you are working on, reshape2
should also be loaded. You can check that this is true in sessionInfo()
.
Test your function, does the output show that it can use melt
?
Now, lets check that the NAMESPACE is doing what we want it to do, making sure that the correct melt
is being used by our package. Lets define a new function, also named melt
, which will be in our global environment.
Now try your function again. Is it using the correct melt
?
It should be!
Note: An alternative way of doing this is using Imports in DESCRIPTION and writing your code with strict package references. In our case
reshape2::melt()
. This way you can circumvent usingroxygen2
to add import tags toNAMESPACE
if you want.
Now that we have a pretty complete package, lets run some checks on it. You can use the standard R CMD check
pkgname from your terminal, or you can use devtools::check()
, which we recommend, as it performs some additional operations such as updating the documentation and bundling the package before checking.
R CMD check results
0 errors | 0 warnings | 1 note
checking R code for possible problems ... NOTE
trianguletter: no visible global function definition for ‘head’
trianguletter: no visible binding for global variable ‘iris’
Undefined global functions or variables:
head iris
Consider adding
importFrom("datasets", "iris")
importFrom("utils", "head")
to your NAMESPACE file.
Did you get any NOTEs, ERRORs or WARNINGs? I know I did! Among other things it didn’t like my usage of the iris
dataset without specifying it in the NAMESPACE. Go ahead and try to fix whichever messages you received. Google if you need to.
On the odd chance that you didn’t get any check messages, try introducing an error, like creating an undocumented dataset or using a function that doesn’t exist. It’s easier to break things than to build them :).
R is not the always the most efficient language, which is why it is great that we can integrate other code with our package. Either using ready made solutions, or by including a script file of another language in the package.
Here we will integrate the well developed Rcpp
package to be able to use C++ code in our package.
Similarly to how we created the package, the first thing we want to do is setup our package to accept Rcpp
. We can do this with devtools::use_rcpp()
which does four things:
Rcpp
to Imports
and LinkingTo
.devtools::use_rcpp()
Lets include the roxygen tags in our utility.R file.
#' @useDynLib newpkg
#' @importFrom Rcpp sourceCpp
NULL
Now we are ready to create a C++ file. You can do this from Rstudio to generate a nice template, File > New file > C++ File. It should look something like this:
#include <Rcpp.h>
using namespace Rcpp;
// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar). Learn
// more about Rcpp at:
//
// http://www.rcpp.org/
// http://adv-r.had.co.nz/Rcpp.html
// http://gallery.rcpp.org/
//
// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
return x * 2;
}
// You can include R code blocks in C++ files processed with sourceCpp
// (useful for testing and development). The R code will be automatically
// run after the compilation.
//
/*** R
timesTwo(42)
*/
You should not mess with the header, unless you know what you are doing. Here you can write any C++ function you want, either do this or leave it as it is and we will use the included example function, timeTwo
. The function is exported to R using // [[Rcpp::export]]
. Importantly, this does not add the function to your NAMESPACE. Add documentation to your function in the same way as we have done previously, but with the C++ commenting style of //
.
//' Multiply a number by two
//'
//' @param x A integer.
//' @export
// [[Rcpp::export]]
NumericVector timesTwo(NumericVector x) {
return x * 2;
}
Save the file to your src/ directory and call devtools::document()
to update your NAMESPACE. Now lets install and restart using Cmd/Ctrl + Shift + B. This will create the file RcppExports.R* in R/**, which is what R uses to call your C++ function.
Test that your function works from console.
Your R package can now run C++ code, marvelous! Now you just need to learn C++ ;).
Added a new component to the package have we? - Yoda
Perform devtools::check()
again and fix any new messages.
Well done! You have built a functional package. Maybe it is even time to update the version number in DESCRIPTION and take it out of development?
If you want to know even more about this topic, I recommend Hadley Wickhams excellent online resource http://r-pkgs.had.co.nz/check.html
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin16.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.5
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rmarkdown_1.9 roxygen2_6.0.1 devtools_1.13.5 forcats_0.3.0
## [5] stringr_1.3.1 dplyr_0.7.5 purrr_0.2.5 readr_1.1.1
## [9] tidyr_0.8.1 tibble_1.4.2 ggplot2_2.2.1 tidyverse_1.2.1
## [13] captioner_2.2.3 bookdown_0.7 knitr_1.20
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.4 xfun_0.1 reshape2_1.4.3 haven_1.1.1
## [5] lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6 yaml_2.1.19
## [9] rlang_0.2.1 pillar_1.2.3 withr_2.1.2 foreign_0.8-70
## [13] glue_1.2.0 modelr_0.1.2 readxl_1.1.0 bindrcpp_0.2.2
## [17] bindr_0.1.1 plyr_1.8.4 commonmark_1.5 munsell_0.4.3
## [21] gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2 memoise_1.1.0
## [25] psych_1.8.4 evaluate_0.10.1 parallel_3.5.0 broom_0.4.4
## [29] Rcpp_0.12.17 backports_1.1.2 scales_0.5.0 jsonlite_1.5
## [33] mnormt_1.5-5 hms_0.4.2 digest_0.6.15 stringi_1.2.2
## [37] grid_3.5.0 rprojroot_1.3-2 cli_1.0.0 tools_3.5.0
## [41] magrittr_1.5 lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.1
## [45] xml2_1.2.0 lubridate_1.7.4 rstudioapi_0.7 assertthat_0.2.0
## [49] httr_1.3.1 R6_2.2.2 nlme_3.1-137 compiler_3.5.0
Page built on: 12-Jun-2018 at 18:03:08.
2018 | SciLifeLab > NBIS > RaukR