Very quick introduction to R
In several exercises in this course, we use the statistical programming environment R. Here follows a very quick introduction, primarily for those who have never used R before. You can go through this now, or return to it when you first encounter R in the exercises.
First start R on your computer. How to do this depends on your
operating system. If you are on a Linux or Mac OS X system, you would
typically execute the command R
at the shell prompt.
In this course, we are running R on a Linux server. If you would like to install R on your own computer, you can find the appropriate download for your system here.
When R starts, you should see a text message stating the version of
R you are running and some further information, followed by a
command-line “prompt” ( >
, a greater-than sign). The prompt means that
R is waiting for you to type a command. To get a feel for how this
works, try out some arithmetic::
> 2 + 3
[1] 5
> 5 * 4 + 10
[1] 30
And some function calls
> abs(-5)
[1] 5
> sum(1,5,10)
[1] 16
To find out how to use a function, type its name preceded by a question mark::
> ?sin
This will bring up some help documentation for the function. You can use the arrow keys to scroll the help text up and down. Press q to get back to the R prompt.
Now try this:
> a <- 10
This command created an object called a. Objects are an important concept in R (as in many other programming languages), and we will be creating more of them in the RNA-seq exercises. We can inspect an object by just typing its name:
> a
[1] 10
We can also change the value of an object that we’ve created:
> a <- 2 * a
> a
[1] 20
The object a created above is a vector with a single element. To
create a vector with several elements, you can use the function c()
:
> b <- c(1, 2, 10)
> b
[1] 1 2 10
Or the colon operator::
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
A matrix can be created with the function cbind()
:
> b <- cbind(1:10, 101:110)
> b
[,1] [,2]
[1,] 1 101
[2,] 2 102
[3,] 3 103
[4,] 4 104
[5,] 5 105
[6,] 6 106
[7,] 7 107
[8,] 8 108
[9,] 9 109
[10,] 10 110
We can then use indices to access selected elements of the matrix:
> b[1,]
[1] 1 101
> b[, 2]
[1] 101 102 103 104 105 106 107 108 109 110
> b[c(5,8), 2]
[1] 105 108
To end your R session, use the functions quit()
or q()
:
> q()
Save workspace image? [y/n/c]: n
If you wish, you can save a workspace image. This means that the state of your R session is written to a file, which will be loaded the next time you start R in the same working directory. This is good if you want to continue the session at a later time. If not, just answer n (for no).
You can find manuals for R and more information on the R web site.
Working with R packages
R contains many functions, in particular for statistics and plotting. There are numerous additional functions and data sets, contained in R packages that are not installed by default.
Many such extension packages can be obtained from the Comprehensive R Archive Network (CRAN). We will also download packages from BioConductor, an alternative repository focused on R packages for bioinformatics.
Installing packages from CRAN
To install a CRAN package in R, use the install.packages() function. This simple command downloads the package from a specified repository (by default, CRAN) and installs it on your machine. For example, we can install a useful package called gplots:
> install.packages("gplots")
After installation, the package is stored in a location on your computer called the R package library. Packages are loaded from this library using the function library()
:
> library("gplots")
One useful function in the gplots package is heatmap.2()
, which plots heatmaps with dendrograms. Since we have installed and loaded the gplots package, we can now use this function. Let’s try it with a matrix of random, normally distributed values:
> heatmap.2(matrix(rnorm(180),ncol=6))
Note that you only need to install a package once. The next time you start R, you can run library("gplots")
directly, because the package has already been installed. If you upgrade to a new version of R, you may have to reinstall packages.
Installing packages from BioConductor
To install packages from BioConductor, you first have to load a function biocLite()
from BioConductor and then use that function to install the packages.
That is done in two lines of code:
> source("https://bioconductor.org/biocLite.R")
> biocLite("Name of package")
For example, let’s install the package DESeq2, which provides functions for RNA-seq differential expression analysis:
> source("https://bioconductor.org/biocLite.R")
> biocLite("DESeq2")
You should now be able to load the package DESeq2:
> library("DESeq2")
Installing packages from GitHub
Sometimes you want to install packages that are still under development. Those are mostly found on GitHub.
The example below shows how you can switch between a stable version and a development version of a package, in this case the package ggplot2. This example is provided for reference and you don’t have to understand it (or try it) for completing the RNA-seq labs.
First, we install and load the regular (stable) version of the ggplot2 package from CRAN:
> install.packages("ggplot2")
> library(ggplot2)
To see which packages are loaded, and their versions, use the function sessionInfo()
:
> sessionInfo()
Note the version number of ggplot2 reported by sessionInfo()
.
There is no entirely safe way to unload packages in R, so to work with a different package version, it is safest to quit and restart R. To quit:
> q()
Save workspace image? [y/n/c]: n
After starting R again, we install and load the devtools package, which provides functions for working with development versions of packages:
> install.packages("devtools")
> library(devtools)
Next we install the development version of ggplot2 from GitHub:
> dev_mode(on=TRUE)
> install_github("hadley/ggplot2")
Now we can load and work with the the development version of ggplot2:
> library(ggplot2)
> sessionInfo()
Check the output from sessionInfo()
to verify that you now get a different version number for ggplot2!
When finished working with development versions, do:
> dev_mode(on=FALSE)
Note that if you only want to install from GitHub, you can ignore the dev_mode function calls above.