  • Write programs that do one thing and do it well.

  • Write programs to work together.

  • Write programs to handle text streams, because that is a universal interface.

–The UNIX philosophy, Doug McIlroy

In this exercise we will execute R scripts from the command line and provide them with options and data.


1 Executing R Scripts

While many R users write and execute code interactively (e.g. in RStudio), you probably know that you can run the contents of a script simply by writing source("myscriptfile.R") in the R console. This is also a convenient way to load your custom functions, much as library() makes the functions of an installed package available.

But once you have code that works, you may want to run it routinely without an interactive R environment. R scripts can be executed directly from the command line by typing Rscript myscriptfile.R.

  • Make an R script that prints a summary of a sample (n=1000) from a normal distribution. Save it as a .R file and execute it.

You can also execute your script simply by typing its name at the command line, provided it:

  1. Starts with a hashbang line that tells your system which interpreter to use, e.g. #!/usr/bin/env Rscript
  2. Is an executable file, e.g. made executable with chmod +x myscriptfile.R.

  • Modify your script and run it without Rscript.
  • Does it work without the path (e.g. ./)?
  • Why?
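
As a starting point, a minimal sketch of such a script is shown here, assuming it is saved as myscriptfile.R. The helper name sample_summary and the fixed seed are illustrative choices, not something the exercise requires.

```r
#!/usr/bin/env Rscript
# The hashbang line above asks the system to look up Rscript on the PATH
# and use it as the interpreter for this file.

# sample_summary() is a hypothetical helper name chosen for this sketch.
sample_summary <- function(n = 1000, mean = 0, sd = 1) {
  summary(rnorm(n, mean = mean, sd = sd))
}

set.seed(1)  # assumption: a fixed seed, only so that repeated runs are comparable
print(sample_summary())
```

The same file can be run with Rscript myscriptfile.R or, after chmod +x myscriptfile.R, with ./myscriptfile.R.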

2 Passing and parsing arguments

It’s unlikely that you would need to run the exact same process over and over again without any change in what data is processed or how it’s processed. One way to control the behaviour of your code is to provide arguments to it. These commonly refer to file names or settings. You supply arguments after the name of your script when you invoke it. In R, they are returned by commandArgs().

  • Let your script print the arguments. Run it with a few extra words or numbers and see what happens.

You can use commandArgs(trailingOnly = TRUE) to drop the leading items (the path to the R executable and the options passed to R itself) and keep only the arguments you supplied after the script name.

  • Make your script use the first argument provided as the mean of the normal distribution.
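
One way to approach this is sketched here. The helper mean_from_args and its default of 0 are assumptions made for illustration; only commandArgs(trailingOnly = TRUE) comes from the text above.

```r
#!/usr/bin/env Rscript
# mean_from_args() is a hypothetical helper: it interprets the first
# trailing argument as the mean and falls back to a default when none is given.
mean_from_args <- function(args, default = 0) {
  if (length(args) == 0) return(default)
  value <- suppressWarnings(as.numeric(args[1]))
  if (is.na(value)) stop("the first argument must be a number, got: ", args[1])
  value
}

args <- commandArgs(trailingOnly = TRUE)  # only what follows the script name
mu <- mean_from_args(args)
print(summary(rnorm(1000, mean = mu)))
```

Invoked as ./myscriptfile.R 100, the script would sample around 100. Any further arguments are ignored in this sketch.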

Processing multiple arguments may become complicated, especially if you want to be able to use C-like long and short flags such as -o outputfile -i inputfile --distribution normal. Packages that support such options include getopt, optparse and argparser.

  • Use the optparse package to modify your script to accept the argument -m or --mean (followed by the value) for mean value.
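
A minimal sketch with optparse could look like the following, assuming the package is installed. The default of 0 and the help text are illustrative choices.

```r
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(optparse))

# One option with a short and a long flag; the value is stored as opts$mean.
option_list <- list(
  make_option(c("-m", "--mean"), type = "double", default = 0,
              help = "Mean of the normal distribution [default %default]")
)
parser <- OptionParser(option_list = option_list)

# parse_args() reads commandArgs(trailingOnly = TRUE) unless `args` is given.
opts <- parse_args(parser)
print(summary(rnorm(1000, mean = opts$mean)))
```

Running the script with --help should print a usage message generated from the option list.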

3 Standard in and out

A convenient feature of command line scripts is that data can be piped from one script to another, which avoids the need for intermediate files. In R, use file('stdin') to define the connection, open() to open it, and readLines() to read one or more lines from it.

  • Make your script parse the mean value from a text stream, and run it using the pipe e.g. echo 100 | ./myscriptfile.R.
  • Using the same script, supply the mean from a text file (containing only that value) with cat.
  • Why is it not possible to read the input stream using stdin()?
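
The connection handling described above can be sketched as follows. The helper read_mean and its fallback to 0 on an empty stream are assumptions made for this example.

```r
#!/usr/bin/env Rscript
# read_mean() is a hypothetical helper: it reads one line from a connection
# and converts it to a number, returning `default` when the stream is empty.
read_mean <- function(con, default = 0) {
  if (!isOpen(con)) {
    open(con, "r")
    on.exit(close(con))  # close only connections that were opened here
  }
  line <- readLines(con, n = 1)
  if (length(line) == 0) return(default)
  as.numeric(line)
}

mu <- read_mean(file("stdin"))  # the standard input of the process, e.g. a pipe
print(summary(rnorm(1000, mean = mu)))
```

Because the helper accepts any connection, it can be tried interactively with a textConnection() before wiring it to a pipe.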

You can pipe your output to another process (any script or tool that accepts a stream) by appending | next_tool_or_script_call to the call, or to a file by appending > filename.

  • What happens if a warning is generated by your script, e.g. with warning('Something is wrong') and you pipe the output to a file?
  • Why?

You can use write(x, file = stderr()) or write(x, file = stdout()) to explicitly send specific output to standard error or standard output, respectively.
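
A small sketch of keeping results and diagnostics on separate streams is shown here. The helper emit and the message text are hypothetical.

```r
#!/usr/bin/env Rscript
# emit() is a hypothetical helper: results go to stdout, diagnostics to stderr.
emit <- function(result, note) {
  write(result, file = stdout())  # this is what a following | or > receives
  write(note, file = stderr())    # kept apart from the result stream
}

x <- rnorm(1000)
emit(format(mean(x)), "note: sampled 1000 values")
```

In a POSIX-style shell, running ./myscriptfile.R > result.txt should then leave the note on the terminal while result.txt receives only the result.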

4 Bonus challenges

If you have time, practice by writing an R script that you need in your own work, or select either of the following:

  • Write a script that parses any text stream by line from either a file (when specified) or stdin, and writes to either another file (when specified) or stdout. Each line is written with a certain probability which is also provided as an argument (lines that start with # are always written). Report to stderr the number of lines read and written. Try your script on any fastq, bam/sam, or vcf-formatted data.

  • Write a script that summarizes the content of a table contained in a plain text file. The table is supplied either as a file (when specified) or as a stream. Make sure any lines starting with # are ignored. For speed, read at most 1000 lines by default, or another number if supplied as an argument. Try your script on any tabular data you have available.
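
As a possible starting point for the table challenge, the reading step might be sketched as below. The helper read_table_head is hypothetical, and the sketch assumes whitespace-separated columns with a header row. Note that nrows limits the number of data rows rather than raw lines, so it only approximates the requirement.

```r
#!/usr/bin/env Rscript
# read_table_head() is a hypothetical helper for this sketch.
# Assumptions: whitespace-separated columns and a header row.
read_table_head <- function(input, max_lines = 1000) {
  read.table(input, header = TRUE, comment.char = "#",
             nrows = max_lines, stringsAsFactors = FALSE)
}

# Small in-memory demonstration; a file name or file("stdin") could be
# passed as `input` instead.
demo <- textConnection(c("# a comment line", "id value", "a 1.5", "b 2.5", "c 4.0"))
tab <- read_table_head(demo, max_lines = 2)
close(demo)
print(summary(tab))
```

Choosing between a file name and standard input, and reading the line limit from the arguments, is left as part of the challenge.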

5 Session Info

  • This document has been created in RStudio using R Markdown and related packages.
  • For R Markdown, see http://rmarkdown.rstudio.com
  • For details about the OS, packages and versions, see detailed information below:
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin16.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bsplus_0.1.1    optparse_1.4.4  forcats_0.3.0   stringr_1.3.1  
##  [5] dplyr_0.7.5     purrr_0.2.5     readr_1.1.1     tidyr_0.8.1    
##  [9] tibble_1.4.2    ggplot2_2.2.1   tidyverse_1.2.1 captioner_2.2.3
## [13] bookdown_0.7    knitr_1.20     
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.4 xfun_0.1         reshape2_1.4.3   haven_1.1.1     
##  [5] lattice_0.20-35  tcltk_3.5.0      colorspace_1.3-2 getopt_1.20.2   
##  [9] htmltools_0.3.6  yaml_2.1.19      rlang_0.2.1      pillar_1.2.3    
## [13] foreign_0.8-70   glue_1.2.0       modelr_0.1.2     readxl_1.1.0    
## [17] bindrcpp_0.2.2   bindr_0.1.1      plyr_1.8.4       munsell_0.4.3   
## [21] gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      psych_1.8.4     
## [25] evaluate_0.10.1  parallel_3.5.0   broom_0.4.4      Rcpp_0.12.17    
## [29] backports_1.1.2  scales_0.5.0     jsonlite_1.5     mnormt_1.5-5    
## [33] hms_0.4.2        digest_0.6.15    stringi_1.2.2    xaringan_0.6    
## [37] grid_3.5.0       rprojroot_1.3-2  cli_1.0.0        tools_3.5.0     
## [41] magrittr_1.5     lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.1 
## [45] xml2_1.2.0       lubridate_1.7.4  rstudioapi_0.7   assertthat_0.2.0
## [49] rmarkdown_1.9    httr_1.3.1       R6_2.2.2         nlme_3.1-137    
## [53] compiler_3.5.0

Page built on: 11-Jun-2018 at 22:37:24.


2018 | SciLifeLab > NBIS > RaukR