Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.
–The UNIX philosophy, Doug McIlroy
In this exercise we will execute R scripts from the command line and provide them with options and data.
While many R users write and execute code interactively (e.g. in RStudio), you probably know that you can run the contents of a script by typing source("myscriptfile.R") in the R console. This is also a convenient way to load your own functions, conceptually similar to what library() does when it loads an installed package.
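source() simply evaluates the expressions in a file, one after the other. The minimal sketch below shows this. It writes a throwaway file that defines a hypothetical function double_it() and then sources that file. The local argument places the resulting objects in a fresh environment instead of the global workspace.

```r
# Write a throwaway script defining a hypothetical function double_it()
fun_file <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", fun_file)

# source() parses the file and evaluates each expression; with local = env the
# new objects end up in env rather than in the global environment
env <- new.env()
source(fun_file, local = env)

print(ls(env))            # the objects the file created
print(env$double_it(21))
```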
But once you have code that works, you may want to run it routinely without an interactive R environment. R scripts can be executed directly from the command line by typing Rscript myscriptfile.R.
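Write a short .R file and execute it this way.
You can also execute your script simply by typing its name at the command line, provided it starts with the line #!/usr/bin/env Rscript, which tells the shell to run the file with Rscript, and has been made executable, e.g. with chmod +x myscriptfile.R. Unless the script is in a directory listed in your PATH, you also need to give its location, e.g. ./myscriptfile.R for a script in the current directory.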
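A single file can serve both purposes discussed so far: it can be source()d for its functions, and it can be run as a script. A common idiom, sketched below with a hypothetical simulate_summary() function, checks sys.nframe(). That value is 0 when code runs at top level, which is the case under Rscript or when the code is typed at the console prompt. Inside source() the value is larger, so sourcing the file only defines the function.

```r
#!/usr/bin/env Rscript
# Hypothetical reusable function: summary of n draws from a normal distribution
simulate_summary <- function(mean = 0, n = 1000) {
  summary(rnorm(n, mean = mean))
}

# sys.nframe() is 0 only at top level (e.g. under Rscript); inside source()
# it is greater than 0, so sourcing this file does not run the block below
if (sys.nframe() == 0L) {
  print(simulate_summary())
}
```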
It’s unlikely that you would need to run exactly the same process over and over again without any change in what data is processed or how it’s processed. One way to control the behaviour of your code is to provide arguments to it. These commonly refer to file names or settings. You supply arguments after the name of your script when you invoke it, for example Rscript myscriptfile.R 5. In R, they are available from commandArgs(), which returns them as a character vector.
You can use commandArgs(trailingOnly = TRUE) to drop the first few items (the R executable and the options R itself was started with) and access only your actual arguments.
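#!/usr/bin/env Rscript
# The first argument after the script name arrives as a character string
firstarg <- as.numeric(commandArgs(trailingOnly = TRUE)[1])
# Draw 1000 values from a normal distribution with that mean and summarise them
mydata <- rnorm(1000, mean = firstarg)
print(summary(mydata))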
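commandArgs(trailingOnly = TRUE) keeps only what follows the "--args" marker that Rscript places before your own arguments. The sketch below mimics that rule on an illustrative argument vector. Its leading entries are an assumption and vary with platform and R version. The sketch also adds a hypothetical get_mean_arg() helper, because the script above would pass NA as the mean whenever the argument is missing or is not a number.

```r
# Illustrative vector of the kind commandArgs() may return for
#   Rscript myscriptfile.R 5
# (the leading entries are an assumption; they differ between systems)
example_args <- c("/usr/lib/R/bin/exec/R", "--no-echo", "--no-restore",
                  "--file=myscriptfile.R", "--args", "5")

# The rule applied by commandArgs(trailingOnly = TRUE): keep what follows "--args"
trailing_only <- function(args) {
  m <- match("--args", args, nomatch = 0L)
  if (m > 0L) args[-seq_len(m)] else character(0)
}

# Hypothetical helper: default to 0 when no argument is given, and stop with a
# clear message when the first argument is not a number
get_mean_arg <- function(args = commandArgs(trailingOnly = TRUE), default = 0) {
  if (length(args) == 0L) return(default)
  value <- suppressWarnings(as.numeric(args[1]))
  if (is.na(value)) stop("first argument must be a number, got '", args[1], "'")
  value
}

print(trailing_only(example_args))                # "5"
print(get_mean_arg(trailing_only(example_args)))  # 5
```

In a script, mydata <- rnorm(1000, mean = get_mean_arg()) would then read the real arguments.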
Processing multiple arguments may become complicated, especially if you want to be able to use C-like long and short flags such as -o outputfile -i inputfile --distribution normal. Packages that support such options include getopt, optparse and argparser.
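A hand-rolled sketch in base R shows why such packages help. The hypothetical parse_flag_pairs() below turns the flags above into a named list, but it only works because it assumes that every flag is followed by exactly one value. It provides no help text, no type conversion and no defaults. It also cannot recognise that -o and --output name the same option.

```r
# Hypothetical helper: turn c("-o", "outputfile", "--distribution", "normal")
# into list(o = "outputfile", distribution = "normal"). It assumes every flag is
# followed by exactly one value: no "--key=value", no bundled or boolean flags.
parse_flag_pairs <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) %% 2L != 0L) stop("every flag needs exactly one value")
  flags  <- args[c(TRUE, FALSE)]   # positions 1, 3, 5, ...
  values <- args[c(FALSE, TRUE)]   # positions 2, 4, 6, ...
  if (!all(grepl("^--?[A-Za-z]", flags))) stop("expected flags such as -o or --output")
  setNames(as.list(values), sub("^--?", "", flags))
}

parsed <- parse_flag_pairs(c("-o", "outputfile", "-i", "inputfile",
                             "--distribution", "normal"))
str(parsed)
```

This function would also accept -o twice, or a misspelled flag, without complaint. Handling that kind of bookkeeping is what the packages above are for.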
Use the optparse package to modify your script to accept the argument -m or --mean (followed by a value) for the mean.
#!/usr/bin/env Rscript
# library() stops with an error if optparse is missing; silence startup messages
suppressPackageStartupMessages(library(optparse))
option_list <- list(
  make_option(c("-m", "--mean"), type = "double", default = 0,
              help = "mean of the normal distribution [default %default]")
  # you could put the next option here, after a comma
)
# opts rather than options, which would mask base R's options() function
opts <- parse_args(OptionParser(option_list = option_list))
mydata <- rnorm(1000, mean = opts$mean)
print(summary(mydata))
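The same approach extends to the other flags mentioned earlier. In the sketch below, the options -i/--input, -o/--output and -d/--distribution, their defaults, and the hypothetical draw() helper are all illustrative assumptions. parse_args() receives an explicit args vector so that the parser can be tried without a shell. A real script would call parse_args(parser), which reads commandArgs(trailingOnly = TRUE).

```r
suppressPackageStartupMessages(library(optparse))

# Illustrative options mirroring the flags mentioned above
option_list <- list(
  make_option(c("-i", "--input"), type = "character", default = NULL,
              help = "input file [default: stdin]"),
  make_option(c("-o", "--output"), type = "character", default = NULL,
              help = "output file [default: stdout]"),
  make_option(c("-d", "--distribution"), type = "character", default = "normal",
              help = "normal or uniform [default %default]"),
  make_option(c("-m", "--mean"), type = "double", default = 0,
              help = "mean of the distribution [default %default]")
)
parser <- OptionParser(option_list = option_list)

# Explicit args instead of the real command line, for trying out the parser
opts <- parse_args(parser, args = c("-m", "3", "--distribution", "uniform"))

# Hypothetical helper: draw n values from the requested distribution
draw <- function(opts, n = 1000) {
  switch(opts$distribution,
         normal  = rnorm(n, mean = opts$mean),
         uniform = runif(n, min = opts$mean - 1, max = opts$mean + 1),
         stop("unknown distribution: ", opts$distribution))
}
print(summary(draw(opts)))
```

optparse adds a --help option by default, so running such a script with --help prints the usage text generated from the help strings.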
A convenient feature of command line scripts is the possibility to pipe data from one script to another, thereby avoiding the need for intermediate files. You can use file('stdin') and open() to define and open the connection in R, and readLines() to read one or more lines from it.
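Modify your script to read the mean from standard input instead, so that it can be called as echo 100 | ./myscriptfile.R (piping the contents of a file with cat works the same way). R's stdin() refers to the R console rather than to the standard input of the process, which is why the script below uses file("stdin").
#!/usr/bin/env Rscript
# Open a connection to the process's standard input and read the first line
input_con <- file("stdin")
open(input_con)
oneline <- readLines(con = input_con, n = 1, warn = FALSE)
close(input_con)
# The line is read as text; convert it to a number to use as the mean
mu <- as.numeric(oneline)
mydata <- rnorm(1000, mean = mu)
print(summary(mydata))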
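Longer streams are better read in pieces. Scripts also often need to accept either a named file or standard input. The minimal sketch below uses a hypothetical count_lines() helper to handle both. The same readLines() loop works on either kind of connection and reads at most chunk_size lines at a time, so a large input never has to fit in memory all at once.

```r
# Hypothetical helper: read from a file when a path is given, otherwise from
# standard input, and count lines chunk by chunk
count_lines <- function(path = NULL, chunk_size = 1000L) {
  con <- if (is.null(path)) file("stdin") else file(path)
  open(con, "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break   # end of input reached
    total <- total + length(chunk)
  }
  total
}

# Demonstration on a temporary file; in a script you would call count_lines()
# without a path to read what is piped in, e.g. cat somefile | ./myscriptfile.R
tmp <- tempfile()
writeLines(c("# header", "a", "b", "c"), tmp)
print(count_lines(tmp, chunk_size = 2L))
```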
You can pipe your output to another process (any script or tool that accepts a stream) by appending | next_tool_or_script_call to the call, or to a file by appending > filename.
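How output is written matters once another tool consumes it. The sketch below uses capture.output() to show what actually reaches stdout. The vector x and the data frame df are made-up examples.

```r
# print() labels output with "[1]"-style indices that a downstream tool
# would have to strip; writeLines(), cat() and write.table() emit plain text
x <- c(1.5, 2, 3)
printed <- capture.output(print(x))
plain   <- capture.output(writeLines(as.character(x)))
print(printed)  # one line starting with "[1]"
print(plain)    # "1.5" "2" "3"

# A data frame as tab-separated text on stdout, ready for | sort or > file.tsv
df <- data.frame(id = c("a", "b"), value = c(10, 20))
write.table(df, file = stdout(), sep = "\t", quote = FALSE, row.names = FALSE)
```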
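What happens if your script calls warning('Something is wrong') and you redirect its output to a file? Warnings, like messages and errors, are written to standard error, while > only redirects standard output. The warning therefore still appears in the terminal and not in the file (use 2> filename to redirect standard error as well). You can use write(x, file = stderr()) or write(x, file = stdout()) to explicitly send particular output to either stream.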
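A script can only sit in the middle of a pipe if data stays on stdout and diagnostics go to stderr. The sketch below uses a hypothetical process_lines(). It writes data lines with writeLines() and a line count with message(), which goes to standard error just like write(x, file = stderr()). capture.output() is used here only to show which text ends up on which stream.

```r
# Hypothetical filter: drop "#" lines, send the data to stdout and a short
# report to stderr, so that `> file` captures only the data
process_lines <- function(lines) {
  kept <- lines[!startsWith(lines, "#")]
  writeLines(kept)                                 # stdout
  message(sprintf("read %d lines, wrote %d",       # stderr
                  length(lines), length(kept)))
  invisible(kept)
}

# Capture each stream separately to see where the text goes
msgs <- capture.output(
  data_out <- capture.output(process_lines(c("# header", "a", "b"))),
  type = "message"
)
print(data_out)  # "a" "b"
print(msgs)      # "read 3 lines, wrote 2"
```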
If you have time, practice by writing an R script that you need in your own work, or choose one of the following:
Write a script that parses any text stream line by line from either a file (when specified) or stdin, and writes to either another file (when specified) or stdout. Each line is written with a certain probability, which is also provided as an argument (lines that start with # are always written). Report the number of lines read and written to stderr. Try your script on any fastq, sam (or bam converted to text with samtools view) or vcf-formatted data.
Write a script that summarizes the content of a table contained in a plain text file. The table is supplied either as a file (when specified) or as a stream. Make sure any lines starting with # are ignored. For speed, read at most 1000 lines by default, or another number if supplied as an argument. Try your script on any tabular data you have available.
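If you want a starting point, the core of each exercise fits in a few lines. Both functions below are hypothetical helpers and not full solutions: argument handling (for example with optparse) and choosing between a file and stdin are left to you. read.table() with comment.char = "#" also discards anything that follows a # in the middle of a line, which may or may not be what you want.

```r
# Exercise 1 core: keep each line with probability p, always keep "#" lines.
# runif() never returns exactly 0 or 1, so p = 0 keeps only "#" lines and
# p = 1 keeps everything.
subsample_lines <- function(lines, p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  keep <- startsWith(lines, "#") | runif(length(lines)) < p
  lines[keep]
}

# Exercise 2 core: read at most max_rows data rows from a file or connection,
# skipping comments. Assumes whitespace-delimited data without a header row.
read_limited <- function(input, max_rows = 1000L) {
  read.table(input, comment.char = "#", nrows = max_rows)
}

demo <- c("# a comment", "1 10", "2 20", "3 30")
set.seed(1)
print(subsample_lines(demo, 0.5))
print(summary(read_limited(textConnection(demo), max_rows = 2)))
```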
Page built on: 11-Jun-2018 at 22:37:24.
2018 | SciLifeLab > NBIS > RaukR