Functions and scripts

RaukR 2024 • Advanced R for Bioinformatics

Sebastian DiLorenzo

21-Jun-2024

R Functions

  • Organised, human readable code
  • Any code that will be repeated
  • Add fewer objects to the workspace
  • Perform one well-defined task, preferably something smaller than “this whole analysis”
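
As a minimal sketch of the workspace point above: values created inside a function are local to that call and are gone once it returns. The function and variable names below are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: scale a numeric vector to the range [0, 1].
rescale01 <- function(x) {
  lowest_tmp <- min(x)             # local to this function call
  span_tmp <- max(x) - lowest_tmp  # local to this function call
  (x - lowest_tmp) / span_tmp      # value of the last expression is returned
}

scaled <- rescale01(c(2, 4, 6))
print(scaled)

# The intermediates never reached the workspace; only `scaled` did.
print(exists("lowest_tmp"))
print(exists("span_tmp"))
```

Only the result the caller chose to keep ends up as an object; the two helper values stay inside the function.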

R Functions

Without a function

a <- 5
a + a
[1] 10
b <- 3
b + b
[1] 6
  • User is performing the operation each time

With a function

doubleUp <- function(x){
  x + x
}

a <- 5
doubleUp(a)
[1] 10
b <- 3
doubleUp(b)
[1] 6
z <- doubleUp(3)
  • Function is performing the operation each time

R Functions

The pieces that make a function

function_name <- function(param1, param2 = 20, ...){
  param1 * 2       # Body: intermediate work (this value is not returned)
  param1 + param2  # Last expression is returned; alternatively, use return(param1 + param2)
}
  • function_name : Name of the function
  • function() : Parameters. User input
    • param1 : No default value. Required.
    • param2 = 20 : Default value
    • ... : The ellipsis collects any additional arguments, typically to pass them on to other functions
  • function(){} : The function body
  • Return value : the value of the last evaluated expression, or whatever is passed to return()
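
The pieces listed above can be seen working together in a short sketch. The function name, its parameters, and the test values are hypothetical; only mean() and its na.rm argument come from base R.

```r
# Hypothetical example: a required parameter, a default, and `...`.
scaled_mean <- function(x, scale = 20, ...) {
  m <- mean(x, ...)  # `...` forwards extra arguments, e.g. na.rm, to mean()
  m * scale          # last evaluated expression is the return value
}

values <- c(1, 2, NA, 3)

r_default <- scaled_mean(values)                # NA propagates through mean()
r_na_rm <- scaled_mean(values, na.rm = TRUE)    # na.rm travels through `...`
r_scale <- scaled_mean(values, scale = 1, na.rm = TRUE)  # default overridden

print(r_default)
print(r_na_rm)
print(r_scale)
```

Because na.rm is not a named parameter of scaled_mean(), it ends up in `...` and reaches mean() unchanged.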

Tip

How to add a function to your workspace

  • Copy and paste the definition into the console
  • Load it from a file with source(), or from a package with library()
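
A minimal sketch of the source() route, assuming the function lives in its own file. A temporary file stands in for a hypothetical my_functions.R so the example can run anywhere.

```r
# Write a small function definition to a temporary file (a stand-in for a
# hypothetical my_functions.R kept alongside your analysis).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "doubleUp <- function(x) {",
  "  x + x",
  "}"
), script_path)

# source() evaluates the file, so doubleUp() is available afterwards.
source(script_path)
result <- doubleUp(21)
print(result)

unlink(script_path)  # tidy up the temporary file
```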

R scripts as standalone tools

  • Data analysis with R is usually performed interactively using e.g. RStudio
  • Tasks can be executed from the terminal using R scripts
  • R scripts can form powerful standalone tools

Executing an R script

  • Interactively: source("myscript.R") in R console
  • Command line: Rscript myscript.R
  • As executable file: path/myscript.R if:
    • Script is executable: chmod +x myscript.R
    • First line in script is a hashbang e.g. #!/usr/bin/env Rscript
    • The script is called with its path, or its directory is on $PATH
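
For illustration, a script meeting the conditions above might look like the following sketch. The file name myscript.R and the message it prints are assumptions; after chmod +x myscript.R it could be started as ./myscript.R.

```r
#!/usr/bin/env Rscript
# Hypothetical contents of myscript.R. The hashbang above must be the very
# first line of the file; R itself treats it as an ordinary comment.

greeting <- sprintf("Running under R %s.%s", R.version$major, R.version$minor)
cat(greeting, "\n", sep = "")
```

Going through env means the script uses whichever Rscript comes first on the user's $PATH, rather than a hard-coded install location.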

Providing arguments to an R script

  • Passing arguments to the script allows for flexibility in settings and input data

./myscript.R inputfile.vcf outputfile.vcf

  • Packages are available that support long and short flags

./myscript.R -i inputfile.vcf -o outputfile.vcf

./myscript.R --input inputfile.vcf --output outputfile.vcf

./myscript.R --output outputfile.vcf --input inputfile.vcf

./myscript.R --output outputfile.vcf -i inputfile.vcf

Parsing arguments - Positional

Example: ./myscript.R inputfile.vcf outputfile.vcf

  • commandArgs()

Call commandArgs() inside the script to capture everything that was passed to R when it was started, including the arguments that Rscript itself adds.

  • trailingOnly = TRUE

Add trailingOnly = TRUE to drop R's own start-up arguments and return only the arguments you passed to the script.

commandArgs()
[1] "/usr/local/lib/R/bin/exec/R"        "--no-save"                         
[3] "--no-restore"                       "--no-echo"                         
[5] "--no-restore"                       "--file=/opt/quarto/share/rmd/rmd.R"
commandArgs(trailingOnly = TRUE)
character(0)
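
A minimal sketch of positional parsing, assuming the script expects exactly two arguments as in the example call. The helper name parse_positional() and the usage message are hypothetical.

```r
# Hypothetical helper: turn positional arguments into a named list.
parse_positional <- function(args) {
  if (length(args) != 2) {
    stop("Usage: myscript.R <inputfile> <outputfile>", call. = FALSE)
  }
  list(input = args[1], output = args[2])
}

# Inside a real script you would use:
#   args <- commandArgs(trailingOnly = TRUE)
# A fixed vector is used here so the sketch also runs interactively.
args <- c("inputfile.vcf", "outputfile.vcf")
opts <- parse_positional(args)
print(opts$input)
print(opts$output)
```

Checking the number of arguments up front gives the user a usage message instead of a confusing error further down the script.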

Parsing arguments - Flags

Example: ./myscript.R --input inputfile.vcf --output outputfile.vcf

  • Several packages are available: getopt, optparse, argparser, …

Define the set of possible arguments at the start of the script:

library(optparse)
my_options <- list(
  make_option(c("-i", "--inputfile"), default = "variants.vcf"),
  make_option(c("-o", "--outputfile"), default = "variants_filtered.vcf"))

Parse arguments using your definition:

parse_args(OptionParser(option_list=my_options))
$inputfile
[1] "variants.vcf"

$outputfile
[1] "variants_filtered.vcf"

$help
[1] FALSE
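
To see a flag override a default, the `args` parameter of parse_args() can be given a simulated command line. This is a sketch that assumes the optparse package is installed (it is skipped otherwise); the file name sample1.vcf and the help strings are made up for illustration.

```r
# Assumes the optparse package is installed; the sketch is skipped otherwise.
if (requireNamespace("optparse", quietly = TRUE)) {
  library(optparse)

  my_options <- list(
    make_option(c("-i", "--inputfile"), type = "character",
                default = "variants.vcf", help = "VCF file to read"),
    make_option(c("-o", "--outputfile"), type = "character",
                default = "variants_filtered.vcf", help = "VCF file to write")
  )
  parser <- OptionParser(option_list = my_options)

  # In a script, omit `args` so the real command line is parsed.
  opts <- parse_args(parser, args = c("-i", "sample1.vcf"))
  print(opts$inputfile)   # value supplied on the simulated command line
  print(opts$outputfile)  # falls back to the default
}
```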

Text streams

  • Text streams allow for piping of data through a set of applications without writing intermediate files.

samtools mpileup -uf ref.fa aln.bam | bcftools call -mv | myPythonscript.py | myRscript.R > variants.vcf

Reading

  • To define and open a connection, read one line, and close it:

input_con <- file("stdin")
open(input_con)
oneline <- readLines(input_con, n = 1)
close(input_con)

  • readr (part of the tidyverse) can read a tibble from a text stream: read_csv(file("stdin"))
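
Reading a whole stream rather than one line follows the same pattern. In this sketch a textConnection stands in for stdin so it runs anywhere; the helper name read_nonblank() and the sample lines are hypothetical.

```r
# Hypothetical helper: read every line from a connection and drop blank ones.
read_nonblank <- function(con) {
  lines <- readLines(con)  # with no `n`, reads until the end of the input
  lines[nzchar(trimws(lines))]
}

# In a script: input_con <- file("stdin")
# A textConnection stands in for stdin so the sketch runs anywhere.
input_con <- textConnection(c("chr1\t100", "", "chr2\t200"))
records <- read_nonblank(input_con)
close(input_con)

print(records)
```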

Text streams

Writing

  • Anything the code writes to stdout (print(), cat(), etc.) can be piped to a new process: ./myRscript.R | myNewScript

  • or written to a file: ./myRscript.R > output.csv

  • To write a tibble as a text stream: cat(format_csv(my_tibble))
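
A base R sketch of the same idea, assuming a plain data frame instead of a tibble: results go to stdout, while progress messages go to stderr so they do not end up in the piped data. The table contents and the helper name emit_results() are hypothetical.

```r
# Hypothetical result table; column names are for illustration only.
variants <- data.frame(chrom = c("chr1", "chr2"), pos = c(100L, 200L))

emit_results <- function(df) {
  message("Writing ", nrow(df), " rows")  # diagnostics go to stderr
  write.csv(df, row.names = FALSE)        # no file given, so CSV goes to stdout
}

emit_results(variants)
```

Run as ./myRscript.R > output.csv, only the CSV lines would reach the file; the message would still appear in the terminal.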

Summary

  • Functions are great for organizing code and repeating tasks
  • R scripts are great for performing tasks from the command line
  • R scripts can be built in different ways to take arguments or text streams

Thank you! Questions?

         _                  
platform x86_64-pc-linux-gnu
os       linux-gnu          
major    4                  
minor    3.2                

2024 • SciLifeLab • NBIS • RaukR