Vectorization in R

RaukR 2024 β€’ Advanced R for Bioinformatics

Marcin Kierczak

21-Jun-2024

Learning Outcomes

By the end of this module, you will:

  • understand how to write more efficient loops
  • be able to vectorize most loops
  • understand how the apply* functions work
  • be aware of the purrr package
  • understand what a recursive call is

The simplest of all for loops

Say, we want to add 1 to every element of a vector:

vec <- c(1:5)
vec
for (i in vec) {
  vec[i] <- vec[i] + 1
}
vec
[1] 1 2 3 4 5
[1] 2 3 4 5 6

Exactly the same can be achieved in R by means of vectorization:

vec <- c(1:5)
vec + 1
[1] 2 3 4 5 6

Which is better? πŸ˜•

Repeating actions β€” vectorization

Let us compare the time of execution of the vectorized version (vector with 10,000 elements):

vec <- c(1:1e6)
peakRAM::peakRAM(vec <- vec + 1)
Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
vec<-vec+1 0.004 3.9 7.7

to the loop version:

vec <- c(1:1e6)
loop <- function(vec) {
  for (i in vec) {
    vec[i] <- vec[i] + 1
  }
  return(vec)
}
peakRAM::peakRAM(loop(vec))
Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
loop(vec) 0.169 7.6 7.7

Vectorization β€” the problem

is_a_droid <- function(x) {
  droids <- c('2-1B', '4-LOM', '8D8', '0-0-0', 'AP-5', 'AZI-3', 'Mister Bones', 'BB-8', 'BB-9E', 'BD-1', 'BT-1', 'C1-10P', 'C-3PO', 'R2-D2')
  if (x %in% droids) {
    return(T)
  } else {
    return(F)
  }
}

test <- c('Anakin', 'Vader', 'R2-D2', 'AZI-3', 'Luke')
is_a_droid(test)
Error in if (x %in% droids) {: the condition has length > 1

Vectorization β€” the solution(s)

The base::Vectorize way:

vectorized_is_a_droid <- base::Vectorize(is_a_droid, vectorize.args = c('x'))
vectorized_is_a_droid(test)
Anakin  Vader  R2-D2  AZI-3   Luke 
 FALSE  FALSE   TRUE   TRUE  FALSE 

The apply* way:

apply(as.matrix(test), FUN = is_a_droid, MARGIN = 1)
[1] FALSE FALSE  TRUE  TRUE FALSE
lapply(test, FUN=is_a_droid) %>% unlist() # list apply
[1] FALSE FALSE  TRUE  TRUE FALSE
sapply(test, is_a_droid) # simplified lapply
Anakin  Vader  R2-D2  AZI-3   Luke 
 FALSE  FALSE   TRUE   TRUE  FALSE 

Vectorization β€” the solution(s)

The vapply:

vapply(test, is_a_droid, FUN.VALUE = TRUE) # value type-safe sapply 
vapply(test, is_a_droid, FUN.VALUE = 1)
Anakin  Vader  R2-D2  AZI-3   Luke 
 FALSE  FALSE   TRUE   TRUE  FALSE 
Anakin  Vader  R2-D2  AZI-3   Luke 
     0      0      1      1      0 
vapply(test, is_a_droid, FUN.VALUE = 'a')
Error in vapply(test, is_a_droid, FUN.VALUE = "a"): values must be type 'character',
 but FUN(X[[1]]) result is type 'logical'

Or the purrr way:

purrr::map(test, is_a_droid) %>% unlist()
[1] FALSE FALSE  TRUE  TRUE FALSE

Recursion

When we explicitly repeat an action using a loop, we talk about iteration. We can also repeat actions by means of recursion, i.e. when a function calls itself. Let us implement a factorial \(!\):

factorial_rec <- function(x) {
  if (x == 0 || x == 1)
    return(1)
  else
    return(x * factorial_rec(x - 1)) # Recursive call!
}
factorial_rec(5)
[1] 120

To recurse or to iterate?

Comparing recursion to iteration is like comparing a phillips head screwdriver to a flat head screwdriver. For the most part you could remove any phillips head screw with a flat head, but it would just be easier if you used the screwdriver designed for that screw right?

Some algorithms just lend themselves to recursion because of the way they are designed (Fibonacci sequences, traversing a tree like structure, etc.). Recursion makes the algorithm more succinct and easier to understand (therefore shareable and reusable)

– StackOverflow

Loops β€” avoid growing data

Avoid changing dimensions of an object inside the loop:

v <- c() # Initialize
for (i in 1:100) {
  v <- c(v, i)
}

It is much better to do it like this:

v <- rep(NA, 100) # Initialize with length
for (i in 1:100) {
  v[i] <- i
}

Thank you! Questions?

         _                  
platform x86_64-pc-linux-gnu
os       linux-gnu          
major    4                  
minor    3.2                

2024 β€’ SciLifeLab β€’ NBIS β€’ RaukR

Recursion = iteration?

Yes, every iteration can be converted to recursion (Church-Turing conjecture) and vice versa. It is not always obvious, but theoretically it is doable. Let’s see how to implement factorial in iterative manner:

factorial_iter <- function(x) {
  if (x == 0 || x == 1)
    return(1)
  else {
    tmp <- 1
    for (i in 2:x) {
      tmp <- tmp * i
    }
    return(tmp)  
  }
}
factorial_iter(5)
[1] 120

Recursion == iteration, really?

More writing for the iterative version, right? What about the time efficiency?
The recursive version:

peakRAM::peakRAM(factorial_rec(1000))

And the iterative one:

peakRAM::peakRAM(factorial_iter(1000))
Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
factorial_iter(1000) 0.007 0 0.1