Vectorization

RaukR 2023 • Advanced R for Bioinformatics

Speed up R code using vectorized functions.
Author

Marcin Kierczak

Published

21-Jun-2024

Note

In programming languages, loop structures, with or without conditions, are used to repeat commands over multiple entities. For and while loops, as well as if-else statements, are also used in R, but perhaps not as often as in many other programming languages. The reason is that R offers an alternative called vectorization, which is usually more efficient.

Vectorization means that an operation is applied to all elements of a vector at once. For example, we can multiply all values in a vector in R by two by calling:

vec.a <- c(1, 2, 3, 4)
vec.a * 2
[1] 2 4 6 8

In R, as in many other languages, you can also achieve this with a loop instead:

for (i in seq_along(vec.a)) {
  vec.a[i] <- vec.a[i] * 2
}

vec.a
[1] 2 4 6 8

As you saw in the lecture, this is far less efficient and by no means easier to type, so we tend to avoid loops when possible.
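To get a rough feel for the difference on your own machine, a minimal sketch such as the following can be used. The vector length and the names `big.vec`, `loop.res` and `vec.res` are arbitrary choices made for illustration, and timings depend on the machine, so none are shown here.

```r
# Hypothetical test vector; the length is chosen only so that timings are measurable.
big.vec <- as.numeric(1:1e6)

# Loop version: double each element one at a time, writing into a preallocated vector.
loop.res <- numeric(length(big.vec))
loop.time <- system.time(
  for (i in seq_along(big.vec)) {
    loop.res[i] <- big.vec[i] * 2
  }
)

# Vectorized version: a single call operates on the whole vector.
vec.time <- system.time(vec.res <- big.vec * 2)

# Both approaches give the same values.
identical(loop.res, vec.res)

# Compare elapsed times (the numbers will vary between machines).
loop.time["elapsed"]
vec.time["elapsed"]
```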

1 Task 1: A large matrix

1.1 Looping over a matrix

  • Create a 100000 by 10 matrix with the numbers 1:1000000
  • Write a for-loop that calculates the sum for each row of the matrix.
  • Verify that your row sums are consistent with what you obtain with the apply() function
  • Verify that your row sums are consistent with what you obtain with the rowSums() function
Code
X <- matrix(1:1000000, nrow = 100000, ncol = 10)
for.sum <- vector()
# Note that this loop is much faster if you create a vector of the right size before the loop, e.g.
# for.sum <- vector('integer', nrow(X))
for (i in seq_len(nrow(X))) {
    for.sum[i] <- sum(X[i,])
}
head(for.sum)
[1] 4500010 4500020 4500030 4500040 4500050 4500060
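The comment in the loop above refers to preallocating the result vector. A minimal sketch of both variants side by side follows; the smaller matrix `X.demo` and the names `grow.sum` and `pre.sum` are hypothetical and chosen only so that the example runs quickly.

```r
# Hypothetical smaller matrix so the comparison runs quickly.
X.demo <- matrix(1:100000, nrow = 10000, ncol = 10)

# Growing version: the result vector starts empty and is extended on each iteration.
grow.sum <- vector()
for (i in seq_len(nrow(X.demo))) {
  grow.sum[i] <- sum(X.demo[i, ])
}

# Preallocated version: the result vector has its final length and type from the start.
pre.sum <- vector("integer", nrow(X.demo))
for (i in seq_len(nrow(X.demo))) {
  pre.sum[i] <- sum(X.demo[i, ])
}

# Both loops produce the same integer vector.
identical(grow.sum, pre.sum)
```

Wrapping each loop in `system.time()` should show the effect on speed, which tends to grow with the number of rows.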
Code
app.sum <- apply(X, MARGIN = 1, sum)
head(app.sum)
[1] 4500010 4500020 4500030 4500040 4500050 4500060
Code
rowSums.sum <- rowSums(X)
head(rowSums.sum)
[1] 4500010 4500020 4500030 4500040 4500050 4500060
Code
identical(for.sum, app.sum)
[1] TRUE
Code
identical(for.sum, rowSums.sum)
[1] FALSE
Code
identical(for.sum, as.integer(rowSums.sum))
[1] TRUE
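
The FALSE above does not mean that the sums differ. `identical()` also compares storage types: `sum()` applied to integers returns an integer, whereas `rowSums()` returns a double. A small sketch, using a hypothetical 2 by 3 integer matrix `M`, makes this visible:

```r
# Hypothetical small integer matrix, used only for illustration.
M <- matrix(1:6, nrow = 2, ncol = 3)

int.sums <- apply(M, MARGIN = 1, sum)   # sum() of integers gives integers
dbl.sums <- rowSums(M)                  # rowSums() returns doubles

typeof(int.sums)   # "integer"
typeof(dbl.sums)   # "double"

# Same values, different storage type.
identical(int.sums, dbl.sums)               # FALSE
all.equal(int.sums, dbl.sums)               # TRUE
identical(int.sums, as.integer(dbl.sums))   # TRUE
```

When only the values matter, `all.equal()` is usually the more forgiving comparison.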

2 Task 2: Fibonacci sequence

During the lecture, an approach to calculating factorials was implemented using recursion (a function calling itself). Here we will use recursion to generate a sequence of Fibonacci numbers. A Fibonacci number is part of a series of numbers with the following properties:

  • the first two numbers in the Fibonacci sequence are either 1 and 1, or 0 and 1 (depending on the chosen starting point)
  • each subsequent number is the sum of the previous two. Hence:

0, 1, 1, 2, 3, 5, 8, 13, 21, ...

or

1, 1, 2, 3, 5, 8, 13, 21, ...

2.1 N-th Fibonacci number

Write a function that generates the n-th Fibonacci number using a recursive approach.

Code
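fib_rec <- function(n) {
    # Base cases: fib(0) = 0 and fib(1) = 1
    if (n == 0 || n == 1) {
        return(n)
    } else {
        # Recursive case: the sum of the two preceding Fibonacci numbers
        return(fib_rec(n - 1) + fib_rec(n - 2))
    }
}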

2.2 Generate Fibonacci sequence

Generate the Fibonacci numbers for n from 0 to 10 using an *apply* approach.

Code
sapply(0:10, FUN = fib_rec)
 [1]  0  1  1  2  3  5  8 13 21 34 55

2.3 Vectorized Fibonacci generator

Vectorize your Fibonacci number generating function.

Code
vec_fib_rec <- Vectorize(fib_rec)
vec_fib_rec(0:10)
 [1]  0  1  1  2  3  5  8 13 21 34 55
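
Note that `Vectorize()` builds a wrapper around `mapply()`, so the underlying function is still called once per element. It offers the convenience of vector input rather than the speed of a truly vectorized operation. A brief sketch checking that the wrapper and the `sapply()` call agree follows; `fib_rec` is redefined so the block is self-contained, and the names `via.vectorize` and `via.sapply` are hypothetical.

```r
fib_rec <- function(n) {
  if (n == 0 || n == 1) return(n)
  fib_rec(n - 1) + fib_rec(n - 2)
}

vec_fib_rec <- Vectorize(fib_rec)

# The wrapper accepts a whole vector, while fib_rec() itself
# is written for a single value of n.
via.vectorize <- vec_fib_rec(0:10)
via.sapply <- sapply(0:10, FUN = fib_rec)

# Both routes call fib_rec() once per element and return the same result.
identical(via.vectorize, via.sapply)
```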