class: center, middle, inverse, title-slide

# Vectorization in R
## RaukR 2021 • Advanced R for Bioinformatics
### Marcin Kierczak
### NBIS, SciLifeLab

---

exclude: true
count: false

---
name: contents

## Learning Outcomes

By the end of this module, you will:

* understand how to write more efficient loops
* be able to vectorize most loops
* understand how the `apply*` functions work
* be aware of the `purrr` package
* understand what a recursive call is

---
name: for_loop_example
## The simplest of all `for` loops

Say we want to add 1 to every element of a vector:

```r
vec <- 1:5
vec
for (i in seq_along(vec)) {  # iterate over positions, not values
  vec[i] <- vec[i] + 1
}
vec
```

```
## [1] 1 2 3 4 5
## [1] 2 3 4 5 6
```
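Note the difference between `for (i in vec)`, which iterates over the *values* of `vec`, and `for (i in seq_along(vec))`, which iterates over its *positions*. With `1:5` the two coincide; with any other values they do not. A small sketch with an assumed vector `c(10, 20, 30)`:

```r
vec <- c(10, 20, 30)

# Iterating over the values: i becomes 10, 20, 30, which are not valid positions.
# Each assignment writes past the end, so R pads the vector with NA.
bad <- vec
for (i in bad) {
  bad[i] <- bad[i] + 1
}
length(bad)  # 30, mostly NA

# Iterating over the positions 1, 2, 3 instead:
good <- vec
for (i in seq_along(good)) {
  good[i] <- good[i] + 1
}
good  # 11 21 31
```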

Exactly the same can be achieved in R by means of **vectorization**:

```r
vec <- 1:5
vec + 1
```

```
## [1] 2 3 4 5 6
```

Which one is better? 😕

---
name: vectorization_benchmark

## Repeating actions &mdash; vectorization

Let us compare the execution time of the vectorized version (on a vector with 10,000 elements):

```
##    user  system elapsed 
##   0.005   0.002   0.008
```

to the loop version:

```
##    user  system elapsed 
##   0.055   0.006   0.061
```
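The code behind these timings is not shown. Below is a sketch of how such a comparison could be run with base R's `system.time()`, assuming the same task (adding 1 to each of 10,000 elements). Absolute numbers depend on the machine and R version (R byte-compiles loops by default since version 3.4, which narrows the gap), so expect values different from the ones above:

```r
n <- 10000
vec <- seq_len(n)  # 1, 2, ..., 10000

# Vectorized version: one call to `+` processes the whole vector
t_vec <- system.time(res_vec <- vec + 1)

# Loop version: 10,000 separate R-level additions and assignments
t_loop <- system.time({
  res_loop <- vec
  for (i in seq_along(res_loop)) {
    res_loop[i] <- res_loop[i] + 1
  }
})

t_vec
t_loop
```

Both versions compute the same result; only the way the work is dispatched differs.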

---
name: base_vectorize
## Vectorization &mdash; the problem

```r
is_a_droid <- function(x) {
  droids <- c('2-1B', '4-LOM', '8D8', '0-0-0', 'AP-5', 'AZI-3', 'Mister Bones',
              'BB-8', 'BB-9E', 'BD-1', 'BT-1', 'C1-10P', 'C-3PO', 'R2-D2')
  if (x %in% droids) {  # `if` expects a single TRUE/FALSE
    return(TRUE)
  } else {
    return(FALSE)
  }
}

test <- c('Anakin', 'Vader', 'R2-D2', 'AZI-3', 'Luke')
is_a_droid(test)
```

```
## [1] FALSE
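```

Only the first element, `'Anakin'`, was checked, because `if()` expects a single `TRUE`/`FALSE`. R 4.0.x uses the first value and emits a warning; since R 4.2.0 a condition of length greater than one is an error.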
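For comparison, `%in%` is itself vectorized: it returns one `TRUE`/`FALSE` per element of its left-hand side, so a vectorized variant needs no `if` at all. A sketch with a hypothetical `is_a_droid_vec()` and a shortened droid list:

```r
is_a_droid_vec <- function(x) {
  droids <- c('AZI-3', 'BB-8', 'C-3PO', 'R2-D2')  # shortened list, for brevity
  x %in% droids  # one logical value per element of x
}

test <- c('Anakin', 'Vader', 'R2-D2', 'AZI-3', 'Luke')
is_a_droid_vec(test)  # FALSE FALSE TRUE TRUE FALSE
```

The wrapper-based solutions remain useful when a function body cannot be expressed with vectorized building blocks like this.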

---
name: base_vectorize2
## Vectorization &mdash; the solution(s)

The `base::Vectorize` way:

```r
vectorized_is_a_droid <- base::Vectorize(is_a_droid, vectorize.args = c('x'))
vectorized_is_a_droid(test)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##  FALSE  FALSE   TRUE   TRUE  FALSE
```
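`base::Vectorize()` does not vectorize the function body itself; it returns a wrapper that calls the original function once per element via `mapply()`. A small sketch with a hypothetical scalar function `is_long_name()` shows that calling `mapply()` directly gives the same result:

```r
# Hypothetical scalar function: `if` only works on a single value
is_long_name <- function(x) {
  if (nchar(x) > 4) TRUE else FALSE
}
names_vec <- c('Anakin', 'Vader', 'R2-D2', 'Luke')

v1 <- Vectorize(is_long_name)(names_vec)
v2 <- mapply(is_long_name, names_vec)  # roughly what Vectorize() does internally
identical(v1, v2)  # TRUE
```

So `Vectorize()` buys convenience rather than speed: the function is still called once per element.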

The `apply*` way:

```r
apply(as.matrix(test), FUN = is_a_droid, MARGIN = 1)
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

```r
unlist(lapply(test, FUN = is_a_droid))
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

```r
sapply(test, is_a_droid)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##  FALSE  FALSE   TRUE   TRUE  FALSE
```
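`sapply()` infers the shape of its output from the results it gets back, which can surprise you at the edges. A sketch using the base function `nchar()`:

```r
# Non-empty input: simplified to a named integer vector
sapply(c('R2-D2', 'BB-8'), nchar)

# Empty input: nothing to simplify, so an empty (named) list comes back
empty_s <- sapply(character(0), nchar)
is.list(empty_s)  # TRUE

# vapply() is told the type and length of each result, so the output type is fixed
empty_v <- vapply(character(0), nchar, FUN.VALUE = integer(1))
is.integer(empty_v)  # TRUE
```

This predictability is the main reason to prefer `vapply()` inside functions and packages.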

---
name: base_vectorize3
## Vectorization &mdash; the solution(s)

The `vapply` way, which checks every result against the template given in `FUN.VALUE`:

```r
vapply(test, is_a_droid, FUN.VALUE = TRUE)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##  FALSE  FALSE   TRUE   TRUE  FALSE
```

```r
vapply(test, is_a_droid, FUN.VALUE = 1)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##      0      0      1      1      0
```

```r
vapply(test, is_a_droid, FUN.VALUE = c(1,0))
```

```
## Error in vapply(test, is_a_droid, FUN.VALUE = c(1, 0)): values must be length 2,
##  but FUN(X[[1]]) result is length 1
```

```r
vapply(test, is_a_droid, FUN.VALUE = 'a')
```

```
## Error in vapply(test, is_a_droid, FUN.VALUE = "a"): values must be type 'character',
##  but FUN(X[[1]]) result is type 'logical'
```
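A template of length 2 is not wrong in itself; it just has to match what `FUN` returns. When it does, `vapply()` returns a matrix with one column per input element. A sketch using a hypothetical helper `name_stats()`:

```r
# Hypothetical helper: two numbers per name (total characters, number of digits)
name_stats <- function(x) {
  c(chars = nchar(x), digits = nchar(gsub('[^0-9]', '', x)))
}

res <- vapply(c('R2-D2', 'BB-8', 'Luke'), name_stats, FUN.VALUE = numeric(2))
res  # 2 x 3 matrix: rows 'chars' and 'digits', one column per name
```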

Or the `purrr` way:

```r
unlist(purrr::map(test, is_a_droid))
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

---
name: recursion
## Recursion
When we explicitly repeat an action using a loop, we talk about **iteration**. We can also repeat actions by means of **recursion**, i.e. when a function calls itself. Let us implement the factorial, `\(n!\)`:

```r
factorial.rec <- function(x) {
  if (x == 0 || x == 1) {
    return(1)
  } else {
    return(x * factorial.rec(x - 1))  # Recursive call!
  }
}
factorial.rec(5)
```

```
## [1] 120
```
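Base R also offers `Recall()`, which stands for "the function currently being called". A recursive definition written with it keeps working even if the function is renamed. A sketch (the names `fact` and `f2` are hypothetical; there is no input checking):

```r
fact <- function(x) {
  if (x <= 1) 1 else x * Recall(x - 1)  # Recall() re-invokes the current function
}
fact(5)  # 120

# Still works after the original name is gone
f2 <- fact
rm(fact)
f2(5)  # 120
```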

---
name: rec_eq_iter
## Recursion = iteration?
Yes: every iteration can be rewritten as recursion and *vice versa* (both are equally expressive, which is related to the Church-Turing thesis). The conversion is not always obvious, but it is always possible in principle. Here is the factorial implemented iteratively:

```r
factorial.iter <- function(x) {
  if (x == 0 || x == 1) {
    return(1)
  } else {
    tmp <- 1
    for (i in 2:x) {
      tmp <- tmp * i
    }
    return(tmp)
  }
}
factorial.iter(5)
```

```
## [1] 120
```
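In the spirit of the earlier slides, the factorial can also be written without any explicit loop or recursion: `seq_len(x)` builds `1, 2, ..., x` and `prod()` multiplies it in one vectorized call. Base R also ships `factorial()`, which computes `gamma(x + 1)`. A sketch with a hypothetical `factorial.vec`:

```r
factorial.vec <- function(x) {
  prod(seq_len(x))  # seq_len(0) is empty and prod() of nothing is 1, so 0! = 1
}
factorial.vec(5)  # 120
factorial(5)      # base R equivalent
```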

---
name: rec_eq_iter_really
## Recursion == iteration, really?

More code for the iterative version, right? What about time and memory efficiency?  
The recursive version:

```
##         Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
## 1 factorial.rec(2000)            0.006                  0              4538
```
And the iterative one:

```
##          Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
## 1 factorial.iter(2000)            0.005                  0             12628
```
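The timings do not show one practical difference: every recursive call adds a frame to R's call stack, and R limits how deeply calls may nest (see `options('expressions')`), so very deep recursion fails with an error while the loop simply keeps going. The values themselves overflow much earlier: any factorial above 170! is `Inf` in double precision. A sketch with compact, hypothetical copies of both versions:

```r
fact_rec <- function(x) if (x <= 1) 1 else x * fact_rec(x - 1)
fact_iter <- function(x) {
  acc <- 1
  for (i in seq_len(x)) acc <- acc * i
  acc
}

# Very deep recursion hits R's nesting/stack limits and raises an error
deep <- tryCatch(fact_rec(1e5), error = function(e) conditionMessage(e))
deep  # an error message, e.g. about evaluation being nested too deeply

# The loop has no such limit; the result just overflows to Inf
fact_iter(1e5)  # Inf
```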

---
name: loops_avoid_growing
## Loops &mdash; avoid growing data
Avoid changing the size of an object inside a loop:

```r
v <- c() # Initialize an empty vector
for (i in 1:100) {
  v <- c(v, i)  # builds a new, longer vector on every iteration
}
```

It is much better to allocate the vector at its final length first and then fill it in:

```r
v <- rep(NA, 100) # Initialize with the final length
for (i in 1:100) {
  v[i] <- i
}
```
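A sketch comparing growing, preallocating, and a fully vectorized alternative on a larger, assumed size. Because `c(v, i)` allocates a new vector and copies the old one on every iteration, the total work grows roughly quadratically with the length. Exact timings will vary by machine:

```r
n <- 10000

# Growing: c(v, i) allocates a new vector and copies v on every iteration
grow <- function(n) {
  v <- c()
  for (i in seq_len(n)) v <- c(v, i)
  v
}

# Preallocating: allocate once, then fill element by element
prealloc <- function(n) {
  v <- integer(n)
  for (i in seq_len(n)) v[i] <- i
  v
}

# Vectorized: no explicit loop at all
vectorized <- function(n) seq_len(n)

system.time(grow(n))
system.time(prealloc(n))
system.time(vectorized(n))
```

All three return the same integer vector; when the values can be computed in one go, the vectorized form needs no preallocation at all.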

---
name: end-slide
class: end-slide, middle
count: false

# Thank you. Questions?

R version 4.0.3 (2020-10-10) • Platform: x86_64-apple-darwin17.0 (64-bit) • OS: macOS Big Sur 10.16

Built on: 15-Jun-2021 at 15:30:50

2021 • [SciLifeLab](https://www.scilifelab.se/) • [NBIS](https://nbis.se/) • [RaukR](https://nbisweden.github.io/workshop-RaukR-2106/)