class: center, middle, inverse, title-slide

# Vectorization in R
## RaukR 2021 • Advanced R for Bioinformatics
### Marcin Kierczak
### NBIS, SciLifeLab

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ----------------- Only edit title & author above this ----------------- -->

---
name: contents

## Learning Outcomes

By the end of this module, you will:

* understand how to write more efficient loops
* be able to vectorize most loops
* understand how the `apply*` functions work
* be aware of the `purrr` package
* understand what a recursive call is

---
name: for_loop_example

## The simplest of all `for` loops

Say we want to add 1 to every element of a vector:

```r
vec <- c(1:5)
vec
for (i in seq_along(vec)) {  # iterate over indices, not over the values
  vec[i] <- vec[i] + 1
}
vec
```

```
## [1] 1 2 3 4 5
## [1] 2 3 4 5 6
```

--

Exactly the same can be achieved in R by means of **vectorization**:

```r
vec <- c(1:5)
vec + 1
```

```
## [1] 2 3 4 5 6
```

Which one is better?
😕

---
name: vectorization_benchmark

## Repeating actions — vectorization

Let us compare the execution time of the vectorized version (a vector with 10,000 elements):

```
##    user  system elapsed 
##   0.005   0.002   0.008
```

--

to that of the loop version:

```
##    user  system elapsed 
##   0.055   0.006   0.061
```

---
name: base_vectorize

## Vectorization — the problem

```r
is_a_droid <- function(x) {
  droids <- c('2-1B', '4-LOM', '8D8', '0-0-0', 'AP-5', 'AZI-3', 'Mister Bones',
              'BB-8', 'BB-9E', 'BD-1', 'BT-1', 'C1-10P', 'C-3PO', 'R2-D2')
  if (x %in% droids) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

test <- c('Anakin', 'Vader', 'R2-D2', 'AZI-3', 'Luke')
is_a_droid(test)
```

```
## [1] FALSE
```

`if` looks only at the first element of its condition, so the whole vector yields a single `FALSE`. From R 4.2.0 onwards, a condition longer than one is an error.

---
name: base_vectorize2

## Vectorization — the solution(s)

The `base::Vectorize` way:

```r
vectorized_is_a_droid <- base::Vectorize(is_a_droid, vectorize.args = c('x'))
vectorized_is_a_droid(test)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##  FALSE  FALSE   TRUE   TRUE  FALSE
```

--

The `apply*` way:

```r
apply(as.matrix(test), FUN = is_a_droid, MARGIN = 1)
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

--

```r
library(magrittr)  # provides the %>% pipe
lapply(test, FUN = is_a_droid) %>% unlist()
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

--

```r
sapply(test, is_a_droid)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##  FALSE  FALSE   TRUE   TRUE  FALSE
```

---
name: base_vectorize3

## Vectorization — the solution(s)

The `vapply` way, where `FUN.VALUE` fixes the type and length of each result:

```r
vapply(test, is_a_droid, FUN.VALUE = TRUE)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##  FALSE  FALSE   TRUE   TRUE  FALSE
```

--

```r
vapply(test, is_a_droid, FUN.VALUE = 1)
```

```
## Anakin  Vader  R2-D2  AZI-3   Luke 
##      0      0      1      1      0
```

--

```r
vapply(test, is_a_droid, FUN.VALUE = c(1,0))
```

```
## Error in vapply(test, is_a_droid, FUN.VALUE = c(1, 0)): values must be length 2,
##  but FUN(X[[1]]) result is length 1
```

```r
vapply(test, is_a_droid, FUN.VALUE = 'a')
```

```
## Error in vapply(test, is_a_droid, FUN.VALUE = "a"): values must be type 'character',
##  but FUN(X[[1]]) result is type 'logical'
```

--

Or the `purrr` way:

```r
purrr::map(test, is_a_droid) %>% unlist()  # map() returns a list, hence unlist()
```

```
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

---
name: recursion

## Recursion

When we explicitly repeat an action using a loop, we talk about **iteration**. We can also repeat actions by means of **recursion**, i.e. when a function calls itself. Let us implement a factorial `\(!\)`:

```r
factorial.rec <- function(x) {
  if (x == 0 || x == 1)
    return(1)
  else
    return(x * factorial.rec(x - 1)) # Recursive call!
}
factorial.rec(5)
```

```
## [1] 120
```

---
name: rec_eq_iter

## Recursion = iteration?

Yes, every iteration can be converted to recursion (Church-Turing conjecture) and *vice versa*. It is not always obvious, but theoretically it is doable. Let's see how to implement *factorial* in an iterative manner:

```r
factorial.iter <- function(x) {
  if (x == 0 || x == 1)
    return(1)
  else {
    tmp <- 1
    for (i in 2:x) {
      tmp <- tmp * i
    }
    return(tmp)
  }
}
factorial.iter(5)
```

```
## [1] 120
```

---
name: rec_eq_iter_really

## Recursion == iteration, really?

More writing for the iterative version, right? What about the time efficiency?

The recursive version:

```
##         Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
## 1 factorial.rec(2000)            0.006                  0              4538
```

And the iterative one:

```
##          Function_Call Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
## 1 factorial.iter(2000)            0.005                  0             12628
```

---
name: loops_avoid_growing

## Loops — avoid growing data

Avoid changing the dimensions of an object inside a loop, since every call to `c()` copies the whole vector:

```r
v <- c() # Initialize
for (i in 1:100) {
  v <- c(v, i)
}
```

--

It is much better to allocate the full length up front and fill it in place:

```r
v <- rep(NA, 100) # Initialize with length
for (i in 1:100) {
  v[i] <- i
}
```

--

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end-slide
class: end-slide, middle
count: false

# Thank you. Questions?
<p>R version 4.0.3 (2020-10-10)<br><p>Platform: x86_64-apple-darwin17.0 (64-bit)</p><p>OS: macOS Big Sur 10.16</p><br> Built on : <i class='fa fa-calendar' aria-hidden='true'></i> 15-Jun-2021 at <i class='fa fa-clock-o' aria-hidden='true'></i> 15:30:50 <b>2021</b> • [SciLifeLab](https://www.scilifelab.se/) • [NBIS](https://nbis.se/) • [RaukR](https://nbisweden.github.io/workshop-RaukR-2106/)
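---
name: appendix_native_vectorization

## Appendix: a natively vectorized `is_a_droid`

The wrappers shown earlier (`Vectorize`, `apply*`, `purrr::map`) still call `is_a_droid` once per element. Since `%in%` is itself vectorized, one possible alternative is to drop the `if` and return the comparison directly. This is a minimal sketch: the name `is_a_droid_vec` is hypothetical and does not appear in the slides above, while the droid list and the `test` vector are copied from them.

```r
# Hypothetical helper (not part of the original slides):
# %in% already returns one logical value per element of x,
# so no if/else and no wrapper are needed.
is_a_droid_vec <- function(x) {
  droids <- c('2-1B', '4-LOM', '8D8', '0-0-0', 'AP-5', 'AZI-3', 'Mister Bones',
              'BB-8', 'BB-9E', 'BD-1', 'BT-1', 'C1-10P', 'C-3PO', 'R2-D2')
  x %in% droids
}

test <- c('Anakin', 'Vader', 'R2-D2', 'AZI-3', 'Luke')
is_a_droid_vec(test)
## [1] FALSE FALSE  TRUE  TRUE FALSE
```

Unlike the `sapply` result, this vector carries no names. If names are wanted, `setNames(is_a_droid_vec(test), test)` adds them.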