class: center, middle, inverse, title-slide

.title[
# Replication, Control Structures & Functions
]
.subtitle[
## Elements of the R language
]
.author[
### Marcin Kierczak and Nima Rafati
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

---
name: contents

# Contents of the lecture

- variables and their types
- operators
- vectors
- numbers as vectors
- strings as vectors
- matrices
- lists
- data frames
- objects
- **repeating actions: iteration and recursion**
- **decision taking: control structures**
- **functions in general**
- **variable scope**
- **base functions**

---
name: repeating_actions_1

# Repeating actions

Sometimes you want to repeat a certain action several times. There are a few alternatives in R, for example:

- `for` loop
- `while` loop

---
name: for_loop_0

# Repeating actions — for loop

One way to repeat an action is to use the **for-loop**:

```r
for (var in seq) {
  expr
}
```

---
name: for_loop_1

# Repeating actions — for loop

Example.

```r
for (i in 1:5) {
  cat(paste('Performing operation no.', i), '\n')
}
```

```
## Performing operation no. 1
## Performing operation no. 2
## Performing operation no. 3
## Performing operation no. 4
## Performing operation no. 5
```

--

A slight modification of the above example will skip odd indices.

```r
for (i in c(2,4,6,8,10)) {
  cat(paste('Performing operation no.', i), '\n')
}
```

```
## Performing operation no. 2
## Performing operation no. 4
## Performing operation no. 6
## Performing operation no. 8
## Performing operation no. 10
```

---
name: for_loop_counter

# Repeating actions — for loop with a counter

Sometimes, we also want an external counter:

```r
cnt <- 1
for (i in c(2,4,6,8,10)) {
  cat(paste('Performing operation no.', cnt, 'on element', i), '\n')
  cnt <- cnt + 1
}
```

```
## Performing operation no. 1 on element 2
## Performing operation no. 2 on element 4
## Performing operation no. 3 on element 6
## Performing operation no. 4 on element 8
## Performing operation no. 5 on element 10
```

---
name: for_loop_example

# Repeating actions — for loop, an example

Say, we want to add 1 to every element of a vector:

```r
vec <- c(1:5)
vec
for (i in seq_along(vec)) { # loop over indices, not over values
  vec[i] <- vec[i] + 1
}
vec
```

```
## [1] 1 2 3 4 5
## [1] 2 3 4 5 6
```

--

The same result can be achieved in R more simply by means of **vectorization**. **Vectorization** means applying an operation element-wise to an entire vector, matrix or data frame at once.

```r
vec <- c(1:5)
vec + 1
```

```
## [1] 2 3 4 5 6
```

---
name: vectorization_benchmark

# Repeating actions — vectorization

Let us compare the time of execution of the vectorized version (vector with 1,000,000 elements):

```r
vec <- c(1:1e6)
ptm <- proc.time()
vec <- vec + 1
proc.time() - ptm # vectorized
```

```
##    user  system elapsed
##   0.004   0.000   0.004
```

--

to the loop version:

```r
vec <- c(1:1e6)
ptm <- proc.time()
for (i in seq_along(vec)) { # loop over indices, not over values
  vec[i] <- vec[i] + 1
}
proc.time() - ptm # for-loop
```

```
##    user  system elapsed
##   0.074   0.004   0.079
```

---
name: while_loop

# Repeating actions — the while loop

There is also another type of loop in R, the **while loop**, which is executed as long as some condition is true.

```r
x <- 1
while (x < 5) {
  cat(x, " ... ")
  x <- x + 1
}
```

```
## 1 ... 2 ... 3 ... 4 ...
```

---
name: recursion

<!--
# Recursion

When we explicitly repeat an action using a loop, we talk about **iteration**. We can also repeat actions by means of **recursion**, i.e. when a function calls itself.
Let us implement a factorial `\(!\)`:

```r
factorial.rec <- function(x) {
  if (x == 0 || x == 1)
    return(1)
  else
    return(x * factorial.rec(x - 1)) # Recursive call!
}
factorial.rec(5)
```

```
## [1] 120
```

# Recursion = iteration?

Yes, every iteration can be converted to recursion (Church-Turing conjecture) and vice-versa. It is not always obvious, but theoretically it is doable. Let's see how to implement *factorial* in an iterative manner:

```r
factorial.iter <- function(x) {
  if (x == 0 || x == 1)
    return(1)
  else {
    tmp <- 1
    for (i in 2:x) {
      tmp <- tmp * i
    }
    return(tmp)
  }
}
factorial.iter(5)
```

```
## [1] 120
```

# Recursion == iteration, really?

More writing for the iterative version, right? What about the time efficiency?

The recursive version:

```r
ptm <- proc.time()
factorial.rec(20)
proc.time() - ptm
```

```
## [1] 2.432902e+18
##    user  system elapsed
##   0.002   0.000   0.002
```

And the iterative one:

```r
ptm <- proc.time()
factorial.iter(20)
proc.time() - ptm
```

```
## [1] 2.432902e+18
##    user  system elapsed
##   0.008   0.000   0.008
```
-->

---
name: loops_avoid_growing

# Loops — avoid growing data

Avoid changing the dimensions of an object inside the loop:

```r
v <- c() # Initialize
for (i in 1:100) {
  v <- c(v, i)
}
```

--

It is much better to do it like this:

```r
v <- rep(NA, 100) # Initialize with length
for (i in 1:100) {
  v[i] <- i
}
```

--

Always try to know the size of the object you are going to create!

---
name: if_clause

# Decisions, an if-clause

Often, one has to take a different course of action depending on the flow of the algorithm.
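
As a minimal sketch of such a decision (the variable `temp` and the threshold 30 are made up for illustration and are not part of the lecture material), an `if` clause runs its body only when the condition evaluates to `TRUE`:

```r
# Hypothetical example: 'temp' and the threshold 30 are illustrative only
temp <- 35
msg <- 'normal'    # default course of action
if (temp > 30) {   # the body runs only when the condition is TRUE
  msg <- 'hot'
}
cat(msg, '\n')
```
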
Let's print only the odd numbers in `\([1, 10]\)`:

```r
v <- 1:10
for (i in v) {
  if (i %% 2 != 0) { # if clause
    cat(i, ' ')
  }
}
```

```
## 1 3 5 7 9
```

---
name:if_else

# Decisions — if-else

If we want to print 'o' for an odd number and 'e' for an even one, we could write any of:

.pull-left-50[
```r
v <- 1:10
for (i in v) {
  if (i %% 2 != 0) { # if clause
    cat('o ')
  }
  if (i %% 2 == 0) { # another if-clause
    cat('e ')
  }
}
```

```
## o e o e o e o e o e
```
]

--

.pull-right-50[
```r
v <- 1:10
for (i in v) {
  if (i %% 2 != 0) { # if clause
    cat('o ')
  } else { # else clause
    cat('e ')
  }
}
```

```
## o e o e o e o e o e
```
]

--

.pull-left-50[
```r
v <- 1:10
for (i in v) {
  tmp <- 'e ' # set default to even
  if (i %% 2 != 0) { # if clause
    tmp <- 'o ' # change default for odd numbers
  }
  cat(tmp)
}
```

```
## o e o e o e o e o e
```
]

--

Each of these three ways is equally good; choosing among them is mainly a matter of style...

---
name: elif

# Decision taking — more alternatives

So far, so good, but we were only dealing with two alternatives. Let's say that we want to print '?' for zero, 'e' for an even and 'o' for an odd number:

```r
v <- c(0:10)
for (i in v) {
  if (i == 0) {
    cat('? ')
  } else if (i %% 2 != 0) { # else-if clause
    cat('o ')
  } else { # else clause
    cat('e ')
  }
}
```

```
## ? o e o e o e o e o e
```

Congratulations! You have just learned the **if-elseif-else** clause.

---
name: switch

# Switch

If-else clauses operate on logical values. What if we want to take decisions based on non-logical values?
Well, if-else will still work by evaluating a number of comparisons, but we can also use **switch**:

```r
switch.demo <- function(x) {
  switch(class(x),
         logical = cat('logical\n'),
         numeric = cat('Numeric\n'),
         factor = cat('Factor\n'),
         cat('Undefined\n')
  )
}
switch.demo(x=TRUE)
switch.demo(x=15)
switch.demo(x=factor('a'))
switch.demo(data.frame())
```

```
## logical
## Numeric
## Factor
## Undefined
```

---
name: fns

# Functions

Often, it is really handy to re-use some code we have written or to pack together the code that performs a particular task. Functions are a really good way to do this in R:

```r
add.one <- function(arg1) {
  arg1 <- arg1 + 1
  return(arg1)
}
add.one(1)
add.one()
```

```
## [1] 2
```

```
## Error in add.one(): argument "arg1" is missing, with no default
```

---
name:anatomy_of_a_fn

# Anatomy of a function

A function consists of *formal arguments*, a *function body* and an *environment*:

```r
formals(add.one)
body(add.one)
environment(add.one)
environment(sd)
```

```
## $arg1
## 
## 
## {
##     arg1 <- arg1 + 1
##     return(arg1)
## }
## <environment: R_GlobalEnv>
## <environment: namespace:stats>
```

---
name: fns_defaults

# Functions — default values

Sometimes, it is good to use default values for some arguments:

```r
add.a.num <- function(arg, num=1) {
  arg <- arg + num
  return(arg)
}
add.a.num(1, 5)
add.a.num(1) # skip the num argument
add.a.num(num=1) # skip the first argument
```

```
## [1] 6
## [1] 2
```

```
## Error in add.a.num(num = 1): argument "arg" is missing, with no default
```

---
name:fns_args

# Functions — order of arguments

```r
args.demo <- function(x, y, arg3) {
  print(paste('x =', x, 'y =', y, 'arg3 =', arg3))
}
args.demo(1,2,3)
args.demo(x=1, y=2, arg3=3)
args.demo(x=1, 2, 3)
args.demo(arg3=3, x=1, y=2)
```

```
## [1] "x = 1 y = 2 arg3 = 3"
## [1] "x = 1 y = 2 arg3 = 3"
## [1] "x = 1 y = 2 arg3 = 3"
## [1] "x = 1 y = 2 arg3 = 3"
```

<!--
--

```r
args.demo2 <- function(x, arg2, arg3) {
  print(paste('x =', x, 'arg2 =', arg2, 'arg3 =', arg3))
}
#args.demo2(x=1, y=2, ar=3)
```
-->

---
name: variable_scope

# Functions — variable scope

.pull-left-50[
Functions 'see' not only what has been passed to them as arguments:

```r
x <- 7
y <- 3
xyplus <- function(x) {
  x <- x + y
  return(x)
}
xyplus(x)
x
```

```
## [1] 10
## [1] 7
```
]

--

.pull-right-50[
The top-level workspace outside all functions is called the **global environment**. There is a special operator `<<-` for assigning to variables outside the function, here in the global environment:

```r
x <- 1
xplus <- function(x) {
  x <<- x + 1
}
xplus(x)
x
xplus(x)
x
```

```
## [1] 2
## [1] 3
```
]

---
name: fns_ellipsis

# Functions — the `...` argument

There is a special argument **...** (ellipsis) which allows you to give any number of arguments or pass arguments downstream:

```r
# Any number of arguments
my.plot <- function(x, y, ...) {
  # Passing downstream
  plot(x, y, las=1, cex.axis=.8, ...)
}
{par(mfrow=c(1,2),mar=c(4,4,1,1))
my.plot(1,1)
my.plot(1, 1, col='red', pch=19)}
```

<img src="slide_r_elements_4_files/figure-html/fns.3dots-1.png" width="432" style="display: block; margin: auto auto auto 0;" />

- A function that encloses another function, like `my.plot` around `plot`, is a **wrapper function**

---
name: ellipsis_trick

# Functions — the ellipsis argument trick

What if the authors of, e.g., a `plot.something` wrapper forgot about the `...`?

```r
my.plot <- function(x, y) {
  # Passing downstream
  plot(x, y, las=1, cex.axis=.8, ...)
}
formals(my.plot) <- c(formals(my.plot), alist(... = ))
my.plot(1, 1, col='red', pch=19)
```

<img src="slide_r_elements_4_files/figure-html/fns.3dots.trick-1.png" width="360" style="display: block; margin: auto auto auto 0;" />

---
<!--
name: lazy_eval

# R is lazy!

In R, arguments are evaluated as late as possible, i.e. when they are needed. This is **lazy evaluation**:

```r
h <- function(a = 1, b = d) {
  d <- (a + 1) ^ 2
  c(a, b)
}
#h()
```

> The above won't be possible in, e.g., C, where the values of both arguments have to be known before calling a function (**eager evaluation**).
-->

---
name: everything_is_a_fn

# In R everything is a function

Because in R every operation is a function call

```r
`+`
```

```
## function (e1, e2)  .Primitive("+")
```

--

we can re-define things like this:

--

```r
`+` <- function(e1, e2) { e1 - e2 }
2 + 2
```

```
## [1] 0
```

--

and, finally, clean up the mess...

--

```r
rm("+")
2 + 2
```

```
## [1] 4
```

---
name: infix_fns

# Infix notation

Operators like `+`, `-` or `*` are so-called **infix** functions, where the function name sits between its arguments. We can define our own:

```r
`%p%` <- function(x, y) {
  paste(x,y)
}
'a' %p% 'b'
```

```
## [1] "a b"
```

---
name: base_fns

# Base functions

When we start R, several packages are pre-loaded automatically. `search()` lists everything that is currently attached; the output below appears to come from the session that built these slides, so it also includes packages that were loaded on top of the defaults:

```r
# .libPaths() # get library location
# library() # see all packages installed
search() # see packages currently loaded
```

```
##  [1] ".GlobalEnv"           "package:vcd"          "package:grid"
##  [4] "package:patchwork"    "package:nycflights13" "package:readxl"
##  [7] "package:tidyr"        "package:ggplot2"      "package:formattable"
## [10] "package:kableExtra"   "package:dplyr"        "package:lubridate"
## [13] "package:leaflet"      "package:yaml"         "package:fontawesome"
## [16] "package:bookdown"     "package:knitr"        "package:stats"
## [19] "package:graphics"     "package:grDevices"    "package:utils"
## [22] "package:datasets"     "package:methods"      "Autoloads"
## [25] "package:base"
```

Check what basic functions are offered by the packages *base* and *utils*; we will soon work with the package *graphics*. If you want to see what statistical functions are in your arsenal, check out the package *stats*.

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end_slide
class: end-slide, middle
count: false

# See you at the next lecture!
.end-text[ <p class="smaller"> <span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br> Created: 25-Oct-2023 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a> </p> ]