1 Introduction

In programming languages, loop structures, with or without conditions, are used to repeat commands over multiple entities. For and while loops, as well as if-else statements, are also used in R, but less often than in many other programming languages. The reason is that many tasks that would otherwise need a loop can be handled with vectorization or with the apply functions.

This means that we can multiply all values in a vector in R by two by calling

vec.a <- c(1, 2, 3, 4)
vec.a * 2
## [1] 2 4 6 8

In R, as in many other languages, you can also do this with a loop instead

for (i in seq_along(vec.a)) {
  vec.a[i] <- vec.a[i] * 2
}

vec.a
## [1] 2 4 6 8

The loop is far less efficient and no easier to type, so we tend to avoid loops when possible.
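The apply functions mentioned in the introduction offer a third way to write the same operation. A minimal sketch using vapply(), where the anonymous function and the name doubled are chosen purely for illustration:

```r
vec.a <- c(1, 2, 3, 4)
# Apply a function to each element; numeric(1) declares that
# every call returns a single number
doubled <- vapply(vec.a, function(v) v * 2, numeric(1))
doubled
## [1] 2 4 6 8
```

For a simple multiplication the vectorized form vec.a * 2 remains the shortest option; the apply functions become useful when the operation per element is more involved.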

Extra: Timing comparison between vectorization and a loop

Let us compare the execution time of the vectorized version (vector with 1,000,000 elements):

vec <- c(1:1e6)
ptm <- proc.time()
vec <- vec + 1
proc.time() - ptm # vectorized
##    user  system elapsed 
##   0.001   0.000   0.002

to the loop version:

vec <- c(1:1e6)
ptm <- proc.time()
for (i in seq_along(vec)) {
  vec[i] <- vec[i] + 1
}
proc.time() - ptm # for-loop
##    user  system elapsed 
##   0.059   0.000   0.059
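The same comparison can be written more compactly with system.time(), which evaluates an expression and reports how long it took. This is only a sketch: n, vec.vectorized and vec.loop are names chosen for illustration, and the timings depend on the machine, so no output is shown.

```r
n <- 1e6
vec.vectorized <- 1:n
vec.loop <- 1:n

# Time the vectorized addition
t.vec <- system.time(vec.vectorized <- vec.vectorized + 1)

# Time the equivalent for-loop
t.loop <- system.time(
  for (i in seq_along(vec.loop)) {
    vec.loop[i] <- vec.loop[i] + 1
  }
)

t.vec["elapsed"]
t.loop["elapsed"]

# Both approaches should give the same values
all.equal(vec.vectorized, vec.loop)
```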


After this exercise you should know:

  • What the most common loop structures in R are
  • Some common alternatives to using loops in R
  • How to convert a short script into a function
  • How to use that new function in R

2 Exercises

  1. Create a 100000 by 10 matrix with the numbers 1:1000000. Make a for-loop that calculates the sum for each row of the matrix. Verify that your results are consistent with what you obtain with the apply() function to calculate row sums as well as with the built-in rowSums() function. These functions were discussed in the lecture Elements of the programming language - part 2.

X <- matrix(1:1000000, nrow = 100000, ncol = 10)
for.sum <- vector()
# Note that this loop is much faster if you create an empty vector of the right size before the loop:
# for.sum <- vector("integer", nrow(X))
for (i in seq_len(nrow(X))) {
    for.sum[i] <- sum(X[i,])
}
head(for.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
app.sum <- apply(X, MARGIN = 1, sum)
head(app.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
rowSums.sum <- rowSums(X)
head(rowSums.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
identical(for.sum, app.sum)
## [1] TRUE
identical(for.sum, rowSums.sum)
## [1] FALSE
# rowSums() returns doubles, while sum() on a row of integers returns an integer
identical(for.sum, as.integer(rowSums.sum))
## [1] TRUE
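The FALSE above comes from a difference in storage type rather than in values. A small sketch on an illustrative 2 by 3 matrix (m is not part of the exercise) makes this visible:

```r
m <- matrix(1:6, nrow = 2)   # small integer matrix for illustration
typeof(apply(m, 1, sum))
## [1] "integer"
typeof(rowSums(m))
## [1] "double"
# identical() also compares the storage type, so it reports a difference
identical(apply(m, 1, sum), rowSums(m))
## [1] FALSE
# all.equal() compares the values and accepts integer versus double
all.equal(apply(m, 1, sum), rowSums(m))
## [1] TRUE
```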
  2. As you saw in the lecture, another common loop structure is the while loop, which functions much like a for loop but only runs as long as a test condition is TRUE. Modify your for loop from exercise 1 into a while loop.
    Tip: You can use the number of rows in the matrix as the condition for the while loop.

x <- 1
while.sum <- vector("integer", 100000)
while (x <= nrow(X)) {
  while.sum[x] <- sum(X[x,])
  x <- x + 1
}
head(while.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
  3. Create a data frame with two numeric vectors and one character vector. Write a loop that loops over the columns and reports the sum of the column values if the column is numeric and the total number of characters if it is a character vector.
    Tip: To count the number of characters, you can use the nchar() function.
  • Do it first using only if clauses

vector1 <- 1:10
vector2 <- c("Odd", "Loop", letters[1:8])
vector3 <- rnorm(10, sd = 10)
dfr1 <- data.frame(vector1, vector2, vector3, stringsAsFactors = FALSE)
sum.vec <- vector()
for(i in 1:ncol(dfr1)) {
  if (is.numeric(dfr1[,i])) {
      sum.vec[i] <- sum(dfr1[,i])
  } 
  if (is.character(dfr1[,i])) {
      sum.vec[i] <- sum(nchar(dfr1[,i]))
  }
}
sum.vec
## [1] 55.000000 15.000000  4.655531
  • And then modify the loop to use if-else clauses

sum.vec <- vector()
for(i in 1:ncol(dfr1)) {
  if (is.numeric(dfr1[,i])) {
      sum.vec[i] <- sum(dfr1[,i])
  } else {
      sum.vec[i] <- sum(nchar(dfr1[,i]))
  }
}
sum.vec
## [1] 55.000000 15.000000  4.655531
  4. In exercise 3 you wrote a loop that goes over a data frame. Convert this code to a function in R. The function should take a single data frame as its argument.

dfr.info <- function(dfr) {
  sum.vec <- vector()
  for (i in 1:ncol(dfr)) {
    if (is.numeric(dfr[,i])) {
        sum.vec[i] <- sum(dfr[,i])
    } else {
        sum.vec[i] <- sum(nchar(dfr[,i]))
    }
  }
  sum.vec
}
#Execute the function

dfr.info(dfr1)
## [1] 55.000000 15.000000  4.655531
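As one of the loop alternatives this exercise aims at, the same summary could also be written with vapply(). The sketch below is a hypothetical variant, not the course solution: dfr.info.v and example.dfr are illustrative names, and the example data frame avoids random numbers so that the output is reproducible.

```r
# Hypothetical vapply()-based variant of dfr.info()
dfr.info.v <- function(dfr) {
  # A data frame is a list of columns, so vapply() visits each column once
  vapply(dfr, function(col) {
    if (is.numeric(col)) {
      as.numeric(sum(col))
    } else {
      as.numeric(sum(nchar(col)))
    }
  }, numeric(1), USE.NAMES = FALSE)
}

example.dfr <- data.frame(a = 1:10,
                          b = c("Odd", "Loop", letters[1:8]),
                          stringsAsFactors = FALSE)
dfr.info.v(example.dfr)
## [1] 55 15
```

Here vapply() creates the result vector itself, so no output variable has to be set up before the iteration.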
  5. Extra exercise-loops. A variation of exercise 3: Create a data frame with three columns: one numeric, one logical, and one character vector. Write a loop that loops over the columns and reports the sum of the column values if the column is numeric, the total number of TRUE values if it is logical, and the total number of characters if it is a character vector.
    Tips: To count the number of characters, you can use the nchar() function.
    To count the number of TRUE values, you can use the sum() function.
    Inside the loop, you may want to use the if-else-if structure.

vector1 <- 1:10
vector2 <- c("Odd", "Loop", letters[1:8])
vector3 <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
dfr2 <- data.frame(vector1, vector2, vector3, stringsAsFactors = FALSE)
sum.vec <- vector()
for(i in 1:ncol(dfr2)) {
  if (is.numeric(dfr2[,i])) {
      sum.vec[i] <- sum(dfr2[,i])
  } else if (is.logical(dfr2[,i])) {
      sum.vec[i] <- sum(dfr2[,i])
  } else {
      sum.vec[i] <- sum(nchar(dfr2[,i]))
  }
}
sum.vec
## [1] 55 15  6
  6. Extra exercise-functions. Create a function that converts time in hours to time in minutes. The function should take a single argument, a numeric value of hours, and return a numeric value of minutes. Add an if clause inside the function that stops the function and gives an error if the hours provided are negative.
    Tip: You can read the documentation of the stop() function by typing ?stop in the R console.

hours_to_mins <- function(hours) {
  if (hours < 0) {
    stop("Hours cannot be negative")
  }
  minutes <- hours * 60
  return(minutes)
}

hours_to_mins(3.2)
## [1] 192
# Now test it with a negative hour value
hours_to_mins(-3.26)
## Error in hours_to_mins(-3.26): Hours cannot be negative
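An error raised with stop() normally halts the script. If the calling code should continue instead, the call can be wrapped in tryCatch(). A minimal sketch, where result is an illustrative name and the function is repeated so that the block runs on its own:

```r
hours_to_mins <- function(hours) {
  if (hours < 0) {
    stop("Hours cannot be negative")
  }
  hours * 60
}

# Catch the error and keep its message instead of halting the script
result <- tryCatch(
  hours_to_mins(-3.26),
  error = function(e) conditionMessage(e)
)
result
## [1] "Hours cannot be negative"
```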
  7. In all the loops that we tried out, we created the variable where the output is saved outside the loop. Why is this?
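One way to explore this question is to try an indexed assignment into an object that does not exist yet. The sketch below uses illustrative names (not.yet.created, attempt, out) and shows only one part of the answer: R needs an existing object before it can assign to one of its elements.

```r
# 'not.yet.created' is deliberately never defined before this point
attempt <- tryCatch({
  not.yet.created[1] <- 10   # indexed assignment needs an existing object
  "worked"
}, error = function(e) "failed")
attempt
## [1] "failed"

# Creating the output vector before the loop avoids the error
out <- vector("numeric", 3)
for (i in 1:3) {
  out[i] <- i^2
}
out
## [1] 1 4 9
```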

3 Extra material

Do you want to learn more about loops, if-else clauses, and functions? Here is some extra material.

Extra material: The switch function

If-else clauses operate on logical values. What if we want to take decisions based on non-logical values? Well, if-else will still work by evaluating a number of comparisons, but we can also use switch:

switch.demo <- function(x) {
  switch(class(x),
         logical = cat('logical\n'),
         numeric = cat('Numeric\n'),
         factor = cat('Factor\n'),
         cat('Undefined\n')
         )
}
switch.demo(x=TRUE)
switch.demo(x=15)
switch.demo(x=factor('a'))
switch.demo(data.frame())
## logical
## Numeric
## Factor
## Undefined
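When switch() is given a character string, an alternative with an empty value falls through to the next alternative that has one, which lets several inputs share a result. A short sketch with a hypothetical helper, file.kind, that is not part of the exercise:

```r
# Hypothetical helper that maps a file extension to a broad category
file.kind <- function(ext) {
  switch(ext,
         csv = ,               # empty: falls through to the value for "tsv"
         tsv = "table",
         png = "image",
         "unknown")            # unnamed last argument is the default
}
file.kind("csv")
## [1] "table"
file.kind("png")
## [1] "image"
file.kind("docx")
## [1] "unknown"
```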

Extra material: The ellipsis argument trick

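What if the authors of a wrapper, e.g. plot.something, forgot to include the ... argument? The body below already uses ..., and formals() lets us add it to the argument list afterwards:

my.plot <- function(x, y) { # Passing downstream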
  plot(x, y, las=1, cex.axis=.8, ...)
}
formals(my.plot) <- c(formals(my.plot), alist(... = ))
my.plot(1, 1, col='red', pch=19)
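For comparison, a wrapper that declares ... from the start can forward extra arguments without any change to its formals. A sketch with a non-graphical function so that the result is easy to check; my.mean is a hypothetical name:

```r
my.mean <- function(x, ...) {
  mean(x, ...)   # any extra arguments are passed on to mean()
}
my.mean(c(1, NA, 3))
## [1] NA
my.mean(c(1, NA, 3), na.rm = TRUE)
## [1] 2
```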

Extra material: Infix notations

Operators like +, - or * are so-called infix functions, where the function name is placed between its arguments. We can define our own:

`%p%` <- function(x, y) {
  paste(x, y)
}
'a' %p% 'b'
## [1] "a b"
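Because infix operators are ordinary functions, they can also be called in prefix form by quoting the name with backticks, or passed to other functions. A brief sketch, with the operator repeated so that the block runs on its own:

```r
`+`(1, 2)              # the same as 1 + 2
## [1] 3
`%p%` <- function(x, y) {
  paste(x, y)
}
`%p%`('a', 'b')        # the same as 'a' %p% 'b'
## [1] "a b"
sapply(1:3, `+`, 10)   # an operator passed as the function argument
## [1] 11 12 13
```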