In programming languages, loop structures, either with or without conditions, are used to repeat commands over multiple entities. for and while loops, as well as if-else statements, are also used in R, but less often than in many other programming languages. The reason is that many tasks that would otherwise need a loop are already handled by vectorization or by the apply functions.
This means that we can multiply all values in a vector in R by two by calling:
vec.a <- c(1, 2, 3, 4)
vec.a * 2
## [1] 2 4 6 8
In R, as in many other languages, you can also do this with a loop instead:
for (i in seq_along(vec.a)) {
vec.a[i] <- vec.a[i] * 2
}
vec.a
## [1] 2 4 6 8
This is far less efficient and by no means easier to type, so we tend to avoid loops when possible.
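Vectorization also covers simple conditional logic. As a minimal sketch (the task of doubling only the even values is an illustrative assumption, not part of the exercise), the base R function ifelse() can replace a loop that contains an if-else statement:

```r
vec.a <- c(1, 2, 3, 4)

# Loop version: test each element and double it only if it is even.
loop.res <- numeric(length(vec.a))
for (i in seq_along(vec.a)) {
  if (vec.a[i] %% 2 == 0) {
    loop.res[i] <- vec.a[i] * 2
  } else {
    loop.res[i] <- vec.a[i]
  }
}

# Vectorized version: ifelse() evaluates the test for all elements at once.
vec.res <- ifelse(vec.a %% 2 == 0, vec.a * 2, vec.a)

loop.res                       # 1 4 3 8
identical(loop.res, vec.res)   # TRUE
```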
Let us compare the time of execution of the vectorized version (vector with 1,000,000 elements):
vec <- c(1:1000000)
ptm <- proc.time()
vec <- vec + 1
proc.time() - ptm # vectorized
## user system elapsed
## 0.001 0.003 0.004
to the loop version:
vec <- c(1:1000000)
ptm <- proc.time()
for (i in seq_along(vec)) {
vec[i] <- vec[i] + 1
}
proc.time() - ptm # for-loop
## user system elapsed
## 0.09 0.00 0.09
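The same comparison can also be sketched with system.time(), which wraps the proc.time() bookkeeping in a single call. The variable names res.vec, res.loop, t.vectorized and t.loop are chosen here for illustration, and the actual timings depend on the machine and the R version:

```r
n <- 1e6
vec <- 1:n

# Vectorized version: one call operates on the whole vector.
t.vectorized <- system.time(res.vec <- vec + 1)

# Loop version: preallocate the result, then fill it element by element.
res.loop <- numeric(n)
t.loop <- system.time(
  for (i in seq_along(vec)) {
    res.loop[i] <- vec[i] + 1
  }
)

# Both approaches produce the same values; only the run time differs.
identical(res.vec, res.loop)
t.vectorized["elapsed"]
t.loop["elapsed"]
```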
After this exercise you should know:
Calculate the row sums of a matrix with a for loop, with the apply() function, and with the built-in rowSums() function. These functions were discussed in the lecture Matrices, data frames, and lists.
X <- matrix(1:1000000, nrow = 100000, ncol = 10)
for.sum <- vector(mode = 'numeric')
# Note that this loop is much faster if you create an empty vector of the right size before the loop:
# for.sum <- vector('numeric', 100000)
for (i in 1:nrow(X)) {
for.sum[i] <- sum(X[i,])
}
head(for.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
app.sum <- apply(X, MARGIN = 1, sum)
head(app.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
rowSums.sum <- rowSums(X)
head(rowSums.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
identical(for.sum, app.sum)
## [1] FALSE
identical(for.sum, rowSums.sum)
## [1] TRUE
identical(for.sum, as.integer(rowSums.sum))
## [1] FALSE
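The FALSE results above come from a difference in storage type, not from different values. A minimal sketch on a smaller matrix (X.small and row.sum are names used here for illustration) shows that sum() on integers returns an integer, while rowSums() always returns a double:

```r
X.small <- matrix(1:20, nrow = 4, ncol = 5)

app.sum <- apply(X.small, MARGIN = 1, sum)
row.sum <- rowSums(X.small)

typeof(app.sum)   # "integer"
typeof(row.sum)   # "double"

# identical() checks type as well as value, so this is FALSE.
identical(app.sum, row.sum)

# all.equal() compares the numeric values, so this is TRUE.
all.equal(app.sum, row.sum)

# After converting to the same type the vectors are identical.
identical(as.numeric(app.sum), row.sum)
```

In the exercise, for.sum was created as a numeric (double) vector, which is why it matches the output of rowSums() but not the integer output of apply().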
x <- 1
while.sum <- vector("numeric", 100000)
while (x <= 100000) {
while.sum[x] <- sum(X[x,])
x <- x + 1
}
head(while.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
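The comment in the for-loop example suggests creating the result vector at its final size before the loop. A minimal sketch of that comparison follows; the smaller matrix and the names grow.sum, pre.sum, t.grow and t.pre are assumptions made for illustration, and the size of the timing difference depends on the R version:

```r
X <- matrix(1:100000, nrow = 10000, ncol = 10)

# Growing the result: the vector starts empty and is extended on every iteration.
grow.sum <- vector("numeric")
t.grow <- system.time(
  for (i in seq_len(nrow(X))) {
    grow.sum[i] <- sum(X[i, ])
  }
)

# Preallocating the result: the vector already has one slot per row.
pre.sum <- vector("numeric", nrow(X))
t.pre <- system.time(
  for (i in seq_len(nrow(X))) {
    pre.sum[i] <- sum(X[i, ])
  }
)

# The results are the same; only the memory handling differs.
identical(grow.sum, pre.sum)
t.grow["elapsed"]
t.pre["elapsed"]
```

seq_len(nrow(X)) is used instead of 1:nrow(X) because it yields an empty sequence when the matrix has no rows, so the loop body is then skipped.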
Hint: the number of characters in each string can be obtained with the nchar function.
vector1 <- 1:10
vector2 <- c("Odd", "Loop", letters[1:8])
vector3 <- rnorm(10, sd = 10)
dfr1 <- data.frame(vector1, vector2, vector3, stringsAsFactors = FALSE)
sum.vec <- vector()
for(i in 1:ncol(dfr1)) {
if (is.numeric(dfr1[,i])) {
sum.vec[i] <- sum(dfr1[,i])
}
if (is.character(dfr1[,i])) {
sum.vec[i] <- sum(nchar(dfr1[,i]))
}
}
sum.vec
## [1] 55.00000 15.00000 29.66217
sum.vec <- vector()
for(i in 1:ncol(dfr1)) {
if (is.numeric(dfr1[,i])) {
sum.vec[i] <- sum(dfr1[,i])
} else {
sum.vec[i] <- sum(nchar(dfr1[,i]))
}
}
sum.vec
## [1] 55.00000 15.00000 29.66217
dfr.info <- function(dfr) {
sum.vec <- vector()
for (i in 1:ncol(dfr)) {
if (is.numeric(dfr[,i])) {
sum.vec[i] <- sum(dfr[,i])
} else {
sum.vec[i] <- sum(nchar(dfr[,i]))
}
}
sum.vec
}
#Execute the function
dfr.info(dfr1)
## [1] 55.00000 15.00000 29.66217
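The same per-column summary can also be written with sapply(), one of the apply functions mentioned at the start. This is only a sketch: dfr.info.sapply is a hypothetical name, and the example data frame below leaves out the random column so that the result is reproducible.

```r
dfr.info.sapply <- function(dfr) {
  # sapply() calls the anonymous function once per column of the data frame.
  sapply(dfr, function(column) {
    if (is.numeric(column)) {
      sum(column)
    } else {
      sum(nchar(column))
    }
  })
}

dfr.demo <- data.frame(vector1 = 1:10,
                       vector2 = c("Odd", "Loop", letters[1:8]),
                       stringsAsFactors = FALSE)
res <- dfr.info.sapply(dfr.demo)
res
```

Unlike the loop version, the result keeps the column names, which makes it easier to see which value belongs to which column.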
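For each column, return the sum if it is numeric, the number of TRUEs when it is logical, and the total number of characters if it is a character vector.
Hint: to count characters, use the nchar function. To count TRUE values, you can use the sum function.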
vector1 <- 1:10
vector2 <- c("Odd", "Loop", letters[1:8])
vector3 <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
dfr2 <- data.frame(vector1, vector2, vector3, stringsAsFactors = FALSE)
sum.vec <- vector()
for(i in 1:ncol(dfr2)) {
if (is.numeric(dfr2[,i])) {
sum.vec[i] <- sum(dfr2[,i])
} else if (is.logical(dfr2[,i])) {
sum.vec[i] <- sum(dfr2[,i])
} else {
sum.vec[i]<-sum(nchar(dfr2[,i]))
}
}
sum.vec
## [1] 55 15 6
Read about the stop() function by typing ?stop in the R console.
hours_to_mins <- function(hours) {
if (hours < 0) {
stop("Hours cannot be negative")
}
minutes <- hours * 60
return(minutes)
}
hours_to_mins(3.2)
## [1] 192
#Now test it with a negative hour value
hours_to_mins(-3.26)
## Error in hours_to_mins(-3.26): Hours cannot be negative
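An error raised by stop() halts execution unless the caller handles it. As a minimal sketch, base R's tryCatch() can intercept the error and extract its message; the variable name result is an assumption for illustration:

```r
hours_to_mins <- function(hours) {
  if (hours < 0) {
    stop("Hours cannot be negative")
  }
  hours * 60
}

# The error handler receives the condition object created by stop().
result <- tryCatch(
  hours_to_mins(-3.26),
  error = function(e) conditionMessage(e)
)
result   # "Hours cannot be negative"
```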
Do you want to expand on loops, if-else clauses, and functions? Here is a bit of extra material!
If-else clauses operate on logical values. What if we want to make decisions based on non-logical values? Well, if-else will still work by evaluating a number of comparisons, but we can also use switch:
switch.demo <- function(x) {
switch(class(x),
logical = cat('logical\n'),
numeric = cat('Numeric\n'),
factor = cat('Factor\n'),
cat('Undefined\n')
)
}
switch.demo(x=TRUE)
switch.demo(x=15)
switch.demo(x=factor('a'))
switch.demo(data.frame())
## logical
## Numeric
## Factor
## Undefined
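switch() also returns a value, and an alternative that is left empty falls through to the next non-empty one. A small sketch, where type.label is a hypothetical helper and class(x)[1] is used because switch() expects a single string:

```r
type.label <- function(x) {
  switch(class(x)[1],
    integer = ,            # empty: falls through to the next alternative
    numeric = "number",
    character = "text",
    "other"                # unnamed last argument is the default
  )
}

type.label(1L)     # "number"
type.label(2.5)    # "number"
type.label("a")    # "text"
type.label(TRUE)   # "other"
```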
What if the authors of a wrapper function, e.g. plot.something, forgot to include the ellipsis (...)?
my.plot <- function(x, y) { # Passing downstream
plot(x, y, las=1, cex.axis=.8, ...)
}
# You can update the function by adding the ellipsis like this
formals(my.plot) <- c(formals(my.plot), alist(... = ))
my.plot(1, 1, col='red', pch=19)
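The same idea can be sketched without a graphics device. Below, my.mean is a hypothetical wrapper that declares the ellipsis and forwards it, and no.dots is a hypothetical wrapper whose author forgot it, repaired afterwards with the formals() approach shown above:

```r
# A wrapper that declares ... and passes it on to mean().
my.mean <- function(x, ...) {
  mean(x, ...)
}
my.mean(c(1, 2, NA))                 # NA
my.mean(c(1, 2, NA), na.rm = TRUE)   # 1.5

# A wrapper that uses ... in its body but does not declare it.
no.dots <- function(x) {
  mean(x, ...)
}

# Add the missing ellipsis to the argument list.
formals(no.dots) <- c(formals(no.dots), alist(... = ))
names(formals(no.dots))              # "x" "..."
no.dots(c(1, 2, NA), na.rm = TRUE)   # 1.5
```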

Operators like +, - or * are so-called infix functions, where the function name is placed between its arguments. We can define our own:
`%p%` <- function(x, y) {
paste(x,y)
}
'a' %p% 'b'
## [1] "a b"
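Because infix operators are ordinary functions, they can also be called in prefix form or passed to other functions by quoting their names with backticks. A short sketch, where the %+% operator is defined here purely for illustration:

```r
# Calling a built-in operator like a regular function.
`+`(1, 2)             # 3

# Passing an operator as the function argument of sapply().
sapply(1:3, `*`, 2)   # 2 4 6

# A user-defined infix operator that concatenates without a separator.
`%+%` <- function(x, y) {
  paste0(x, y)
}
'a' %+% 'b' %+% 'c'   # "abc"
```

User-defined infix operators must start and end with a % sign and are evaluated from left to right.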
| Function | Purpose | Key Points | Example |
|---|---|---|---|
| nrow() | Number of rows | Works on data frames, matrices | nrow(df) |
| ncol() | Number of columns | Works on data frames, matrices | ncol(df) |
| dim() | Dimensions (rows, cols) | Returns vector c(nrow, ncol) | dim(df) |
| nchar() | Number of characters in strings | Vectorised over character vectors | nchar(c("abc", "hello")) |
| sum() | Sum of values | na.rm = TRUE to ignore NA | sum(c(1, 2, NA), na.rm = TRUE) |
| rowSums() | Sum of each row | Works on numeric matrices/data frames | rowSums(df) |
| apply() | Apply function over rows/columns | apply(X, MARGIN, FUN) where MARGIN = 1 (rows) or 2 (cols) | apply(df, 2, mean) |
| sapply() | Simplifying lapply output | Returns vector/matrix if possible | sapply(df, class) |
| lapply() | Apply function to each element of a list | Returns a list | lapply(df, mean) |
| identical() | Check exact equality | Stricter than ==, checks type + value | identical(1, 1L) |
| is.numeric() | Check if an object is numeric | Returns TRUE/FALSE | is.numeric(3.14) |
| data.frame() | Create a data frame | stringsAsFactors=FALSE for R < 4.0 | data.frame(x=1:3, y=c("a","b","c")) |