In programming languages, loop structures, with or without conditions, are used to repeat commands over multiple entities. For and while loops as well as if-else statements are also used in R, but less often than in many other programming languages. The reason is that many of the tasks loops are used for can be handled with vectorization or with the apply functions.
This means that we can multiply all values in a vector in R by two by calling
vec.a <- c(1, 2, 3, 4)
vec.a * 2
## [1] 2 4 6 8
In R, as in many other languages, you can also do this with a loop instead
# Loop over the positions of vec.a, not over its values
for (i in seq_along(vec.a)) {
  vec.a[i] <- vec.a[i] * 2
}
vec.a
## [1] 2 4 6 8
As you saw in the lecture, this is far less efficient and no easier to type, so we tend to avoid loops when possible.
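To make the efficiency claim concrete, the sketch below times both approaches on a larger vector. The name `big.vec` and its length are assumptions chosen for illustration, and the exact timings depend on your machine and R version.

```r
# Hypothetical test vector; the length is an arbitrary choice for illustration
big.vec <- as.numeric(1:1000000)

# Vectorized version: one call operates on the whole vector
vec.time <- system.time(vec.res <- big.vec * 2)

# Loop version: pre-allocate the result, then fill it element by element
loop.res <- numeric(length(big.vec))
loop.time <- system.time(
  for (i in seq_along(big.vec)) {
    loop.res[i] <- big.vec[i] * 2
  }
)

# Both approaches give the same values
identical(vec.res, loop.res)

# Compare elapsed times (values vary between machines)
vec.time["elapsed"]
loop.time["elapsed"]
```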
After this exercise you should know:
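Calculate the row sums of a matrix with a for loop, and compare the result with what you obtain with the apply() function to calculate row sums as well as with the built-in rowSums() function. These functions were discussed in the lecture Elements of the programming language - part 2.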
X <- matrix(1:1000000, nrow = 100000, ncol = 10)
for.sum <- vector()
# Note that this loop is much faster if you create an empty vector of the right size outside the loop:
# for.sum <- vector('integer', 100000)
for (i in seq_len(nrow(X))) {
  for.sum[i] <- sum(X[i, ])
}
head(for.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
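The note in the code above about creating the vector at the right size can be checked with a small timing sketch. The names `X.small`, `grow.sum` and `prealloc.sum` are hypothetical, and a smaller matrix is assumed here to keep the run short. How large the difference is depends on your R version and machine.

```r
# Smaller stand-in matrix (an assumption to keep the example quick)
X.small <- matrix(1:100000, nrow = 10000, ncol = 10)

# Growing the result: the vector starts empty and is extended as elements are added
grow.sum <- vector()
grow.time <- system.time(
  for (i in seq_len(nrow(X.small))) {
    grow.sum[i] <- sum(X.small[i, ])
  }
)

# Pre-allocating: the vector already has its final length and type
prealloc.sum <- vector("integer", nrow(X.small))
prealloc.time <- system.time(
  for (i in seq_len(nrow(X.small))) {
    prealloc.sum[i] <- sum(X.small[i, ])
  }
)

# Same result either way; only the run time differs
identical(grow.sum, prealloc.sum)
grow.time["elapsed"]
prealloc.time["elapsed"]
```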
app.sum <- apply(X, MARGIN = 1, sum)
head(app.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
rowSums.sum <- rowSums(X)
head(rowSums.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
identical(for.sum, app.sum)
## [1] TRUE
identical(for.sum, rowSums.sum)
## [1] FALSE
identical(for.sum, as.integer(rowSums.sum))
## [1] TRUE
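The FALSE above comes from the storage type rather than from different values, since identical() also compares types. A minimal sketch on a small matrix (the name `M` is hypothetical):

```r
# Small integer matrix used only for illustration
M <- matrix(1:6, nrow = 2, ncol = 3)

app.type <- typeof(apply(M, MARGIN = 1, sum))  # sum() of integers stays integer
rs.type <- typeof(rowSums(M))                  # rowSums() returns double
app.type
rs.type

# identical() is strict about type, all.equal() compares the values with a tolerance
strict.same <- identical(apply(M, MARGIN = 1, sum), rowSums(M))
value.same <- isTRUE(all.equal(apply(M, MARGIN = 1, sum), rowSums(M)))
strict.same
value.same
```

When only the values matter, all.equal() wrapped in isTRUE() is usually a safer comparison than converting one of the results by hand.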
x <- 1
while.sum <- vector("integer", 100000)
# Use <= so that the last row is included as well
while (x <= nrow(X)) {
  while.sum[x] <- sum(X[x, ])
  x <- x + 1
}
head(while.sum)
## [1] 4500010 4500020 4500030 4500040 4500050 4500060
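Loop over the columns of a data frame and report the sum of each numeric column. For a character column, report the total number of characters using the nchar function.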
vector1 <- 1:10
vector2 <- c("Odd", "Loop", letters[1:8])
vector3 <- rnorm(10, sd = 10)
dfr1 <- data.frame(vector1, vector2, vector3, stringsAsFactors = FALSE)
sum.vec <- vector()
for (i in seq_len(ncol(dfr1))) {
  if (is.numeric(dfr1[, i])) {
    sum.vec[i] <- sum(dfr1[, i])
  } else {
    sum.vec[i] <- sum(nchar(dfr1[, i]))
  }
}
sum.vec
## [1] 55.00000 15.00000 16.02088
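dfr.info <- function(dfr) {
  sum.vec <- vector()
  for (i in seq_len(ncol(dfr))) {
    if (is.numeric(dfr[, i])) {
      # Numeric column: store the mean of the values
      sum.vec[i] <- mean(dfr[, i])
    } else {
      # Other columns: store the total number of characters
      sum.vec[i] <- sum(nchar(dfr[, i]))
    }
  }
  sum.vec
}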
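Since many loops in R can be replaced by apply functions, the same per-column summary can also be sketched with sapply(). The data frame `dfr.demo` and the function name `dfr.info.sapply` are hypothetical, and the sketch keeps the choice of mean() for numeric columns made in the function above.

```r
# Hypothetical example data frame with one numeric and one character column
dfr.demo <- data.frame(num = c(2, 4, 6),
                       chr = c("a", "bb", "ccc"),
                       stringsAsFactors = FALSE)

# sapply() visits each column of the data frame and collects one value per column
dfr.info.sapply <- function(dfr) {
  sapply(dfr, function(col) {
    if (is.numeric(col)) {
      mean(col)          # numeric column: mean of the values
    } else {
      sum(nchar(col))    # character column: total number of characters
    }
  })
}

res.demo <- dfr.info.sapply(dfr.demo)
res.demo
```

Unlike the loop version, sapply() returns a vector named after the columns, and no output variable has to be created beforehand.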
Read up on the ifelse() function in R. If possible, use ifelse() to answer question 3.
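As a starting point, here is a minimal sketch of how R's ifelse() function differs from an if-else statement: it tests every element of a vector at once and returns a vector of the same length. The vector `vals` and the labels are hypothetical.

```r
# Hypothetical input vector
vals <- c(1, 8, 3, 10)

# ifelse(test, yes, no) picks "yes" or "no" element by element
size.label <- ifelse(vals > 5, "large", "small")
size.label

# An if-else statement, in contrast, expects a single TRUE or FALSE
if (length(vals) > 3) {
  n.label <- "long"
} else {
  n.label <- "short"
}
n.label
```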
In all the loops we tried out, we created the variable that stores the output outside the loop. Why is this?