The objective of this lab is to improve your coding skills by focusing on code debugging, benchmarking and optimization. Below, you will find a number of tasks connected to the topics covered in the Debugging, profiling and optimization lecture. Some tasks extend the lecture content and require you to find some more information online. Please note that while we are providing example solutions to many tasks, these are only examples. If you solve a task in a different way, it does not mean your solution is wrong. In fact, it may be better than our solution. If in doubt, ask a TA for help. We are here for you!
1 Debugging
1.1 Task: Code Correctness
Which of the following chunks of code are correct and which contain errors? Identify these errors.
1.1.1 Chunk 1
input <- sample(1:1000, size = 1000, replace = T)
currmin <- NULL
for (i in input) {
  if (input > currmin) {
    currmin <- input
    print(paste0("The new minimum is: ", currmin))
  }
}
1.1.2 Chunk 2
input <- sample(1:1000, size = 1000, replac = T)
currmin <- NULL
for (i in input) {
  if (input < currmin) {
    currmin <- input
    print(paste0("The new minimum is: ", currmin))
  }
}
1.1.3 Chunk 3
for (cnt in 1:100) {
  if (cnt > 12) {
    print("12+")
  } else {
    print("Not 12+")
  }
}
1.1.4 Chunk 4
result <- logical(10)
input <- sample(1:10, size = 10, replace = T)
for (i in 0:length(input)) {
  if (input[i] >= 5) {
    result[i] <- TRUE
  }
}
1.2 Task: Debugger.
Play with the debugger as described in the lecture slides.
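If you want a concrete playground for this, the sketch below exercises the standard base-R debugging tools; buggy_mean is just a made-up example function to step through.
# A made-up function to practise on
buggy_mean <- function(x) {
  s <- 0
  for (i in 1:length(x)) {
    s <- s + x[i]
  }
  s / length(x)
}

debug(buggy_mean)      # flag the function for step-by-step execution
buggy_mean(rnorm(10))  # now runs inside the debugger
undebug(buggy_mean)    # remove the debugging flag

# Alternatively, place browser() inside a function body to pause there,
# and call traceback() right after an error to inspect the call stack.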
1.3 Task: Floating-point Arithmetic.
Can you fix the code below so that it produces a more reliable result?
Think in terms of the system-specific representation \(\epsilon\).
Put the value of your double \(\epsilon\) into this spreadsheet (Best Coding Practises Lab sheet).
vec <- seq(0.1, 0.9, by=0.1)
vec == 0.7
# One way is to use epsilon
# Check machine's floating point representation
vec <- seq(0.1, 0.9, by=0.1)
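# The machine epsilon itself can be printed directly; this is the value
# to put into the spreadsheet:
.Machine$double.eps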
# Make a custom function that uses the machine's epsilon for comparing
# values
is_equal <- function(x, y) {
  isEqual <- F
  if (abs(x - y) < unlist(.Machine)['double.eps']) {
    isEqual <- T
  }
  isEqual
}
# Some tests
0.7 == 0.6 + 0.1
is_equal(0.7, 0.6 + 0.1)
0.7 == 0.8 - 0.1
is_equal(0.7, 0.8 - 0.1)
# Now you can use is_equal to fix the code!
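For instance, applied to the original chunk (a sketch; sapply is just one way to vectorise the element-wise comparison):
vec <- seq(0.1, 0.9, by=0.1)
sapply(vec, is_equal, y = 0.7)   # instead of vec == 0.7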
2 Profiling
2.1 Task: Filling A Large Matrix.
Create a 10 000 x 10 000 matrix and fill it with random numbers (from 1 to 42), first row by row and later column by column. Use proc.time to see if there is any difference. Is the measurement reliable? Record the values you got in this spreadsheet.
N <- 10e3 * 10e3

# By row
t1 <- proc.time()
M <- matrix(sample(1:42, size = N, replace = T), nrow = sqrt(N), byrow = T)
t2 <- proc.time()
(t2 - t1)

# By column
t1 <- proc.time()
M <- matrix(sample(1:42, size = N, replace = T), nrow = sqrt(N), byrow = F)
t2 <- proc.time()
(t2 - t1)
2.2 Task: Timing Reliability.
In the lecture slides, you have seen how to time sampling from the Normal (Gaussian) distribution:
system.time(rnorm(n = 10e6))
Is such a single measurement reliable? Run the code 100 times, plot and record the mean and the variance of the elapsed time. Put these values (elapsed.time mean and variance) into this spreadsheet (Best Coding Practises Lab sheet).
timing <- double(100)
for (i in 1:100) {
  st <- system.time(rnorm(n = 10e6))
  timing[i] <- st[3]
}
boxplot(timing)
mean(timing)
var(timing)
Optional: An alternative approach or, more exactly, an alternative notation that achieves the same as the previous chunk of code in a more compact way makes use of replicate, a wrapper around sapply that simplifies repeated evaluation of expressions. The drawback is that you do not get the vector of the individual timing values; calling system.time on the whole replicate expression gives you only one aggregate measurement for all repetitions. Try to read about replicate and use it to re-write the code above. Put the elapsed.time into this spreadsheet (Best Coding Practises Lab sheet). How does this value compare to calling system.time within a loop in the previous chunk of code? Are the values similar?
st2 <- system.time(replicate(n = 100, rnorm(n = 10e6)))
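To compare the two approaches, one can divide the total elapsed time by the number of replicates (a sketch, assuming the timing vector from the previous chunk is still in the workspace):
st2[3] / 100    # average elapsed time per replicate
mean(timing)    # loop-based estimate, for comparison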
2.3 Task: Microbenchmarking.
While system.time might be sufficient most of the time, there is also the microbenchmark package that enables more accurate time profiling, aiming at the microsecond resolution that most modern operating systems offer. Most of the benchmarking that microbenchmark does is implemented in low-overhead C functions, and the package also makes sure to:
- estimate the granularity and resolution of timing for your particular OS,
- warm up your processor before measuring, i.e. wake the processor up from any idle state or likewise.
Begin by installing the microbenchmark package.
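If the package is not on your system yet:
install.packages("microbenchmark")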
NOTE! We have noticed that it does not work well on, e.g., the M1 and M2 architectures of newer MacBooks!
2.3.1 Checking System Time.
Check the current value of the platform’s timer.
microbenchmark::get_nanotime()
Modify the code below so that it uses the current value of the platform's timer:
timing <- double(100)
for (i in 1:100) {
  st <- system.time(rnorm(n = 10e6))
  timing[i] <- st[3]
}
boxplot(timing)
Put the mean and the variance into this spreadsheet (Best Coding Practises Lab sheet, Microbenchmark – loop).
library(microbenchmark)

timing <- double(100)
for (i in 1:100) {
  nanotime_start <- get_nanotime()
  rnorm(n = 10e6)
  nanotime_stop <- get_nanotime()
  timing[i] <- nanotime_stop - nanotime_start
}
mean(timing)
var(timing)
boxplot(timing)
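Note that get_nanotime reports time in nanoseconds, so if you want to compare these numbers with the elapsed times from system.time (reported in seconds), rescale them first (a sketch):
mean(timing) / 1e9    # mean in seconds
var(timing) / 1e18    # variance in seconds^2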
2.3.2 Microtiming Precision.
There is an experimental function in the microbenchmark package that helps the package estimate the granularity and resolution of your particular timing subsystem. According to the documentation, the function measures the overhead of timing a C function call rounds times and returns all non-zero timings observed.
Run the microtiming_precision function and put the mean and the variance of the resulting vector into this spreadsheet (Best Coding Practises Lab sheet, Microbenchmark – precision).
precision <- microbenchmark::microtiming_precision()
mean(precision)
var(precision)
Run the function one time without assigning its value to a variable and consult the documentation. Compare the output of running the function without assigning the value to a variable, the values stored in the variable by the function upon assignment and the value specified in the documentation.
# In version 1.4-4 of the package, all three ways give different results!
microbenchmark::microtiming_precision()
precision <- microbenchmark::microtiming_precision()
?microbenchmark::microtiming_precision
2.3.3 The Microbenchmark Way.
Finally, let's benchmark our rnorm example using microbenchmark:
- microbenchmark the rnorm(n = 10e6) expression,
- plot the results using both ggplot2 and a boxplot (read the microbenchmark package documentation),
- look at the summary of the benchmark,
- how long does it take to dispatch a simple function that does nothing compared to evaluating a constant and adding two integers?
require(microbenchmark)
require(ggplot2)

mb <- microbenchmark(rnorm(n = 10e6))
autoplot(mb)
boxplot(mb)
summary(mb)

f <- function() {}
mb2 <- microbenchmark(f(), pi, 2+2)
summary(mb2)
autoplot(mb2)
2.4 Optional Task: More Advanced Profiling.
Now, we will use an even more sophisticated approach to profiling.
2.4.1 The Rprof way.
Write three functions that fill, row by row, an \(N \times N\) matrix \(M\) with randomly generated numbers from a vector given as the argument bag, and allow for passing a random seed value as a function argument with the default value of 42. After filling the matrix with values, add to each element of \(M\) the number of the column the element is in and return such a matrix from the function. The functions should:
- fill_alloc – use memory allocation prior to the loop in which the matrix is being filled, and allocate memory using the init value passed as an argument and by default set to NULL,
- fill_noalloc – not use memory allocation prior to the loop,
- fill_noloop – not use a loop for filling the matrix in.
Do not perform the addition of the column number in the same loop.
Following this and using rnorm(1000, mean = 0, sd = 1):
- use Rprof to profile the functions using the same seed and N = 100,
- use Rprof to check whether there is a difference between initializing the matrix using NULL and 0 in fill_alloc,
- what happens if \(N = 10\) compared to \(N = 20\) to \(N = 100\)?
fill_noloop <- function(N, bag, seed = 42) {
  set.seed(seed)
  values <- sample(bag, size = N^2, replace = T)
  M <- matrix(data = values, nrow = N, byrow = T)
  for (col_num in 1:N) {
    M[, col_num] <- M[, col_num] + col_num
  }
  return(M)
}
fill_noalloc <- function(N, bag, seed = 42) {
  set.seed(seed)
  values <- sample(bag, size = N^2, replace = T)
  M <- NULL
  cnt = 1
  for (row in 1:N) {
    row_tmp <- c()
    for (col in 1:N) {
      row_tmp <- c(row_tmp, values[cnt])
      cnt <- cnt + 1
    }
    M <- rbind(M, row_tmp)
  }
  for (col_num in 1:N) {
    M[, col_num] <- M[, col_num] + col_num
  }
  return(M)
}
fill_alloc <- function(N, bag, seed = 42, init = NA) {
  set.seed(seed)
  values <- sample(bag, size = N^2, replace = T)
  M <- matrix(rep(init, times=N^2), nrow = N, byrow = T)
  cnt = 1
  for (row in 1:N) {
    for (col in 1:N) {
      M[row, col] <- values[cnt]
      cnt <- cnt + 1
    }
  }
  for (col_num in 1:N) {
    M[, col_num] <- M[, col_num] + col_num
  }
  return(M)
}
# Example of profiling one function: fill_noloop
Rprof(memory.profiling = T)
fill_noloop(10000, c(9:0), seed = 100)
Rprof(NULL)
summary <- summaryRprof('Rprof.out', memory='both')
summary$by.self

# answers to the remaining questions are not given here
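The remaining questions can be approached in exactly the same way; for instance, a sketch for profiling fill_noalloc (this call can take a while, and note that Rprof overwrites Rprof.out by default):
Rprof(memory.profiling = T)
fill_noalloc(1000, c(9:0), seed = 100)
Rprof(NULL)
summaryRprof('Rprof.out', memory = 'both')$by.self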
2.5 Optimization
Have a look at our answers from the previous task.
- How can you optimize fill_alloc even further (call the optimized version fill_alloc_opt)?
fill_alloc_opt <- function(N, bag, seed = 42, init = NA) {
  set.seed(seed)
  values <- sample(bag, size = N^2, replace = T)
  M <- matrix(rep(init, times=N^2), nrow = N, byrow = T)
  cnt = 1
  for (row in 1:N) {
    for (col in 1:N) {
      M[row, col] <- values[cnt] + col
      cnt <- cnt + 1
    }
  }
  return(M)
}
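A quick sanity check that the optimized version still returns exactly the same matrix as fill_alloc from the previous task (assuming both are in the workspace):
identical(fill_alloc(100, 0:9), fill_alloc_opt(100, 0:9))  # should be TRUE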
- Optimize fill_noloop to a fill_noloops that does not use any loops at all.
fill_noloops <- function(N, bag, seed = 42) {
  set.seed(seed)  # use the seed argument, as in the other fill_* functions
  values <- sample(bag, size = N^2, replace = T)
  inc <- rep(x = 1:N, times = N)
  M <- matrix(data = values + inc, nrow = N, byrow = T)
  return(M)
}
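To quantify what the loop-free rewrite buys, you can benchmark the two versions against each other (a sketch using microbenchmark from the earlier tasks):
microbenchmark::microbenchmark(
  fill_noloop(100, rnorm(1000)),
  fill_noloops(100, rnorm(1000)),
  times = 10
)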
2.6 Optional Task: Using the profr package.
- Install and load the profr package.
- Use profr to profile fill_noloop, fill_noloops and fill_alloc_opt.
Please use the following function if you were not able to run profr::ggplot.profr.
ggplot.profr <- function(data, ..., minlabel = 0.1, angle=0) {
  if (!requireNamespace("ggplot2", quietly = TRUE))
    stop("Please install ggplot2 to use this plotting method")
  data$range <- diff(range(data$time))

  # quiet R CMD check note
  start <- NULL
  end <- NULL
  time <- NULL

  ggplot2::ggplot(as.data.frame(data)) +
    ggplot2::geom_rect(
      ggplot2::aes(xmin = start, xmax = end, ymin = level - 0.5, ymax = level + 0.5),
      fill = "grey95", colour = "black", size = 0.5) +
    ggplot2::geom_text(
      ggplot2::aes(start + range / 60, level, label = f),
      data = subset(data, time > max(time) * minlabel),
      size = 4, angle = angle, hjust = 0) +
    ggplot2::scale_y_continuous("time") +
    ggplot2::scale_x_continuous("level")
}
library(profr)

Rprof("profr_noloop.out", interval = 0.01)
fill_noloop(1000, rnorm(1000), seed = 42)
Rprof(NULL)
profile_noloop_df <- parse_rprof('profr_noloop.out')

Rprof("profr_noloops.out", interval = 0.01)
fill_noloops(100, rnorm(1000), seed = 42)
Rprof(NULL)
profile_noloops_df <- parse_rprof('profr_noloops.out')

Rprof("profr_alloc_opt.out", interval = 0.01)
fill_alloc_opt(10, rnorm(1000), seed = 42)
Rprof(NULL)
profile_alloc_opt_df <- parse_rprof('profr_alloc_opt.out')

profr::ggplot.profr(profile_noloop_df)
profr::ggplot.profr(profile_noloops_df)
profr::ggplot.profr(profile_alloc_opt_df)
2.7 Optional Task: Using the profvis package.
- Install and load the profvis package.
- Use profvis to profile the fill_noloop and fill_alloc functions.
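A minimal sketch of such a profvis session (the result opens as an interactive flame graph in RStudio or a browser):
library(profvis)

profvis({
  fill_noloop(1000, rnorm(1000), seed = 42)
  fill_alloc(1000, rnorm(1000), seed = 42)
})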
3 Optimize Your Code
In this section, we will deal with some selected ways to optimize your code.
3.1 Task: Fix and Optimize This!
Given is a function:
optimize_me <- function(N = 1000, values = c(1:1e4)) {
  N = 10; values = c(1:1e4)
  dat1 <- matrix(size = N^2)
  for (i in 1:N) {
    for (j in 1:N) {
      dat1[i, j] <- sample(values, 1)
    }
  }
  dat0 <- dat1
  dat1[lower.tri(dat1)] <- t(dat1)[lower.tri(dat1)]

  dat2 <- NULL
  for (i in 1:N) {
    i_tmp <- c()
    for (j in 1:N) {
      i_tmp <- c(i_tmp, sample(values, 1))
    }
    dat2 <- rbind(dat2, i_tmp)
  }
  dat2[lower.tri(dat2)] <- t(dat2)[lower.tri(dat2)]

  M <- dat2
  for (i in 1:N) {
    for (j in 1:N) {
      M[i, j] <- dat1[i, j] * dat2[i, j]
    }
  }
  for (i in 1:N) {
    for (j in 1:N) {
      M[i, j] <- M[i, j] + values[3]
    }
  }
  N <- M %*% dat0
  result <- apply(N, 2, mean)
  return(result)
}
- What does it do, step-by-step?
- Profile it.
- Is dat1 <- matrix(size = N^2) better than dat1 <- matrix(NA, nrow=N, ncol=N)?
- Can you optimize something using BLAS?
- Can you optimize by using apply somewhere?
- Can you optimize apply further?
- What else can you optimize? Do it. Report the speed gain and memory gain compared to the original version in this spreadsheet (Best Coding Practises Lab sheet, Optimization gains).
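If you want a reference point after trying yourself, below is one possible direction (a sketch only; the name optimize_me_opt and the exact restructuring are our choices, and other solutions may well be better). It drops the debug line that overwrote N and values, replaces the broken matrix(size = N^2) call, vectorises the element-wise sampling, multiplication and addition, lets the BLAS-backed %*% do the matrix product, and uses colMeans instead of apply:
optimize_me_opt <- function(N = 1000, values = c(1:1e4)) {
  # vectorised sampling instead of filling element by element in loops
  dat1 <- matrix(sample(values, size = N^2, replace = TRUE), nrow = N)
  dat0 <- dat1                   # copy taken before symmetrisation, as in the original
  dat1[lower.tri(dat1)] <- t(dat1)[lower.tri(dat1)]

  dat2 <- matrix(sample(values, size = N^2, replace = TRUE), nrow = N)
  dat2[lower.tri(dat2)] <- t(dat2)[lower.tri(dat2)]

  M <- dat1 * dat2 + values[3]   # vectorised element-wise product and shift
  colMeans(M %*% dat0)           # %*% uses BLAS; colMeans replaces apply(, 2, mean)
}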