This is the parallelisation lab for RaukR. It will take you through some basic steps to parallelise your code and try to make you think about when and where you can use this tool.

You are highly encouraged to test things out yourself and tweak things to figure out how these methods behave.



1 Install

The first thing we want to do is install the package required for the exercise.

install.packages("future")

2 Exercises

The basic construct for a future looks like this:

a %<-% { expression(s) }
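As a minimal sketch of this construct, the snippet below creates one future and then reads its value. The variable names and the toy expression are illustrative only and are not part of the exercise.

```r
library(future)
plan(sequential)  # the default plan: futures are resolved in the current R session

# Create a future. The braces may hold one or several expressions,
# and the value of the last one becomes the value of `a`.
a %<-% {
  x <- 1:10
  sum(x)
}

# Using `a` returns the value of the future (waiting for it if necessary).
print(a)
```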

Here is a computationally intensive task that samples numbers from 1:100, 200000000 times.

sample(100, 200000000, replace = TRUE)

On my machine, this takes about 5.4 seconds to run.

system.time({sample(100, 200000000, replace = TRUE)})
   user  system elapsed 
  5.173   0.160   5.369 

2.1 Sequential and Multi-

  • Use the future package with plan(sequential), which is the default, and run the supplied sample() call inside a future.

  • Using an approach from yesterday's lecture on benchmarking, or any other method you are comfortable with, measure the time it takes simply to assign the future. Do not evaluate the future yet by asking for its value.

Note: Do not try to measure time inside the future expression; always wrap the timing code around the futures.
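One possible way to set this up is sketched below, with system.time() wrapped around the assignment rather than placed inside the future. The reduced sample size `n` and the `%seed% TRUE` suffix (which asks future for parallel-safe random numbers and avoids a warning about unseeded random number generation) are additions for this sketch, not part of the exercise text.

```r
library(future)
plan(sequential)

# Assumption: far fewer draws than the lab's 200000000, so the sketch runs quickly.
n <- 1e6

# The timer wraps the creation of the future; nothing is timed inside the braces.
t_assign <- system.time({
  s %<-% { sample(100, n, replace = TRUE) } %seed% TRUE
})

# Elapsed time for the assignment alone; the value of `s` has not been requested yet.
print(t_assign[["elapsed"]])
```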

Question 1: Split your sampling into multiple futures and compute the time again. Did it complete faster?

Question 2: Change to plan(multisession) or plan(multicore), depending on your setup (operating system, RStudio or plain console). Compute the time again for your multiple futures. Did it complete faster? Think about what the measured time implies.

Note: I had some issues with plan(multisession) in RStudio. If this happens to you, you might want to start an R console from a terminal window instead.
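If you are unsure which plan fits your setup, the following sketch lets the package decide. It relies on supportsMulticore(), which reports whether forked processing is supported in the current session. The choice of two workers is an assumption made to keep the example light.

```r
library(future)

# Forked processing (multicore) is not available on every setup,
# so fall back to background R sessions (multisession) otherwise.
if (supportsMulticore()) {
  plan(multicore, workers = 2)
} else {
  plan(multisession, workers = 2)
}

# Number of workers the current plan can use.
print(nbrOfWorkers())

plan(sequential)  # return to the default plan
```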

Question 3: Ask for the outcome of your futures after their definitions, thus evaluating them. How does this influence the time it takes to perform the operations?

At this stage your code should, in pseudocode, look something like this:

plan(multisession)

timer(
  a %<-% {sample expression}
  b %<-% {sample expression}
  #evaluate futures by requesting outcome values
  a + b
)
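A runnable sketch of the same idea could look as follows. It is one possible reading of the pseudocode, not the reference solution: the timer is system.time(), the sample size is reduced, the two halves are joined with c() rather than added, and `%seed% TRUE` is included to get parallel-safe random numbers.

```r
library(future)
plan(multisession, workers = 2)  # assumption: two background R sessions

# Assumption: reduced from 200000000 to keep the run short.
n <- 1e6

timing <- system.time({
  a %<-% { sample(100, n / 2, replace = TRUE) } %seed% TRUE
  b %<-% { sample(100, n / 2, replace = TRUE) } %seed% TRUE
  # Using a and b blocks until both futures are resolved.
  result <- c(a, b)
})

print(timing[["elapsed"]])
print(length(result))

plan(sequential)  # shut down the background workers
```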

Question 4: If you have more than two availableCores(), split the sample() expression into even more futures. Does this influence the time to complete in the way you expected?
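One way to scale the number of futures with the number of cores is sketched below. It uses future() and value(), the explicit counterparts of `%<-%` in the same package. The cap of four workers and the reduced sample size are assumptions made for this sketch.

```r
library(future)

# Use the cores R may use, capped at 4 (an assumption for this sketch).
k <- min(as.integer(availableCores()), 4L)
plan(multisession, workers = k)

n <- 1e6                        # assumption: reduced sample size
sizes <- rep(n %/% k, k)        # equal chunk sizes
sizes[1] <- sizes[1] + n %% k   # give the remainder to the first chunk

timing <- system.time({
  # Create one future per chunk; seed = TRUE gives parallel-safe random numbers.
  fs <- lapply(sizes, function(size) {
    future(sample(100, size, replace = TRUE), seed = TRUE)
  })
  # Collect the values, waiting for each future in turn.
  result <- unlist(lapply(fs, value))
})

print(timing[["elapsed"]])

plan(sequential)
```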

2.2 Errors

  • Introduce an error in one of your future expressions.

Question 5: Does the error output immediately?

Question 6: What happens when you try to use that future later in your code?

Question 7: Can you perform other operations between defining your future and evaluating your future?

Further reading about errors and debugging for futures.
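A small sketch for experimenting with these questions is given below. It assumes that the error is stored with the future and raised again when the value is requested, which is why the tryCatch() is placed around the use of `f` rather than around its definition. The error message and variable names are made up for the example.

```r
library(future)
plan(sequential)

# A future whose expression fails.
f %<-% { stop("something went wrong inside the future") }

# Unrelated work between defining the future and using it.
other <- sum(1:10)

# Request the value of `f` and capture the error message if one is raised.
msg <- tryCatch(
  {
    f
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```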

2.3 for loops

To use futures in for loops, we can assign each future to a named index of an environment. This is similar to assigning values to named indices with the ordinary assignment operator <-. The main differences are that the container must be a new environment and that a future may contain multiple expressions.

For example:

plan(multisession)

#Create a new environment
v <- new.env()
for (name in c("a", "b", "c")) {
  v[[name]] %<-% {
        #expression(s)
     }
}
#Turn the environment back into a list
v <- as.list(v)

  • Use this to divide the sample() operation into however many smaller pieces you want. Do remember to transform your output back into the object we started with before parallelising the execution.
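As a rough sketch of how the pieces can be put back together, the example below splits the sampling into four futures and joins the results into a single vector again. The number of pieces, the reduced sample size, and `%seed% TRUE` are assumptions of this sketch.

```r
library(future)
plan(multisession, workers = 2)  # assumption: two background R sessions

n <- 1e6                          # assumption: reduced sample size
pieces <- c("a", "b", "c", "d")   # one name per future
size <- n / length(pieces)

v <- new.env()
for (name in pieces) {
  v[[name]] %<-% { sample(100, size, replace = TRUE) } %seed% TRUE
}

# as.list() collects the values of the futures;
# unlist() joins them into one vector, as in the unsplit version.
result <- unlist(as.list(v), use.names = FALSE)
print(length(result))

plan(sequential)
```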

Now you know the basics of using the future package. With this, you have already come a long way towards lowering the threshold for implementing parallel methods and for spotting opportunities to parallelise when you next run into them!

3 Extra credit

Try to apply parallelisation to your own code in a different context than we have used here, for example by dividing up a plot or a large dataset. The possibilities are endless.

5 Session info

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin20.4.0 (64-bit)
## Running under: macOS Big Sur 11.4
## 
## Matrix products: default
## LAPACK: /usr/local/Cellar/r/4.1.0/lib/R/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] future_1.21.0     fontawesome_0.2.1 captioner_2.2.3   bookdown_0.22     knitr_1.33       
## 
## loaded via a namespace (and not attached):
##  [1] codetools_0.2-18  listenv_0.8.0     digest_0.6.27     parallelly_1.26.0 magrittr_2.0.1    evaluate_0.14     rlang_0.4.11      stringi_1.6.2    
##  [9] rmarkdown_2.8     tools_4.1.0       stringr_1.4.0     parallel_4.1.0    xfun_0.23         yaml_2.2.1        compiler_4.1.0    globals_0.14.0   
## [17] htmltools_0.5.1.1

Built on: 16-Jun-2021 at 11:53:52.


2021 · SciLifeLab · NBIS · RaukR