The objective of this lab is to improve your coding skills by focusing on coding style, code benchmarking and optimization. Below, you will find a number of tasks connected to the topics covered in the Best Coding Practices lecture. Some tasks extend lectures content and require you to find some more information online. Please, note that while we are providing example solutions to many tasks, these are only examples. If you solve a task in a different way it does not matter your solution is wrong. In fact, it may be better than our solution. If in doubt, ask PI for help. We are here for you!

1 Coding Style

1.1 Task: Valid Variable Names.

Which of the following are valid/good variable names in R. What is wrong with the ones that are invalid/bad? var1, 3way_handshake, .password, __test__, my-matrix-M, three.dimensional.array, 3D.distance, .2objects, wz3gei92, next, P, Q, R, S, T, X, is.larger?

1.2 Task: Obscure Code.

The code below works, but can be improved. Do improve it!

myIterAtoR.max <- 5
second_iterator.max<-7
col.NUM= 10
row.cnt =10
fwzy45 <- matrix(rep(1, col.NUM*row.cnt),nrow=row.cnt)
for(haystack in (2-1):col.NUM){
  for(needle in 1:row.cnt) {
if(haystack>=myIterAtoR.max){
fwzy45[haystack, needle]<-NA}
}}

iter_max <- 5
col_num <- 10
row_num <- 10
A <- matrix(rep(1, col_num * row_num), nrow = row_num)
for (i in 1:col_num) {
  for (j in 1:row_num) {
    if (i >= iter_max) {
      A[i, j] <- NA
    }
  }
}

# Can you improve the code more by eliminating loops or at least one of them?

1.3 Task: Better Formatting.

Improve formatting and style of the following code:

simulate_genotype <- function( q, N=100 ) {
  if( length(q)==1 ){
    p <- (1 - q)
    f_gt <- c(p^2, 2*p*q, q^2) # AA, AB, BB
  }else{
    f_gt<-q
  }
  tmp <- sample( c('AA','AB','BB'), size =N, prob=f_gt, replace=T )
  return(tmp)
}

simulate_genotype <- function(q, N = 100) {
  if (length(q) == 1) {
    p <- (1 - q)
    f_gt <- c(p^2, 2*p*q, q^2) # AA, AB, BB
  } else {
    f_gt <- q
  }
  genotype <- sample(c('AA', 'AB', 'BB'), 
                     size = N, 
                     prob = f_gt, 
                     replace = T)
  return(genotype)
}

1.4 Task: Hidden Variable.

Assign a vector of three last months (abbreviated in English) in a year to a hidden variable my_months.

.my_months <- rev(rev(month.abb)[1:3])

1.5 Task: Pipeline-friendly Function.

Modify the function below so that it works with R pipes %>%:

my_filter <- function(threshold = 1, data, scalar = 5) {
  data[data >= threshold] <- NA 
  data <- data * scalar
  return(data)
}

Note you need to have the magrittr or tidyverse package loaded in order to be able to use the pipe %>%!

my_filter <- function(data, threshold = 1, scalar = 5) {
  data[data >= threshold] <- NA 
  data <- data * scalar
  return(data)
}

# Test:
c(-5, 5) %>% my_filter()

1.6 Task: Untidy Code?

Is the code below correct? Can it be improved?

simulate_phenotype <- function(pop_params, gp_map, gtype) {
  pop_mean <- pop_params[1]
  pop_var <- pop_params[2]
  pheno <- rnorm(n = N, mean = pop_mean, sd = sqrt(pop_var))
  effect <- rep(0, times = length(N))
  for (gt_iter in c('AA', 'AB', 'BB')) {
    effect[gtype == gt_iter] <- rnorm(n = sum(gtype == gt_iter), 
                                      mean = gp_map[gt_iter, 'mean_eff'], 
                                      sd = sqrt(gp_map[gt_iter, 'var_eff']))
  }
  dat <- data.frame(gt = gtype, raw_pheno = pheno, effect = effect, pheno = pheno + effect)
  return(dat)
}

Maybe some small improvements can be done, but in principle the code is clean! Except that... the N is not initialized anywhere.

2 Structuring the Code

2.1 Task: Computing Variance.

Write a modular code (function or functions) that computes the sample standard deviation given a vector of numbers. Decide how to logically structure the code. Assume there are no built-in R functions for computing mean and variance available. The formula for variance is: \(SD = \sqrt{\frac{\Sigma_{i=1}^{N}(x_i - \bar{x})^2}{N-1}}\). Standard deviation is \(Var=SD^2\).

Hint: consider that you may want to re-use some computed values in future, e.g. variance.

sample_mean <- function(x) {
  mean <- sum(x) / length(x)
  return(mean)
} 

sum_squared_deviations <- function(x) {
  tmp <- (x - sample_mean(x)) ^ 2
  sum_sq_dev <- sum(tmp)
  return(sum_sq_dev)
}

std_dev <- function(x) {
  variance <- sqrt(sum_squared_deviations(x) / (length(x) - 1))
  return(variance)
}

variance <- function(x) {
  return(std_dev(x) ^ 2)
}

2.2 Task: Writing a Wrapper Function.

You found two functions in two different packages: the randomSampleInt function that generates a random sample of integer numbers and the randomSampleLetter function for generating a random sample of letters. Unfortunately, the functions are called in different ways which you want to unify in order to use them interchangeably in your code. Write a wrapper function around the randomSampleLetter that will provide the same interface to the function as the randomSampleInt. Also, the randomSampleLetter cannot handle the seed. Can you add this feature to your wrapper?

randomSampleInt <- function(x, verbose, length, seed = 42) {
  if (verbose) {
    print(paste0('Generating random sample of ', length, ' integers using seed ', seed))
  }
  set.seed(seed)
  sampleInt <- sample(x = x, size = length, replace = TRUE)
  return(sampleInt)
} 

randomSampleLetter <- function(N, silent=T, lett) {
  if (!silent) {
    print(paste0('Generating random sample of ', N, ' letters.'))
  }
  sample <- sample(x = lett, size = N, replace = TRUE)
  return(sample)
}

randomSampleLetterWrapper <- function(x, verbose, length, seed = 42) {
  set.seed(seed)
  result <- randomSampleLetter(N = length, silent = !verbose, lett = x)
  return(result)
}

2.3 Task: Customizing `plot`.

Write a wrapper around the graphics::plot function that modifies its default behaviour so that it plots red crosses instead of black points. Do it in a way that enables the user to modify other function arguments. Hint: you may want to have a look at graphics::plot.default.

my_plot <- function(x, ...) {
  plot(x, pch = 3, color = 'red', ...) 
}

2.4 Task: Adding Arguments to a Function.

What if you want to pass some additional parameters to a function and, sadly, the authors forgot to add ... to the list of function arguments. There is a way out – you can bind extra arguments supplied as alist structure to the original function arguments retrieved by formals. Try to fix the function below, so that the call red_plot(1, 1, col='red', pch=19) will result in points being represented by red circles. Do use alist and formals and do not edit the red_plot itself! Hint: read help for alist and formals. Original function:

red_plot <- function(x, y) { 
  plot(x, y, las=1, cex.axis=.8, ...)
}

red_plot <- function(x, y) { 
  plot(x, y, las=1, cex.axis=.8, ...)
}

red_plot(1, 1, col='red', pch=19) # Does not work.
formals(red_plot) <- c(formals(red_plot), alist(... = )) # Fix.
red_plot(1, 1, col='red', pch=19) # Works!

2.5 Task: Using `options`.

Use options to change the default prompt in R to hello :-) >. Check what options are stored in the hidden variable called Options.

options(prompt = "hello :-) > ")
.Options
options(prompt = "> ") # restoring the default

3 Session Info

This document has been created in RStudio using R Markdown and related packages.
For R Markdown, see http://rmarkdown.rstudio.com
For details about the OS, packages and versions, see detailed information below:

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bsplus_0.1.2      fontawesome_0.2.1 captioner_2.2.3   bookdown_0.22    
## [5] knitr_1.33       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        lubridate_1.7.10  digest_0.6.27     magrittr_2.0.1   
##  [5] evaluate_0.14     rlang_0.4.11      stringi_1.6.2     generics_0.1.0   
##  [9] rmarkdown_2.8     tools_4.0.3       stringr_1.4.0     xfun_0.23        
## [13] yaml_2.2.1        compiler_4.0.3    htmltools_0.5.1.1

Page built on: 11-Jun-2021 at 14:54:03.

2018 | SciLifeLab > NBIS > RaukR

Best Coding Practices — tidy, neat and efficient code

RaukR 2021 • Advanced R for Bioinformatics

Marcin Kierczak

1 Coding Style

1.1 Task: Valid Variable Names.

1.2 Task: Obscure Code.

1.3 Task: Better Formatting.

1.4 Task: Hidden Variable.

1.5 Task: Pipeline-friendly Function.

1.6 Task: Untidy Code?

2 Structuring the Code

2.1 Task: Computing Variance.

2.2 Task: Writing a Wrapper Function.

2.3 Task: Customizing `plot`.

2.4 Task: Adding Arguments to a Function.

2.5 Task: Using `options`.

3 Session Info

Best Coding Practices — tidy, neat and efficient code

RaukR 2021 • Advanced R for Bioinformatics

Marcin Kierczak

1 Coding Style

1.1 Task: Valid Variable Names.

1.2 Task: Obscure Code.

1.3 Task: Better Formatting.

1.4 Task: Hidden Variable.

1.5 Task: Pipeline-friendly Function.

1.6 Task: Untidy Code?

2 Structuring the Code

2.1 Task: Computing Variance.

2.2 Task: Writing a Wrapper Function.

2.3 Task: Customizing plot.

2.4 Task: Adding Arguments to a Function.

2.5 Task: Using options.

3 Session Info

2.3 Task: Customizing `plot`.

2.5 Task: Using `options`.