Best Coding Practices

RaukR 2023 • Advanced R for Bioinformatics

Marcin Kierczak

27-Jun-2023

Learning Outcomes


After this module:

  • You will be aware of different coding styles.
  • You will know what styles are good 🦸 and bad 🦹 and why ❓.
  • You will know how to decompose a problem before you even start coding.
  • You will understand when it is time to write a function.
  • Your code will reach a new level of awesomeness! 🆒

Topics of This Block


  • Style: _howTo_style.yourCode?

  • Structure: how to think 🤔 about the code and manufacture your own building 🚧 blocks

  • Debugging: my code does not run 😞

  • Profiling: now it does run but… out of memory 💣

  • Optimization: making things better 👷‍♂️

  • Vectorization: more details on optimization via vectorization ↕️

  • Parallelization: run things in parallel, rule them all! 💍

What is Coding Style?

  • Naming conventions: assigning names to variables

  • Code formatting: placement of braces, use of white space characters etc.

From: Behind The Lines 2010-09-23. By Oliver Widder, Webcomics Geek And Poke.

Naming Conventions

A syntactically valid name:

  • Consists of:

    • letters: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
    • digits: 0123456789
    • period: .
    • underscore: _
  • Begins with a letter or the period (.) not followed by a number

  • Cannot be one of the reserved words: if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_

  • Should also not be: c, q, t, C, D, I. These are not reserved words, but they are names of existing base R functions, and masking them invites confusion.
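
A quick way to check a candidate name against the syntactic rules above is to compare it with what make.names() would turn it into. A minimal sketch; is_valid_name is a hypothetical helper, not a base R function, and the candidate names are made up:

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# leaves it unchanged.
is_valid_name <- function(x) {
  x == make.names(x)
}

# Made-up candidates: three valid names, then four invalid ones
candidates <- c("genotypes", ".hidden", "my_var2", "2fast", ".2x", "_tmp", "if")
valid <- is_valid_name(candidates)
print(setNames(valid, candidates))
```

The last four fail for different reasons: a leading digit, a period followed by a digit, a leading underscore, and a reserved word.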

Naming Style

Legal variable names are not necessarily good style, and they may even be dangerous 💀:

F
T
[1] FALSE
[1] TRUE
F + T  
[1] 1
F <- 3  
F + T  
[1] 4

do not do this!

unless you are a politician 🕴…

Avoid T and F as variable names.
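
Why are TRUE and FALSE safe while T and F are not? The former are reserved words, the latter are ordinary variables that anyone can overwrite. A small sketch, run inside local() so that it does not touch your workspace:

```r
result <- local({
  T <- 0                  # legal: T is just a variable
  masked <- isTRUE(T)     # T no longer means TRUE
  # Assigning to the reserved word TRUE is an error
  msg <- tryCatch(
    eval(parse(text = "TRUE <- 0")),
    error = function(e) "cannot assign to TRUE"
  )
  list(masked = masked, msg = msg)
})
print(result)
```

Spelling out TRUE and FALSE in full costs a few keystrokes and removes this whole class of bugs.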

Customary Variable Names

Also, there are a number of variable names that are traditionally used for particular purposes:

  • usr: user
  • pwd: password
  • x, y, z: vectors
  • w: weights
  • f, g: functions
  • n: number of rows
  • p: number of columns
  • i, j, k: indexes
  • df: data frame
  • cnt: counter
  • M, N, W: matrices
  • tmp: temporary variables

Sometimes these are domain-specific:

  • p, q: allele frequencies in genetics,
  • N, k: number of trials and number of successes in stats

Try to avoid using these names for anything else, to prevent confusion.

Different Notations

People use different notation styles throughout their code:

  • snake_notation_looks_like_this
  • camelNotationLooksLikeThis
  • period.notation.looks.like.this

But many also use…

  • LousyNotation_looks.likeThis

Try to be consistent and stick to one of them. Bear in mind that S3 uses the period to tie a method to its generic function and class (generic.class), so a name like plot.my.object can be mistaken for a plot method. A good-enough reason to avoid it?
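
To see how period.notation can collide with S3, here is a small sketch. The generic describe and the helper describe.data are hypothetical names invented for this illustration:

```r
# Hypothetical generic with a default method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default description"

# Meant as a stand-alone helper that happens to use period.notation ...
describe.data <- function(x, ...) "helper called by dispatch"

# ... but S3 reads it as the 'describe' method for class 'data'
obj <- structure(list(), class = "data")
out <- describe(obj)
print(out)
```

The helper is picked up by dispatch even though nobody declared it as a method, which is the kind of surprise that snake_notation avoids.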

It is also important to maintain code readability by having your variable names:

  • Informative, e.g. genotypes vs. fsjht45jkhsdf4
  • Consistent across your code: the same naming convention
  • Not too long, e.g. weight vs. phenotype.weight.measured

Special Variable Names

  • There are built-in variable names:

    • LETTERS: the 26 upper-case letters of the Roman alphabet
    • letters: the 26 lower-case letters of the Roman alphabet
    • month.abb: the three-letter abbreviations for the English month names
    • month.name: the English names for the months of the year
    • pi: the ratio of the circumference of a circle to its diameter
  • Variable names beginning with a period are hidden: .my_secret_variable 👻 will not be listed by ls() but can still be accessed

.the_hidden_answer <- 42
ls()
[1] "F"               "has_annotations" "T"              

but with a bit of effort you can see them:

ls(all.names = TRUE)
[1] ".First"             ".Last"              ".main"             
[4] ".Random.seed"       ".the_hidden_answer" "F"                 
[7] "has_annotations"    "T"                 

Structure Your Code

Decompose the problem 🧩 🧩!


source: Wikimedia Commons

  • divide et impera / top-down approach: recursively split your BIG problem into a number of small sub-problems and, at some level, encapsulate your code in functional blocks (functions)
  • a function should perform one small task; it should be a logical program unit

when should I write a function ❓

  • one screen 💻 rule (resolution…),
  • re-use twice rule of 👍.
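
As an illustration of the re-use rule: the moment the same expression appears a second time, it is a candidate for a function. A sketch with a hypothetical rescale01 helper and made-up measurements:

```r
# Made-up example data
weight <- c(60, 72, 85, 90)
height <- c(150, 165, 180, 195)

# Copy-paste version: the same logic written twice, easy to get wrong
# weight_scaled <- (weight - min(weight)) / (max(weight) - min(weight))
# height_scaled <- (height - min(height)) / (max(height) - min(height))

# Function version: one small, logical unit that fits on one screen
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

weight_scaled <- rescale01(weight)
height_scaled <- rescale01(height)
print(height_scaled)
```

A fix or change to the scaling logic now happens in exactly one place.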

Consider creating an S4 or even an R6 class for data-type safety!
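
A minimal sketch of what data-type safety can look like, using S4 from the methods package that ships with R. The Sample class, its slots and the try_new helper are made up for illustration:

```r
library(methods)

# Hypothetical class: slot types are declared up front
setClass(
  "Sample",
  representation(id = "character", weight = "numeric"),
  validity = function(object) {
    w <- object@weight
    if (length(w) != 1 || is.na(w) || w <= 0) {
      "weight must be a single positive number"
    } else {
      TRUE
    }
  }
)

ok <- new("Sample", id = "s1", weight = 70.5)

# Hypothetical helper: report whether construction succeeds
try_new <- function(...) {
  tryCatch({
    new("Sample", ...)
    "accepted"
  }, error = function(e) "rejected")
}

wrong_type  <- try_new(id = "s2", weight = "heavy")  # character, not numeric
wrong_value <- try_new(id = "s3", weight = -1)       # fails the validity check
print(c(wrong_type, wrong_value))
```

Both mistakes are caught when the object is created, rather than much later in an analysis.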

How to write functions

  • avoid accessing and modifying globals
    • avoid ⛔ a <<- 42
    • and 🆗 use a closure instead
new_counter <- function() {
  i <- 0
  function() {
    # do something useful, then ...
    i <<- i + 1
    i
  }
}

counter1 <- new_counter(); counter2 <- new_counter()
counter1(); counter1(); counter2()
[1] 1
[1] 2
[1] 1

based on a Stack Overflow answer

How to write functions

  • make data the very first argument, for the sake of %>% pipes:
    • myfun <- function(x, arg) 🆗
    • myfun <- function(arg, x) ⛔
  • set arguments to defaults; better too many args than too few:
    • myfun <- function(x, seed = 42) 🆗
    • myfun <- function(x, ...) 🙅‍♂️
  • remember that global defaults can be changed by options
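
The three points above can be sketched in one small function. Assumptions: trimmed_mean and the option name myproject.trim are made up for this example, and the native |> pipe (R >= 4.1) stands in for %>% so that no extra package is needed:

```r
# Hypothetical function: data first, every other argument has a default.
# "myproject.trim" is a made-up option name used as an overridable default.
trimmed_mean <- function(x, trim = getOption("myproject.trim", 0.25),
                         na.rm = TRUE) {
  mean(x, trim = trim, na.rm = na.rm)
}

x <- c(1, 2, 3, 100)

# The data-first signature works with pipes
a <- x |> trimmed_mean()

# A global option silently changes the default ...
old <- options(myproject.trim = 0)
b <- x |> trimmed_mean()
options(old)   # ... so restore the previous state when done

print(c(a, b))
```

The same call gives two different answers depending on a global option, which is worth remembering when a result cannot be reproduced on another machine.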

Wrapper function

If you are re-using functions written by someone else, write a wrapper function around them:

my_awesome_plot <- function(x, ...) {
  plot(x, col='red', pch=19, cex.axis=.7, ...)
}
my_awesome_plot(1:5, col = 'blue')
Error in localWindow(xlim, ylim, log, asp, ...): formal argument "col" matched by multiple actual arguments
my_awesome_plot(1:5, las = 1)
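
One possible way around the error above (a sketch, not the only option) is to turn the hard-coded values into named arguments with defaults, so that a caller can override them without colliding with ...:

```r
my_awesome_plot <- function(x, col = "red", pch = 19, cex.axis = 0.7, ...) {
  # Defaults can be overridden by the caller; anything else is
  # passed on to plot() via ...
  plot(x, col = col, pch = pch, cex.axis = cex.axis, ...)
}
```

With this version, both my_awesome_plot(1:5, col = 'blue') and my_awesome_plot(1:5, las = 1) should draw without the "matched by multiple actual arguments" error.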

How to write functions

  • showing progress and messages is good, but let users turn this functionality off
  • if you are calling other functions, consider passing extra arguments on to them via ...
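
A sketch of a function whose progress messages can be switched off. The function gc_content and its verbose argument are hypothetical; the point is that message(), unlike print() or cat(), can also be silenced from the outside with suppressMessages():

```r
# Hypothetical function: reports progress via message(), with an off switch
gc_content <- function(seq, verbose = TRUE) {
  if (verbose) message("Processing ", nchar(seq), " bases")
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))   # fraction of G and C bases
}

gc1 <- gc_content("GATTACA")                     # prints a message
gc2 <- gc_content("GATTACA", verbose = FALSE)    # silent by request
gc3 <- suppressMessages(gc_content("GATTACA"))   # silenced by the caller
```

All three calls return the same value; only the amount of chatter differs.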




source: http://www.xkcd.com/292

Thank you! Questions?

         _                  
platform x86_64-pc-linux-gnu
os       linux-gnu          
major    4                  
minor    2.3                

2023 • SciLifeLab • NBIS • RaukR