Best Coding Practices

RaukR 2024 • Advanced R for Bioinformatics

Marcin Kierczak, Sebastian DiLorenzo

21-Jun-2024

Learning Outcomes


After this module:

  • You will be aware of different coding styles.
  • You will know what styles are good 🦸 and bad 🦹 and why.
  • You will be reminded about dividing a problem before you start conquering (coding).
  • You will understand when it is time to write a function.
  • You will have a basic understanding of using GitHub Copilot with RStudio 🤖.
  • Your code will reach a new level of awesomeness! 🆒

Today's topics


  • Style: _howTo_style.yourCode?
  • Structure: how to think 🤔 about the code and manufacture your own building 🚧 blocks
  • Documentation: how to use Quarto for reproducibility, convenience and code integration 📖
  • Debugging: my code does not run 😞
  • Profiling: now it does run but… out of memory 💣
  • Optimization: making things better 👷

What is Coding Style?

  • Naming conventions: assigning names to variables
  • Code formatting: placement of braces, use of white space characters etc.

From: Behind The Lines 2010-09-23. By Oliver Widder, Webcomics Geek And Poke.

Naming Conventions

A syntactically valid name:

  • Consists of:

    • letters: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
    • digits: 0123456789
    • period: .
    • underscore: _
  • Begins with a letter or a period (.); if it begins with a period, the second character must not be a digit

  • Cannot be one of the reserved words: if, else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_

  • Should also not be: c, q, t, C, D, I. These are legal names, but they are already taken by built-in functions, and masking them is confusing.
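
The rules above can be checked in practice. A minimal sketch, assuming that a name counts as syntactically valid when base R's make.names() leaves it unchanged; the helper name is_syntactic and the candidate names are made up for illustration:

```r
# Hypothetical helper: a name is treated as syntactically valid when
# make.names() returns it unchanged.
is_syntactic <- function(x) x == make.names(x)

# Made-up candidates: two valid names, then a period followed by a digit,
# a leading underscore, a reserved word and a leading digit.
candidates <- c("my_var", ".hidden", ".2way", "_tmp", "if", "2fast")
valid <- is_syntactic(candidates)
names(valid) <- candidates
print(valid)
```

Only the first two candidates should come back as TRUE.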

Naming Style

Legal variable names are not necessarily good style, and some may be dangerous 💀:

F
T
[1] FALSE
[1] TRUE
F + T  
[1] 1
F <- 3  
F + T  
[1] 4

do not do this!

unless you are a politician 🕴…

Avoid T and F as variable names.
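
A minimal sketch of why the full words are safer, assuming a fresh R session: T and F are ordinary variables that can be masked, whereas TRUE and FALSE are reserved words. The variable names risky, safe and reassign_error are illustrative only.

```r
# Inside local(), F is an ordinary variable that masks base R's F;
# nothing leaks into the global environment.
risky <- local({
  F <- 3
  T + F        # T is still TRUE (1), F is now 3
})

# TRUE and FALSE are reserved words, so their sum cannot be tampered with.
safe <- TRUE + FALSE

# Trying to assign to TRUE fails; the error is caught and replaced by a note.
reassign_error <- tryCatch(
  eval(parse(text = "TRUE <- 3")),
  error = function(e) "cannot assign to TRUE"
)

print(c(risky = risky, safe = safe))
print(reassign_error)
```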

Customary Variable Names

Also, there are a number of variable names that are traditionally used to name particular variables:

  • usr: user
  • pwd: password
  • x, y, z: vectors
  • w: weights
  • f, g: functions
  • n: number of rows
  • p: number of columns
  • i, j, k: indexes
  • df: data frame
  • cnt: counter
  • M, N, W: matrices
  • tmp: temporary variables

Sometimes these are domain-specific:

  • p, q: allele frequencies in genetics,
  • N, k: number of trials and number of successes in stats

Try not to use these names for anything other than their customary meaning, to avoid possible confusion.

Different Notations

People use different notation styles throughout their code:

  • snake_notation_looks_like_this
  • camelNotationLooksLikeThis
  • period.notation.looks.like.this

But many also use…

  • LousyNotation_looks.likeThis

Try to be consistent and stick to one of them. Bear in mind that S3 uses the period to separate the generic from the class in method names (generic.class), e.g. plot.my.object. A good-enough reason to avoid it?
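
A small sketch of how a period in a name can interact with S3 dispatch. The generic describe and its methods are hypothetical, defined only for this illustration:

```r
# Hypothetical generic, defined only for this illustration.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

# Meant as a plain helper written in period.notation, but the name also
# reads as "the describe method for class data.frame".
describe.data.frame <- function(x, ...) "helper picked up as a data.frame method"

res_df  <- describe(data.frame(a = 1))   # dispatches to describe.data.frame
res_int <- describe(1:3)                 # falls back to describe.default
print(c(res_df, res_int))
```

With snake_case (describe_data_frame) the helper could not be mistaken for a method.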

It is also important to maintain code readability by having your variable names:

  • Informative, e.g. genotypes vs. fsjht45jkhsdf4
  • Consistent across your code: the same naming convention
  • Not too long, e.g. weight vs. phenotype.weight.measured

Special Variable Names

  • There are built-in variable names:

    • LETTERS: the 26 upper-case letters of the Roman alphabet
    • letters: the 26 lower-case letters of the Roman alphabet
    • month.abb: the three-letter abbreviations for the English month names
    • month.name: the English names for the months of the year
    • pi: the ratio of the circumference of a circle to its diameter
  • Variable names beginning with a period are hidden: .my_secret_variable 👻 will not be listed by ls() but can still be accessed

.the_hidden_answer <- 42
ls()
[1] "F" "T"

but with a bit of effort you can see them:

ls(all.names = TRUE)
[1] ".main"               ".QuartoInlineRender" ".Random.seed"       
[4] ".the_hidden_answer"  "F"                   "T"                  

Structure Your Code

Decompose the problem 🧩 🧩!


source: Wikimedia Commons

  • divide et impera / top-down approach: split your BIG problem into a number of small sub-problems recursively and, at some level, encapsulate your code in functional blocks (functions)
  • a function should perform one small task; it should be a logical program unit

When should I write a function ❓

  • one-screen 💻 rule: a function should fit on one screen (resolution…),
  • re-use twice rule of 👍: if you need the same code twice, turn it into a function.
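
As an illustration of the re-use rule, here is a sketch in which a repeated expression is pulled out into one small function. The data frame and the function name rescale01 are made up for this example:

```r
# Hypothetical example data: three measurements on different scales.
df <- data.frame(weight = c(60, 70, 80),
                 height = c(150, 175, 200),
                 age    = c(20, 40, 30))

# Copy-paste style would repeat the same expression for every column, e.g.
# df$weight <- (df$weight - min(df$weight)) / (max(df$weight) - min(df$weight))

# Needed more than once, so it becomes a small function with a single job.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply the same building block to every column, keeping the data frame shape.
df[] <- lapply(df, rescale01)
print(df)
```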

How to write functions

  • avoid accessing and modifying globals
    • avoid 🚫 a <<- 42
    • and 🆗 use a closure instead
new_counter <- function() {
  i <- 0
  function() {
    # do something useful, then ...
    i <<- i + 1
    i
  }
}

counter1 <- new_counter(); counter2 <- new_counter()
counter1(); counter1(); counter2()
[1] 1
[1] 2
[1] 1

Source: Stack Overflow

How to write functions

  • use data as the very first argument, for the sake of %>% pipes:
    • myfun <- function(x, arg) 🆗
    • myfun <- function(arg, x) 🙅
  • give arguments default values; better too many args than too few:
    • myfun <- function(x, seed = 42) 🆗
    • myfun <- function(x, ...) 🚯
  • remember that global defaults can be changed by options()
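
The three points above can be combined in one sketch. The function fmt_mean and the option name "myfun.digits" are hypothetical; the native pipe |> (R >= 4.1) stands in for %>% so that no package is needed:

```r
# Hypothetical function: data comes first, everything else has a default.
# The default for digits is read from a made-up option, "myfun.digits".
fmt_mean <- function(x, digits = getOption("myfun.digits", 2)) {
  round(mean(x), digits)
}

# Data-first arguments make the function pipe-friendly.
default_digits <- c(1.234, 2.345) |> fmt_mean()

# A user can change the default globally without touching the function.
old <- options(myfun.digits = 0)
option_digits <- c(1.234, 2.345) |> fmt_mean()
options(old)  # restore the previous state

# An explicit argument still overrides both.
explicit_digits <- c(1.234, 2.345) |> fmt_mean(digits = 1)

print(c(default_digits, option_digits, explicit_digits))
```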

Wrapper function

If you are re-using functions written by someone else, write a wrapper function around them.

my_awesome_plot <- function(x, ...) {
  plot(x, col='red', pch=19, cex.axis=.7, ...)
}
my_awesome_plot(1:5, col = 'blue')
Error in localWindow(xlim, ylim, log, asp, ...): formal argument "col" matched by multiple actual arguments
my_awesome_plot(1:5, las = 1)
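
One possible way around the "matched by multiple actual arguments" error, shown only as a sketch and not as the original authors' solution, is to promote the hard-coded settings to named arguments with defaults, so that the caller can override them:

```r
# Sketch: the former hard-coded settings become named arguments with
# defaults; anything else is still forwarded to plot() through ...
my_awesome_plot <- function(x, ..., col = "red", pch = 19, cex.axis = 0.7) {
  plot(x, col = col, pch = pch, cex.axis = cex.axis, ...)
}
```

With this version, both my_awesome_plot(1:5, col = 'blue') and my_awesome_plot(1:5, las = 1) should run, because col is now supplied to plot() only once.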

How to write functions

  • showing progress and messages is good, but let others turn this functionality off
  • if you are calling other functions, consider using ...
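
Both points can be sketched together. The function column_means and its verbose argument are hypothetical; the assumption is that progress is reported with message(), which callers can also silence from the outside:

```r
# Hypothetical function: reports progress via message(), can be switched
# off with verbose = FALSE, and forwards extra arguments to mean() via ...
column_means <- function(x, verbose = TRUE, ...) {
  out <- numeric(0)
  for (nm in names(x)) {
    if (verbose) message("Processing column: ", nm)
    out[nm] <- mean(x[[nm]], ...)
  }
  out
}

# Made-up data with one missing value.
df <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))

# na.rm = TRUE travels through ... to mean(); progress is turned off.
quiet <- column_means(df, verbose = FALSE, na.rm = TRUE)

# Because message() is used rather than print() or cat(), the output can
# also be suppressed by the caller.
silenced <- suppressMessages(column_means(df, na.rm = TRUE))
print(quiet)
```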




source: http://www.xkcd.com/292

GitHub Copilot ❤️ RStudio



source: https://github.com/edu/students

Thank you! Questions?

         _                     
platform aarch64-apple-darwin20
os       darwin20              
major    4                     
minor    4.0                   

2024 • SciLifeLab • NBIS • RaukR