class: center, middle, inverse, title-slide

.title[
# Best Coding Practises
]
.subtitle[
## RaukR 2022 • Advanced R for Bioinformatics
]
.author[
### Marcin Kierczak
]
.institute[
### NBIS, SciLifeLab
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ----------------- Only edit title & author above this ----------------- -->

---
name: empty

##

---
name: learning-outcomes

## Learning Outcomes

<br><br>

After this module:

* you will be aware of different coding styles

--

* you will know what styles are good 🦸 and bad 🦹 and why

--

* you will know how to decompose a problem before you even start coding

--

* you will understand when it is time to write a function

--

* your code will reach a new level of awesomeness!

---
name: overview

## Topics of This Presentation

<br><br>

Today with Marcin, Mun-Gwan and Ash: <br>

* **style** — __howTo_style.yourCode?
* **structure** — how to think 🤔 about the code and manufacture your own building blocks

--

Tomorrow with Ash and Marcin:

* **debugging** — my code does not run 😿
* **profiling** — now it does run but... out of memory 💣
* **optimization** — making things better 👷‍♂️

--

On Wednesday with Sebastian and Marcin:

* **vectorization** — more details on optimization via vectorization
* **parallelization** — run things in parallel, rule them all!

---
name: coding-style

## What is Coding Style?

* naming conventions — assigning names to variables
* code formatting — placement of braces, use of white space characters etc.

.center[
<img src="./assets/coding_style.jpg" class="fancyimage", style="width:49%; height:49%; box-shadow:0px 0px 0px white"><br>
.vsmall[From: [Behind The Lines](http://geekandpoke.typepad.com/geekandpoke/2010/09/behind-the-lines.html) 2010-09-23. By Oliver Widder, Webcomics Geek And Poke.]
]

---
name: naming-conventions

## Naming Conventions

A syntactically valid name:

* consists of:
  + letters: `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`
  + digits: `0123456789`
  + period: `.`
  + underscore: `_`
* begins with a letter or the period (`.`) **not** followed by a number
* cannot be one of the *reserved words*: `if`, `else`, `repeat`, `while`, `function`, `for`, `in`, `next`, `break`, `TRUE`, `FALSE`, `NULL`, `Inf`, `NaN`, `NA`, `NA_integer_`, `NA_real_`, `NA_complex_`, `NA_character_`
* should also not be: `c`, `q`, `t`, `C`, `D`, `I`, as these are names of functions that ship with R; reusing them is legal but confusing.

---
name: naming-styles

## Naming Style

Variable names that are legal are not necessarily good style, and they may be dangerous:

```r
F
T
```

```
## [1] FALSE
## [1] TRUE
```

```r
F + T
```

```
## [1] 1
```

```r
F <- 3
F + T
```

```
## [1] 4
```

do not do this!

--

unless you are a politician...

<br><br><br>
.center[.large[Avoid `T` and `F` as variable names.]]

---

## Customary Variable Names

Also, there are a number of variable names that are traditionally used to name particular variables:

* `usr` — user
* `pwd` — password
* `x`, `y`, `z` — vectors
* `w` — weights
* `f`, `g` — functions
* `n` — number of rows
* `p` — number of columns
* `i`, `j`, `k` — indexes
* `df` — data frame
* `cnt` — counter
* `M`, `N`, `W` — matrices
* `tmp` — temporary variables

Sometimes these are domain-specific:

* `p`, `q` — allele frequencies in genetics,
* `N`, `k` — number of trials and number of successes in stats

<br><br>
.center[.large[Try to avoid using these in this way to avoid possible confusion.]]

---

## Different Notations

People use different notation styles throughout their code:

--

* `snake_notation_looks_like_this`

--

* `camelNotationLooksLikeThis`

--

* `period.notation.looks.like.this`

--

but many also use...

--

* `LousyNotation_looks.likeThis`

--

Try to be consistent and stick to one of them. Bear in mind that S3 uses the period to separate the generic from the class in method names (`generic.class`), e.g.
`plot.my.object`. A good-enough reason to avoid it?

.center[***]

--

It is also important to maintain code readability by having your variable names:

* informative, e.g. `genotypes` vs. `fsjht45jkhsdf4`

--

* consistent across your code — the same naming convention

--

* not too long, e.g. `weight` vs. `phenotype.weight.measured`

--

* in the period notation and the snake notation avoid `my.var.2` or `my_var_2`, use `my.var2` and `my_var2` instead

---

## Special Variable Names

A few more things to consider:

* There are built-in variable names:
  + `LETTERS`: the 26 upper-case letters of the Roman alphabet
  + `letters`: the 26 lower-case letters of the Roman alphabet
  + `month.abb`: the three-letter abbreviations for the English month names
  + `month.name`: the English names for the months of the year
  + `pi`: the ratio of the circumference of a circle to its diameter
* Variable names beginning with a period are **hidden**: `.my_secret_variable` will not be shown but can be accessed

```r
.the_hidden_answer <- 42
ls()
```

```
## [1] "F" "fa" "T"
```

but with a bit of effort you can see them:

```r
ls(all.names = TRUE)
```

```
## [1] ".Random.seed" ".the_hidden_answer" "F"
## [4] "fa" "T"
```

---
name: structuring_your_code

## Structure Your Code

Decompose the problem 🧩 🧩!

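
A minimal sketch of what such a decomposition might look like, assuming genotypes coded as 0, 1 or 2 copies of the alternative allele. The function names and the toy data are hypothetical and only illustrate splitting one task into small, single-purpose functions:

```r
# Hypothetical example: names and data are made up for illustration.

# Sub-problem 1: drop missing genotype calls.
drop_missing <- function(genotypes) {
  genotypes[!is.na(genotypes)]
}

# Sub-problem 2: count copies of the alternative allele
# (each genotype is coded as 0, 1 or 2 copies).
count_alt_alleles <- function(genotypes) {
  sum(genotypes)
}

# The original problem, solved by combining the two building blocks
# (every called individual carries two alleles).
alt_allele_freq <- function(genotypes) {
  called <- drop_missing(genotypes)
  count_alt_alleles(called) / (2 * length(called))
}

genotypes <- c(0, 1, 2, NA, 1, 0)
# 4 alternative alleles among 10 called alleles
alt_allele_freq(genotypes)
```

Each function fits on one screen and can be tested on its own.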
.center[
<img src="./assets/Philip-ii-of-macedon.jpg" class="fancyimage", style="height:200px; box-shadow:0px 0px 0px white">
<img src="./assets/Julius_Ceasar.jpg" class="fancyimage", style="height:200px; box-shadow:0px 0px 0px white">
<img src="./assets/Napoleon_Bonaparte.jpg" class="fancyimage", style="height:200px; box-shadow:0px 0px 0px white"><br>
.vsmall[source: Wikimedia Commons]
]

--

* *divide et impera* / top-down approach — split your BIG problem into a number of small sub-problems recursively and, **at some level**, encapsulate your code in functional blocks (functions)
* a function should perform one small task; it should be a logical program unit

**when should I write a function?**

* one screen rule (resolution...),
* re-use twice rule of thumb.

consider creating an S4 or even an R6 class — data-type safety!

---
name: how_to_write_functions

## How to write functions

* avoid accessing and modifying globals
  + avoid `a <<- 42`
  + and use a closure instead

--

```r
new_counter <- function() {
  i <- 0
  function() {
    # do something useful, then ...
    i <<- i + 1
    i
  }
}

counter1 <- new_counter()
counter2 <- new_counter()
counter1()
counter1()
counter2()
```

```
## [1] 1
## [1] 2
## [1] 1
```

.small[based on Stackoverflow [answer](https://stackoverflow.com/questions/2628621/how-do-you-use-scoping-assignment-in-r)]

---
name: how_to_write_functions2

## How to write functions

* use **data** as the **very first** argument for the sake of `%>%` pipes:
  + `myfun <- function(x, arg)` (data first)
  + `myfun <- function(arg, x)` (data last, awkward to pipe)

--

* set arguments to defaults — better too many args than too few:
  + `myfun <- function(x, seed = 42)`
  + `myfun <- function(x, ...)` 🙅‍♂️

--

* remember that global defaults can be changed by `options`

--

* if you are re-using functions written by someone else — write a wrapper function around them

```r
my_awesome_plot <- function(...)
{
  plot(..., col='red', pch=19, cex.axis=.7, las=1)
}
my_awesome_plot(1:5)
```

<img src="pres_best_coding_practises_files/figure-html/wrapper-fn-1.png" width="288" style="display: block; margin: auto auto auto 0;" />

---
name: how_to_write_functions3

## How to write functions

* showing progress and messages is good, but let others turn this functionality off

--

* if you are calling other functions, consider using `...`

<br><br>

--

.center[
<img src="./assets/goto.png" class="fancyimage", style="height:230px; box-shadow:0px 0px 0px white"><br>
.vsmall[source: http://www.xkcd.com/292]
]

---
name: end-slide
class: end-slide

<h2 style="color:#fff"> Thank you</h2>