class: center, middle, inverse, title-slide

.title[
# Introduction to R
]
.subtitle[
## R Foundations for Data Analysis
]
.author[
### Marcin Kierczak and Nima Rafati
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ------------ Only edit title, subtitle & author above this ------------ -->

---
name: content
class: spaced

# Contents

* [Why R](#whyR)
* [About R](#about)
* [Pros and cons of R](#pros_and_cons)
* [Ecosystem of packages](#num_packages)
* [Programming language](#programming_language)
* [Packages](#packages)
* [Package installation](#pkg_cran_inst)

---
name: whyR

# WHY R?

.pull-left-50[

]

# With the power of R, your project can ARISE:

--

**A**utomated: Streamline your analysis

--

**R**eproducible: Reproduce the results

--

**I**nterpretable: Make sense of the results

--

**S**hareable: Share with colleagues/mentors (FAIR)

--

**E**xplainable: Easy to explain

---
name: about

# Briefly about R

.pull-left-50[
# R is ...

.pull-right-30[

]

* a programming language
* a software project driven by the core team and the community
* a very powerful tool for statistical computing
* a very powerful computational tool in general
]

--

.pull-right-50[
# R is not ...

* a tool to replace a statistician
* the very best programming language
* the most elegant programming solution
* the most efficient programming language
]

---
name: programming_language

# Programming Language

--

> Programming is the process of instructing a computer to perform a specific task. We write these instructions in a **programming language**. A program can be as simple as a calculation (as on a calculator) or as complex as a full application.
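At its simplest, R can be used exactly like a calculator: each line typed at the console is one instruction, which R evaluates before printing the result. A minimal illustration using only base R:

```r
# Each line is one instruction; R evaluates it and prints the result
2 + 3 * 4      # multiplication is evaluated before addition: 14
(2 + 3) * 4    # parentheses change the order of evaluation: 20
sqrt(16)       # a built-in function: 4
```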

--

* flow of _data_

--

* Data is collected information that qualitatively and/or quantitatively describes an entity.

--

* Data is collected from very diverse sources and comes in different types.

--

* Data processing.

--

* Data cleaning.

--

<img src="data/slide_programming/Data_Information_Knowledge.png" width="75%" style="display: block; margin: auto;" />

---

# Programming Language cted.

--

* from one _function_ to another

--

* A function is a **reusable** chunk of code that performs a task. It takes **inputs** (the data) as well as **arguments** (options that control the processing).

--

* each function does something to the data and returns output(s)

--

* For example `mean()`, `min()`

---

# Three things to think about

1- what *types* of data can I process?

--

2- how do I *write* what I want?

--

3- when does it *mean* anything?

---

# 1- Data types

<img src="data/slide_programming/Data_classification.png" width="75%" style="display: block; margin: auto;" />

--

* int: 1 2 5 9
* double: 1.23 -5.74
* char: a b test 7 9
* logical: TRUE/FALSE (T/F)

---

# 2- How to write?

* By the language *grammar*, i.e. its *syntax*.

--

`2 * 1 + 1` (infix notation, as in R) vs. `(+ (* 2 1) 1)` (prefix notation, as in Lisp): the same computation, written with different syntax.

---

# 3- What does it *mean*?

* By the language *semantics*

--

* *Colorful yellow train sleeps on a crazy wave.* (grammatically correct, but no generally accepted meaning)
* *There is $500 on his empty bank account.* (internal contradiction)

Code, too, can be syntactically valid and still mean the wrong thing:

``` r
height <- 180           # similar to height = 180
weight <- 70            # similar to weight = 70
bmi <- weight / height  # runs without error, but is not a BMI: a semantic error
# BMI is weight (kg) divided by squared height (m): weight / (height / 100)^2
```

---
name: topic2

# Where to start?

*Divide et impera*: divide and rule.

**Top-down approach:** define the big problem and split it into smaller ones. Assume you have solutions to the small problems and continue, pushing the responsibility down. Wishful thinking!
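In R code, the top-down approach can be sketched by writing the top-level function first, as if its helpers already existed, and only then implementing each helper. This is a minimal sketch; the function names `analyse_heights()`, `clean_data()` and `summarise_data()` are hypothetical, chosen for illustration only.

```r
# Top level first: describe the whole task in terms of smaller steps.
# (Wishful thinking: the hypothetical helpers below are written afterwards.)
analyse_heights <- function(x) {
  x <- clean_data(x)
  summarise_data(x)
}

# Smaller problem 1: remove missing values
clean_data <- function(x) {
  x[!is.na(x)]
}

# Smaller problem 2: compute a few summary statistics
summarise_data <- function(x) {
  c(mean = mean(x), min = min(x), max = max(x))
}

analyse_heights(c(180, 165, NA, 172))
```

This works because R looks up `clean_data()` and `summarise_data()` only when `analyse_heights()` is called, not when it is defined.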

``` r
# Set up a blank plot (the coloured title is added at the end)
plot(0, 0, type = "n", xlim = c(-4, 4), ylim = c(-5, 1),
     axes = FALSE, xlab = "", ylab = "")

# Function to draw a circular node
draw_node <- function(x, y, r = 0.3, col = "darkorange") {
  symbols(x, y, circles = r, inches = FALSE, add = TRUE,
          fg = col, bg = "white", lwd = 2)
}

# Coordinates of nodes
root        <- c(0, 0)
left_child  <- c(-2, -2)
right_child <- c(2, -2)
left_left   <- c(-3, -4)
left_right  <- c(-1, -4)
right_left  <- c(1, -4)
right_right <- c(3, -4)

# Draw edges (parent -> child)
segments(root[1], root[2], left_child[1], left_child[2], col = "darkorange", lwd = 2)
segments(root[1], root[2], right_child[1], right_child[2], col = "darkorange", lwd = 2)
segments(left_child[1], left_child[2], left_left[1], left_left[2], col = "darkorange", lwd = 2)
segments(left_child[1], left_child[2], left_right[1], left_right[2], col = "darkorange", lwd = 2)
segments(right_child[1], right_child[2], right_left[1], right_left[2], col = "darkorange", lwd = 2)
segments(right_child[1], right_child[2], right_right[1], right_right[2], col = "darkorange", lwd = 2)

# Draw nodes on top of the edges
draw_node(root[1], root[2])
draw_node(left_child[1], left_child[2])
draw_node(right_child[1], right_child[2])
draw_node(left_left[1], left_left[2])
draw_node(left_right[1], left_right[2])
draw_node(right_left[1], right_left[2])
draw_node(right_right[1], right_right[2])

# Add the title in the same colour as the tree
title(main = "Top Down Approach", col.main = "darkorange", font.main = 2)
```

<img src="slide_r_intro_files/figure-html/top-down-approach-fig-1.png" width="504" style="display: block; margin: auto auto auto 0;" />

---
name: packages

# Packages

.pull-left-50[
<img src="data/slide_intro/packages.jpg" width="250pt" style="display: block; margin: auto;" />
]
.pull-right-50[
<img src="images/tidyverse-icons.png" width="250pt" style="display: block; margin: auto;" />
]

--

* ready-made functions (with data & documentation)

--

* developed by the community

--

* cover several very diverse areas of science/life

--

* uniformly structured and documented

--

* organised in repositories:
  + [CRAN](https://cran.r-project.org)
  + [R-Forge](https://r-forge.r-project.org)
  + [Bioconductor](http://www.bioconductor.org)
  + [GitHub](https://github.com)

---
name: CRAN

# Working with packages -- CRAN example

<img src="data/slide_r_environment/ggplot2_CRAN.png" width="80%" style="display: block; margin: auto;" />

---
name: pkg_cran_inst

# Working with packages -- installation

Only a few packages are pre-installed, so loading a package that is missing fails:

``` r
library(modelr) # stops with an error if modelr is not installed
```

To install a package from the R console, use:

``` r
install.packages("ggplot2", dependencies = TRUE)
```

---
name: work_pkg_details

# Working with packages -- details

You may also want to specify the repository (CRAN mirror), e.g. because it is geographically closer to you or because your default mirror is down:

``` r
install.packages("ggplot2", dependencies = TRUE, repos = "http://cran.se.r-project.org")
```

Sometimes this does not work either, because the package is not available for your platform. In such a case, you need to *compile* it from its *source code*.

---
name: work_pkg_details2

# Working with packages -- details cted.

<img src="data/slide_r_environment/ggplot2_CRAN.png" width="150%" style="display: block; margin: auto;" />

---
name: source_pkg_inst

# Working with packages -- installing from source

- Download the source file, in our example *ggplot2_3.4.3.tar.gz*.
- Install it (dependencies are not fetched when `repos = NULL`, so install them first):

``` r
install.packages("path/to/ggplot2_3.4.3.tar.gz",
                 repos = NULL, type = "source", dependencies = TRUE)
```

- Load it:

``` r
library("ggplot2") # stops with an error if the package is not installed
require("ggplot2") # returns FALSE with a warning instead; meant for use inside functions
```

Neither call reloads a package that is already attached.

---
name: pkg_github

# Packages -- GitHub

Nowadays, more and more developers distribute their packages via GitHub.

The easiest way to install packages from GitHub is via the *devtools* package:

- Install the *devtools* package
- Load it
- Install the package from GitHub

``` r
install.packages("devtools", dependencies = TRUE)
library("devtools")
install_github("talgalili/installr") # GitHub username/repository
```

--

.pull-center-50[
<img src="images/installr.png" width="750pt" style="display: block; margin: auto;" />
]

---
name: pkg_bioconductor

# Packages -- Bioconductor

<img src="data/slide_r_environment/logo_bioconductor.png" width="200pt" style="display: block; margin: auto;" />

First, install the *BiocManager* package (only if it is not installed yet):

``` r
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
```

---
name: pkg_bioconductor2

# Packages -- Bioconductor cted.

Now, you can install particular packages from Bioconductor:

``` r
BiocManager::install("GenomicRanges")
```

.pull-center-50[
<img src="images/GenomicRanges.png" width="550pt" style="display: block; margin: auto;" />
]

For more info, visit the [Bioconductor website](http://www.bioconductor.org/install/).

---

# One package to rule them all -- the magic of `renv` ✨

- Start once → `renv::init()` (sets up your project’s toolbox)
- While working → `renv::snapshot()` (saves the toolbox state to `renv.lock`)
- Share with a friend → send the `renv.lock` file
- Friend restores → `renv::restore()` (they get the exact same toolbox)

.pull-center-50[
<img src="images/renv.png" width="550pt" style="display: block; margin: auto;" />
]

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end_slide
class: end-slide, middle
count: false

# Thank you! Questions?

.end-text[
<p class="smaller">
<span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br>
Created: 07-Oct-2025 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a>
</p>
]