class: center, middle, inverse, title-slide

.title[
# Vectors
]

.subtitle[
## R Foundations for Data Analysis
]

.author[
### Marcin Kierczak, Sebastian DiLorenzo, Guilherme Dias
]

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ------------ Only edit title, subtitle & author above this ------------ -->

---
name: contents

## Contents of the lecture

- variables and their types
- operators
- **vectors**
- **numbers as vectors**
- **strings as vectors**
- matrices
- data frames
- lists
<!-- - objects -->
- repeating actions: iteration and recursion
- decision taking: control structures
- functions in general
- variable scope
- core functions

---
name: cplx_data_str

## Complex data structures

Using the basic data types (`numeric`, `logical` and `character`) one can construct more complex data structures:

<br>
<br>

--

.pull-left-50[
![](images/data_structures.png)
]

.pull-right-50[

dimensions | Homogeneous | Heterogeneous
----|------------|-----------------
0 | n/a | n/a
1 | vectors | list
2 | matrices | data frame
n | arrays | n/a

]

---
name: atomic_vectors

## Atomic vectors

An *atomic vector*, or simply a *vector*, is a sequence of elements of the same data type. We build vectors using the function `c()` (combine).

``` r
vec <- c(1, 2, 3)
vec
```

```
## [1] 1 2 3
```

In R, even a single number is a one-element vector. Get used to thinking in terms of vectors...

---
name: atomic_vectors2

## Atomic vectors ctd.

You can also create empty/zero vectors of a given type and length:

``` r
vector('integer', 5)   # a vector of 5 integers
vector('character', 5) # a vector of 5 empty strings
character(5)           # does the same
logical(5)             # same as vector('logical', 5)
```

```
## [1] 0 0 0 0 0
## [1] "" "" "" "" ""
## [1] "" "" "" "" ""
## [1] FALSE FALSE FALSE FALSE FALSE
```

---
name: combining_vectors

## Combining two or more vectors

Vectors can easily be combined:

``` r
v1 <- c(1, 2, 3)
v2 <- c('a', 'b', 'c')
v3 <- c('do', 're', 'mi')
c(v1, v2, v3)
```

```
## [1] "1"  "2"  "3"  "a"  "b"  "c"  "do" "re" "mi"
```

Note that after combining numbers with characters, all elements became character. This is called **coercion**.

---
name: basic_vect_arithm

## Basic vector arithmetic

We can perform operations on vectors. They are applied element by element:

``` r
v1 <- c(1, 2, 3, 4)
v2 <- c(7, -9, 15.2, 4)
v1 + v2 # addition
v1 - v2 # subtraction
v1 * v2 # element-wise multiplication
v1 / v2 # division
```

```
## [1]  8.0 -7.0 18.2  8.0
## [1]  -6.0  11.0 -12.2   0.0
## [1]   7.0 -18.0  45.6  16.0
## [1]  0.1428571 -0.2222222  0.1973684  1.0000000
```

---
name: recycling_rule

## Vectors – recycling rule

``` r
v1 <- c(1, 2, 3, 4, 5)
v2 <- c(0, 1)
v1 + v2
```

```
## [1] 1 3 3 5 5
```

Values in the shorter vector will be **recycled** (repeated) to match the length of the longer one. In this case, `c(0, 1)` is extended to `c(0, 1, 0, 1, 0)` so that it can be added to `v1`. R also issues a warning here, because the length of the longer vector is not a multiple of the length of the shorter one.

---
name: vec_indexing

## Vectors – indexing

We can access or retrieve particular elements of a vector by using the `[]` notation:

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec[1]  # the first element
vec[5]  # the fifth element
vec[-1] # everything except the first element
```

```
## [1] "a"
## [1] "e"
## [1] "b" "c" "d" "e"
```

---
name: vec_indexing2

## Vectors – indexing ctd.

And what happens if we want to retrieve elements outside the vector?

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec[0]  # R counts elements from 1
vec[10] # Positive index past the length of the vector
vec[-6] # Negative index past the length of the vector
```

```
## character(0)
## [1] NA
## [1] "a" "b" "c" "d" "e"
```

An index of **zero** returns an empty vector of the same type as the original vector. A **positive** index beyond the vector's length returns an `NA` value. A **negative** index beyond the vector's length returns the full, unchanged vector. In that last case, R simply ignores your request.

---
name: vec_indexing3

## Vectors – indexing ctd.

You can also retrieve elements of a vector using a vector of indices:

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec.ind <- c(1, 3, 5)
vec[vec.ind]
```

```
## [1] "a" "c" "e"
```

--

Or even a logical vector:

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec.ind <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
vec[vec.ind]
```

```
## [1] "a" "c" "e"
```

---
name: vec_indexing_names

## Vectors – indexing using names

You can name elements of your vector:

``` r
vec <- c(23.7, 54.5, 22.7)
names(vec) # by default there are no names
names(vec) <- c('sample1', 'sample2', 'sample3')
vec
vec[c('sample2', 'sample1')]
```

```
## NULL
## sample1 sample2 sample3
##    23.7    54.5    22.7
## sample2 sample1
##    54.5    23.7
```

---
name: vec_rem_elem

## Vectors – removing elements

You can return a vector without certain elements:

``` r
vec <- c(1, 2, 3, 4, 5)
vec[-5]            # without the 5th element
vec[-(c(1, 3, 5))] # without elements 1, 3, 5
```

```
## [1] 1 2 3 4
## [1] 2 4
```

---
name: vec_conditions

## Vectors indexing – conditions

Logical expressions are also allowed in indexing:

``` r
vec <- c(1, 2, 3, 4, 5)
vec < 3      # we can use the value of this logical comparison
vec[vec < 3] # Et voilà!
```

```
## [1]  TRUE  TRUE FALSE FALSE FALSE
## [1] 1 2
```

---
name: vec_more_ops

## Vectors – more operations

You can easily reverse a vector:

``` r
vec <- c(1, 2, 3, 4, 5)
rev(vec)
```

```
## [1] 5 4 3 2 1
```

You can generate vectors of consecutive numbers using `:`, e.g.:

``` r
v <- c(5:7)
v
v2 <- c(3:-4)
v2
```

```
## [1] 5 6 7
## [1]  3  2  1  0 -1 -2 -3 -4
```

---
name: vec_size

## Vectors – size

To get the size of a vector, use `length()`:

``` r
vec <- c(1:78)
length(vec)
```

```
## [1] 78
```

---
name: vec_subst_elem

## Vectors – substitute element

To substitute an element in a vector, simply assign a new value to it:

``` r
vec <- c(1:5)
vec
vec[3] <- 'a' # Note the coercion!
vec
```

```
## [1] 1 2 3 4 5
## [1] "1" "2" "a" "4" "5"
```

--

To insert 'a' at, say, the 2nd position:

``` r
c(vec[1], 'a', vec[2:length(vec)])
```

```
## [1] "1" "a" "2" "a" "4" "5"
```

---
name: vec_alter_len

## Vectors – changing the length

What if we write past the vector's last element?

``` r
vec <- c(1:5)
vec
vec[9] <- 9
vec
```

```
## [1] 1 2 3 4 5
## [1]  1  2  3  4  5 NA NA NA  9
```

---
name: vec_count_vals

## Vectors – counting values

One may be interested in the count of particular values:

``` r
vec <- c(1:5, 1:4, 1:3)       # a vector with repeating values
table(vec)                    # table of counts
tab <- table(vec)/length(vec) # table of frequencies
round(tab, digits = 3)        # and let's round it
```

```
## vec
## 1 2 3 4 5
## 3 3 3 2 1
## vec
##     1     2     3     4     5
## 0.250 0.250 0.250 0.167 0.083
```

---
name: vec_sorting

## Vectors – sorting

To sort values of a vector:

``` r
vec <- c(1:5, NA, NA, 1:3)
sort(vec)                    # oops, NAs got lost
sort(vec, na.last = TRUE)    # keep NAs and put them last
sort(vec, decreasing = TRUE) # in a decreasing order
```

```
## [1] 1 1 2 2 3 3 4 5
##  [1]  1  1  2  2  3  3  4  5 NA NA
## [1] 5 4 3 3 2 2 1 1
```

---
name: seq

## Sequences of numbers

R also provides a few handy functions to generate sequences of numbers:

``` r
c(1:5, 7:10) # the ':' operator
seq1 <- seq(from = 1, to = 10, by = 2)
seq(from = 11, along.with = seq1)
seq(from = 10, to = 1, by = -2)
```

```
## [1]  1  2  3  4  5  7  8  9 10
## [1] 11 12 13 14 15
## [1] 10  8  6  4  2
```

---
exclude: true
name: printing_brackets

## A detour – printing with `()`

If you enclose an assignment in `()`, its result will also be printed:

``` r
seq1 <- seq(from = 1, to = 5)
seq1 # has to be printed explicitly
```

```
## [1] 1 2 3 4 5
```

while:

--

``` r
(seq2 <- seq(from = 5, to = 1)) # will print automatically
```

```
## [1] 5 4 3 2 1
```

---
name: seq2

## Repeating sequences

One may also wish to repeat a value or a vector n times:

``` r
rep('a', times = 5)
rep(1:5, times = 3)
rep(seq(from = 1, to = 3, by = 2), times = 2)
```

```
## [1] "a" "a" "a" "a" "a"
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
## [1] 1 3 1 3
```

---
name: random_seq

## Sequences of random numbers

We can use `sample()` to generate sequences of random numbers:

``` r
# simulate casting a fair die 10x
sample(x = c(1:6), size = 10, replace = TRUE)

# make it unfair, it is loaded on '3'
myprobs <- rep(0.15, times = 6)
myprobs[3] <- 0.25 # a bit higher probability for '3'
sample(x = c(1:6), size = 10, replace = TRUE, prob = myprobs)
```

```
##  [1] 4 6 1 3 3 3 6 5 2 5
##  [1] 3 1 5 2 1 2 1 3 1 1
```

---
name: simulate_dice

## Fair vs. loaded dice

Now, let us see how this can be useful. We need more than 10 results.
Let's cast our dice 10,000 times and plot the frequency distribution.

``` r
# simulate casting the fair and the loaded die 10,000 times each
fair <- sample(x = c(1:6), size = 10e3, replace = TRUE)
unfair <- sample(x = c(1:6), size = 10e3, replace = TRUE, prob = myprobs)
```

---
name: simulate_dice2

## Fair vs. loaded dice – the result

``` r
t1 <- table(fair)/length(fair)
t2 <- table(unfair)/length(unfair)
plot(0, 0, type = "n", xlim = c(1, 6.0), ylim = c(0, .3),
     xlab = "x", ylab = "freq", bty = 'n', las = 1)
grid()
points(1:6, t1, col = "olivedrab")
points(1:6, t2, col = "slateblue")
legend('topleft', legend = c('fair', 'unfair'),
       col = c('olivedrab', 'slateblue'), pch = 15, border = NULL, bty = 'n')
```

<img src="slide_r_elements_2_files/figure-html/dices.pic-1.png" width="504" style="display: block; margin: auto auto auto 0;" />

---
name: more_on_sample

## Sample – one more use

The `sample()` function has one more interesting feature: it can be used to randomize the order of an existing vector:

``` r
mychars <- c('a', 'b', 'c', 'd', 'e', 'f')
mychars
sample(mychars)
sample(mychars)
```

```
## [1] "a" "b" "c" "d" "e" "f"
## [1] "c" "a" "d" "f" "e" "b"
## [1] "d" "e" "f" "b" "a" "c"
```

---
name: vec_adv

## Vectors/sequences – more advanced operations

``` r
v1 <- sample(1:5, size = 4)
v1
max(v1) # max value of the vector
min(v1) # min value
sum(v1) # sum all the elements
```

```
## [1] 4 3 5 1
## [1] 5
## [1] 1
## [1] 13
```

---
exclude: true
name: vec_adv2

## Vectors/sequences – more advanced operations 2

``` r
v1
diff(v1)   # differences between consecutive elements
cumsum(v1) # cumulative sum
prod(v1)   # product of all elements
```

```
## [1] 4 3 5 1
## [1] -1  2 -4
## [1]  4  7 12 13
## [1] 60
```

---
name: vec_adv3

## Vectors/sequences – more advanced operations 3

``` r
v1
cumprod(v1) # cumulative product
cummin(v1)  # minimum so far (up to i-th el.)
cummax(v1)  # maximum up to i-th element
```

```
## [1] 4 3 5 1
## [1]  4 12 60 60
## [1] 4 3 3 1
## [1] 4 4 5 5
```

---
exclude: true
name: vec_pairwise_comp

## Vectors/sequences – pairwise comparisons

``` r
v2 <- sample(1:5, size = 4)
```

``` r
v1
v2
v1 <= v2     # direct comparison
pmin(v1, v2) # pairwise min
pmax(v1, v2) # pairwise max
```

```
## [1] 4 3 5 1
## [1] 2 1 3 4
## [1] FALSE FALSE FALSE  TRUE
## [1] 2 1 3 1
## [1] 4 3 5 4
```

---
name: vec_order_rank

## Vectors/sequences – `rank()` and `order()`

`rank()` and `order()` are closely related: `rank()` gives the position each value would take in the sorted vector, while `order()` gives the indices that sort the vector. When there are no ties, they are inverse permutations of each other.

``` r
v1 <- c(1, 3, 4, 5, 3, 2)
rank(v1)  # show rank of each value (min has rank 1, ties get the average rank)
order(v1) # order of indices for a sorted vector
v1[order(v1)]
sort(v1)
```

```
## [1] 1.0 3.5 5.0 6.0 3.5 2.0
## [1] 1 6 2 5 3 4
## [1] 1 2 3 3 4 5
## [1] 1 2 3 3 4 5
```

---
name: factors

## Factors

To work with **nominal** values, R offers a special data type, a *factor*:

``` r
vec <- c('blue', 'yellow', 'purple', 'yellow', 'yellow', 'blue')
vec.f <- factor(vec)
summary(vec.f)
```

```
##   blue purple yellow
##      2      1      3
```

The levels of a factor are coded alphabetically by default. So blue is coded as 1, purple as 2 and yellow as 3. Factors are really just a special type of integer vector.

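A minimal sketch of what "integer vector" means here: the integer codes and the level labels can be inspected separately. The name `fac` is used only for this illustration and is not part of the original slides.

``` r
# Illustrative factor with the same values as above (name 'fac' is an assumption)
fac <- factor(c('blue', 'yellow', 'purple', 'yellow', 'yellow', 'blue'))

typeof(fac)      # storage type of the underlying codes
levels(fac)      # the labels, sorted alphabetically by default
unclass(fac)     # the integer codes together with the levels attribute
levels(fac)[fac] # map the codes back to the original labels
```

The codes alone can also be extracted with `as.numeric()`:
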
``` r
as.numeric(vec.f)
```

```
## [1] 1 3 2 3 3 1
```

---
name: factors2

## Factors

You can manually control the coding/mapping of factors and their labels:

``` r
vec <- c('blue', 'yellow', 'purple', 'yellow', 'yellow', 'blue')
vec.f <- factor(vec,
                levels = c('blue', 'purple', 'yellow', 'white'),
                labels = c('sea', 'flower', 'sun', 'snow'))
summary(vec.f)
```

```
##    sea flower    sun   snow
##      2      1      3      0
```

---
name: ordered_fac

## Ordered factors

To work with ordinal scale (ordered) variables, one can also use factors:

``` r
vec <- c('small', 'tiny', 'large', 'medium')
factor(vec) # levels arranged alphabetically
```

```
## [1] small  tiny   large  medium
## Levels: large medium small tiny
```

--

We can control the order:

``` r
factor(vec, levels = c('tiny', 'small', 'medium', 'large'), ordered = TRUE) # ordered as provided in the levels argument
```

```
## [1] small  tiny   large  medium
## Levels: tiny < small < medium < large
```

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end_slide
class: end-slide, middle
count: false

# We will talk about matrices in the next lecture!

.end-text[
<p class="smaller">
<span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br>
Created: 31-Oct-2024 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a>
</p>
]