Vectors

class: center, middle, inverse, title-slide

.title[
# Vectors
]
.subtitle[
## R Foundations for Data Analysis
]
.author[
### Marcin Kierczak, Sebastian DiLorenzo, Guilherme Dias
]

---

exclude: true
count: false

---
name: contents

## Contents of the lecture

- variables and their types
- operators
- **vectors**
- **numbers as vectors**
- **strings as vectors**
- matrices
- data frames
- lists

- repeating actions: iteration and recursion
- decision taking: control structures
- functions in general
- variable scope
- core functions

---
name: cplx_data_str

## Complex data structures

Using the basic data types (`numeric`, `logical` and `character`) one can construct more complex data structures:

--

.pull-left-50[
![](images/data_structures.png)
]

.pull-right-50[

Dimensions | Homogeneous | Heterogeneous
----|------------|-----------------
0  | n/a        | n/a
1  | vectors    | list
2  | matrices   | data frame
n  | arrays     | n/a

]
---
name: atomic_vectors

## Atomic vectors
An *atomic vector*, or simply a *vector*, is a sequence of elements of the same data type.

We build vectors using the function `c()` (combine).

``` r
vec <- c(1, 2, 3)
vec
```

```
## [1] 1 2 3
```
In R, even a single number is a one-element vector. Get used to thinking in terms of vectors.

---
name: atomic_vectors2

## Atomic vectors ctd.
You can also create empty/zero vectors of a given type and length:

``` r
vector('integer', 5) # a vector of 5 integers
vector('character', 5)
character(5) # does the same
logical(5) # same as vector('logical', 5)
```

```
## [1] 0 0 0 0 0
## [1] "" "" "" "" ""
## [1] "" "" "" "" ""
## [1] FALSE FALSE FALSE FALSE FALSE
```

---
name: combining_vectors

## Combining two or more vectors
Vectors can easily be combined:

``` r
v1 <- c(1,2,3)
v2 <- c('a','b','c')
v3 <- c('do','re','mi')
c(v1, v2, v3)
```

```
## [1] "1"  "2"  "3"  "a"  "b"  "c"  "do" "re" "mi"
```
Note that after combining numbers with characters, all elements became character strings.

This is called **coercion**.
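
As a small sketch of how the result type is chosen (the variable names below are only illustrative): when types are mixed, R converts everything to the most flexible type present, in the order logical, integer, double, character.

``` r
# Coercion order: logical < integer < double < character
lgl_int <- c(TRUE, 2L)  # TRUE becomes 1L
int_dbl <- c(1L, 2.5)   # 1L becomes 1.0
dbl_chr <- c(1.5, 'a')  # 1.5 becomes "1.5"

class(lgl_int) # "integer"
class(int_dbl) # "numeric"
class(dbl_chr) # "character"
```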

---
name: basic_vect_arithm

## Basic vector arithmetic
Arithmetic operators work on vectors element by element:

``` r
v1 <- c(1, 2, 3, 4)
v2 <- c(7, -9, 15.2, 4)
v1 + v2 # addition
v1 - v2 # subtraction
v1 * v2 # element-wise multiplication
v1 / v2 # division
```

```
## [1]  8.0 -7.0 18.2  8.0
## [1]  -6.0  11.0 -12.2   0.0
## [1]   7.0 -18.0  45.6  16.0
## [1]  0.1428571 -0.2222222  0.1973684  1.0000000
```

---
name: recycling_rule

## Vectors &ndash; recycling rule

``` r
v1 <- c(1, 2, 3, 4, 5)
v2 <- c(0, 1)
v1 + v2
```

```
## [1] 1 3 3 5 5
```
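Values in the shorter vector are **recycled** (repeated) to match the length of the longer one.

In this case, `c(0, 1)` is treated as `c(0, 1, 0, 1, 0)` so that it can be added to `v1`. Because 5 is not a multiple of 2, R also issues a warning that the longer length is not a multiple of the shorter one.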
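
To make the recycling visible, here is a minimal sketch that uses base R's `rep_len()` to build the recycled vector by hand; `v2_recycled` is just an illustrative name.

``` r
v1 <- c(1, 2, 3, 4, 5)
v2 <- c(0, 1)
# repeat v2 until it is as long as v1
v2_recycled <- rep_len(v2, length.out = length(v1))
v2_recycled      # 0 1 0 1 0
v1 + v2_recycled # 1 3 3 5 5, the same values as v1 + v2
```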

---
name: vec_indexing

## Vectors &ndash; indexing
We can retrieve particular elements of a vector using the `[]` notation:

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec[1] # the first element
vec[5] # the fifth element
vec[-1] # remove the first element
```

```
## [1] "a"
## [1] "e"
## [1] "b" "c" "d" "e"
```

---
name: vec_indexing2

## Vectors &ndash; indexing ctd.
And what happens if we want to retrieve elements outside the vector?

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec[0] # R counts elements from 1
vec[10] # Positive index past the length of the vector
vec[-6] # Negative index past the length of the vector
```

```
## character(0)
## [1] NA
## [1] "a" "b" "c" "d" "e"
```
An index of **zero** will result in an empty vector of the same type as the original vector.

A **positive** index beyond the vector's length will result in an `NA` value.

A **negative** index beyond the vector's length will result in the full unchanged vector. Basically, R ignores your request.

---
name: vec_indexing3

## Vectors &ndash; indexing ctd.
You can also retrieve elements of a vector using a vector of indices:

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec.ind <- c(1,3,5)
vec[vec.ind]
```

```
## [1] "a" "c" "e"
```

Or even a logical vector:

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec.ind <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
vec[vec.ind]
```

```
## [1] "a" "c" "e"
```
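The two styles are interchangeable. As a short sketch, base R's `which()` turns a logical vector into the positions of its `TRUE` values (`pos` is an illustrative name):

``` r
vec <- c('a', 'b', 'c', 'd', 'e')
vec.ind <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
pos <- which(vec.ind) # positions of the TRUE values
pos                   # 1 3 5
vec[pos]              # "a" "c" "e"
```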

---
name: vec_indexing_names

## Vectors &ndash; indexing using names
You can name elements of your vector:

``` r
vec <- c(23.7, 54.5, 22.7)
names(vec) # by default there are no names
names(vec) <- c('sample1', 'sample2', 'sample3')
vec
vec[c('sample2', 'sample1')]
```

```
## NULL
## sample1 sample2 sample3 
##    23.7    54.5    22.7 
## sample2 sample1 
##    54.5    23.7
```

---
name: vec_rem_elem

## Vectors &ndash; removing elements
You can return a vector without certain elements:

``` r
vec <- c(1, 2, 3, 4, 5)
vec[-5] # without the 5-th element
vec[-(c(1,3,5))] # without elements 1, 3, 5
```

```
## [1] 1 2 3 4
## [1] 2 4
```

---
name: vec_conditions

## Vectors indexing &ndash; conditions
Logical expressions can also be used for indexing:

``` r
vec <- c(1, 2, 3, 4, 5)
vec < 3 # we can use the value of this logical comparison
vec[vec < 3] # Et voilà!
```

```
## [1]  TRUE  TRUE FALSE FALSE FALSE
## [1] 1 2
```
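Conditions can also be combined. The following sketch uses the element-wise operators `&` (and), `|` (or) and `!` (not); the result names are only illustrative.

``` r
vec <- c(1, 2, 3, 4, 5)
middle    <- vec[vec > 1 & vec < 5]   # 2 3 4
edges     <- vec[vec == 1 | vec == 5] # 1 5
not_small <- vec[!(vec < 3)]          # 3 4 5
```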

---
name: vec_more_ops

## Vectors &ndash; more operations
You can easily reverse a vector:

``` r
vec <- c(1, 2, 3, 4, 5)
rev(vec)
```

```
## [1] 5 4 3 2 1
```
You can generate vectors of consecutive integers, ascending or descending, using `:`, e.g.:

``` r
v <- c(5:7)
v
v2 <- c(3:-4)
v2
```

```
## [1] 5 6 7
## [1]  3  2  1  0 -1 -2 -3 -4
```

---
name: vec_size

## Vectors &ndash; size
To get the size of a vector, use `length()`:

``` r
vec <- c(1:78)
length(vec)
```

```
## [1] 78
```

---
name: vec_subst_elem

## Vectors &ndash; substitute element
To substitute an element in a vector, simply assign a new value to it:

``` r
vec <- c(1:5)
vec
vec[3] <- 'a' # Note the coercion!
vec 
```

```
## [1] 1 2 3 4 5
## [1] "1" "2" "a" "4" "5"
```

To insert 'a' at, say, the 2nd position:

``` r
c(vec[1], 'a', vec[2:length(vec)])
```

```
## [1] "1" "a" "2" "a" "4" "5"
```
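Base R also offers `append()` for this. A minimal sketch, starting from the coerced vector shown above (`inserted` is an illustrative name):

``` r
vec <- c('1', '2', 'a', '4', '5')
# insert 'a' after the 1st element, i.e. at the 2nd position
inserted <- append(vec, 'a', after = 1)
inserted # "1" "a" "2" "a" "4" "5"
```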

---
name: vec_alter_len

## Vectors &ndash; changing the length
What if we write past the vector's last element?

``` r
vec <- c(1:5)
vec
vec[9] <- 9
vec 
```

```
## [1] 1 2 3 4 5
## [1]  1  2  3  4  5 NA NA NA  9
```
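The length can also be set directly. As a small sketch, assigning to `length()` truncates the vector or pads it with `NA`:

``` r
vec <- c(1:5)
length(vec) <- 3 # truncate to the first three elements
vec              # 1 2 3
length(vec) <- 5 # extend again; the new positions are NA
vec              # 1 2 3 NA NA
```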

---
name: vec_count_vals

## Vectors &ndash; counting values
One may be interested in the count of particular values:

``` r
vec <- c(1:5, 1:4, 1:3) # a vector with repeating values
table(vec) # table of counts
tab <- table(vec)/length(vec) # table of freqs.
round(tab, digits=3) # and let's round it
```

```
## vec
## 1 2 3 4 5 
## 3 3 3 2 1 
## vec
##     1     2     3     4     5 
## 0.250 0.250 0.250 0.167 0.083
```
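As an alternative sketch, base R's `prop.table()` computes the same frequencies directly from the table of counts (`freqs` is an illustrative name):

``` r
vec <- c(1:5, 1:4, 1:3)
freqs <- prop.table(table(vec)) # counts divided by their total
round(freqs, digits = 3)
sum(freqs) # the frequencies add up to 1
```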

---
name: vec_sorting

## Vectors &ndash; sorting
To sort values of a vector:

``` r
vec <- c(1:5, NA, NA, 1:3)
sort(vec) # oops, NAs got lost
sort(vec, na.last = TRUE)
sort(vec, decreasing = TRUE) # in a decreasing order
```

```
## [1] 1 1 2 2 3 3 4 5
##  [1]  1  1  2  2  3  3  4  5 NA NA
## [1] 5 4 3 3 2 2 1 1
```

---
name: seq

## Sequences of numbers
R also provides a few handy functions to generate sequences of numbers:

``` r
c(1:5, 7:10) # the ':' operator
seq1 <- seq(from=1, to=10, by=2)
seq(from=11, along.with = seq1)
seq(from=10, to=1, by=-2)
```

```
## [1]  1  2  3  4  5  7  8  9 10
## [1] 11 12 13 14 15
## [1] 10  8  6  4  2
```
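A few more variants, shown as a sketch with illustrative variable names: `length.out` fixes the number of elements instead of the step, while `seq_len()` and `seq_along()` are safe ways to build index sequences.

``` r
s1 <- seq(from = 0, to = 1, length.out = 5) # 0.00 0.25 0.50 0.75 1.00
s2 <- seq_len(4)                            # 1 2 3 4
s3 <- seq_along(c('a', 'b', 'c'))           # 1 2 3
s4 <- seq_len(0)                            # integer(0), whereas 1:0 gives 1 0
```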

---
exclude: true
name: printing_brackets

## A detour &ndash; printing with `()`
A plain assignment prints nothing, but if you enclose it in `()`, the assigned value is also printed:

``` r
seq1 <- seq(from = 1, to = 5)
seq1 # has to be printed explicitly
```

```
## [1] 1 2 3 4 5
```
while:

``` r
(seq2 <- seq(from = 5, to = 1)) # will print automatically
```

```
## [1] 5 4 3 2 1
```

---
name: seq2

## Repeating sequences
One may also wish to repeat a value or a vector n times:

``` r
rep('a', times=5)
rep(1:5, times=3)
rep(seq(from=1, to=3, by=2), times=2)
```

```
## [1] "a" "a" "a" "a" "a"
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
## [1] 1 3 1 3
```

---
name: random_seq

## Sequences of random numbers
We can use `sample()` to generate sequences of random numbers:

``` r
# simulate casting a fair die 10x
sample(x = c(1:6), size = 10, replace = TRUE)
# make it unfair, it is loaded on '3'
myprobs <- rep(0.15, times = 6)
myprobs[3] <- 0.25 # a bit higher probability for '3'; the probabilities still sum to 1
sample(x = c(1:6), size = 10, replace = TRUE, prob = myprobs)
```

```
##  [1] 4 6 1 3 3 3 6 5 2 5
##  [1] 3 1 5 2 1 2 1 3 1 1
```
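Random draws differ between runs. A minimal sketch of how to make them reproducible with `set.seed()`; the seed value 42 is an arbitrary choice made for this example.

``` r
set.seed(42) # fix the state of the random number generator
first <- sample(x = 1:6, size = 10, replace = TRUE)
set.seed(42) # the same seed again
second <- sample(x = 1:6, size = 10, replace = TRUE)
identical(first, second) # TRUE
```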

---
name: simulate_dice

## Fair vs. loaded dice
Now, let us see how this can be useful. We need more than 10 results. Let's cast our dice 10,000 times and plot the frequency distribution.

``` r
# simulate casting a fair and a loaded die 10,000x each
fair <- sample(x = c(1:6), size = 10e3, replace = TRUE)
unfair <- sample(x = c(1:6), size = 10e3, replace = TRUE, prob = myprobs)
```

---
name: simulate_dice2

## Fair vs. loaded dice &ndash; the result

``` r
t1 <- table(fair)/length(fair)
t2 <- table(unfair)/length(unfair)
plot(0,0,type="n",xlim=c(1,6.0),ylim=c(0,.3),xlab="x",ylab="freq",bty='n', las=1)
grid()
points(1:6, t1, col="olivedrab")
points(1:6, t2, col="slateblue")
legend('topleft', legend = c('fair','unfair'), col = c('olivedrab', 'slateblue'),pch = 15, border = NULL, bty='n')
```

---
name: more_on_sample

## Sample &ndash; one more use
The `sample()` function has one more interesting feature: it can be used to randomize the order of an existing vector:

``` r
mychars <- c('a', 'b', 'c', 'd', 'e', 'f')
mychars
sample(mychars)
sample(mychars)
```

```
## [1] "a" "b" "c" "d" "e" "f"
## [1] "c" "a" "d" "f" "e" "b"
## [1] "d" "e" "f" "b" "a" "c"
```

---
name: vec_adv

## Vectors/sequences &ndash; more advanced operations

``` r
v1 <- sample(1:5, size = 4)
v1
max(v1) # max value of the vector
min(v1) # min value
sum(v1) # sum all the elements
```

```
## [1] 4 3 5 1
## [1] 5
## [1] 1
## [1] 13
```

---
exclude: true
name: vec_adv2

## Vectors/sequences &ndash; more advanced operations 2

``` r
v1
diff(v1) # diff. of element pairs
cumsum(v1) # cumulative sum
prod(v1) # product of all elements
```

```
## [1] 4 3 5 1
## [1] -1  2 -4
## [1]  4  7 12 13
## [1] 60
```

---
name: vec_adv3

## Vectors/sequences &ndash; more advanced operations 3

``` r
v1
cumprod(v1) # cumulative product
cummin(v1) # minimum so far (up to i-th el.)
cummax(v1) # maximum up to i-th element
```

```
## [1] 4 3 5 1
## [1]  4 12 60 60
## [1] 4 3 3 1
## [1] 4 4 5 5
```

---
exclude: true
name: vec_pairwise_comp

## Vectors/sequences &ndash; pairwise comparisons

``` r
v2 <- sample(1:5, size=4)
```

``` r
v1
v2
v1 <= v2 # direct comparison
pmin(v1, v2) # pairwise min
pmax(v1, v2) # pairwise max
```

```
## [1] 4 3 5 1
## [1] 2 1 3 4
## [1] FALSE FALSE FALSE  TRUE
## [1] 2 1 3 1
## [1] 4 3 5 4
```

---
name: vec_order_rank

## Vectors/sequences &ndash; `rank()` and `order()`
`rank()` and `order()` are closely related: for a vector without ties, the permutations they return are inverses of each other.

``` r
v1 <- c(1, 3, 4, 5, 3, 2)
rank(v1) # show rank of each value (min has rank 1)
order(v1) # order of indices for a sorted vector
v1[order(v1)]
sort(v1)
```

```
## [1] 1.0 3.5 5.0 6.0 3.5 2.0
## [1] 1 6 2 5 3 4
## [1] 1 2 3 3 4 5
## [1] 1 2 3 3 4 5
```
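As a sketch of the relationship between the two functions, take a vector without ties (`x` is an illustrative example): applying `order()` twice gives the same values as `rank()`.

``` r
x <- c(30, 10, 40, 20) # no ties
order(x)               # 2 4 1 3
rank(x)                # 3 1 4 2
order(order(x))        # 3 1 4 2, equal to rank(x)
```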

---
name: factors

## Factors
To work with **nominal** values, R offers a special data type, a *factor*:

``` r
vec <- c('blue', 'yellow', 'purple', 
 'yellow', 'yellow', 'blue')
vec.f <- factor(vec)
summary(vec.f)
```

```
##   blue purple yellow 
##      2      1      3
```
The levels of a factor are coded alphabetically by default. So blue is coded as 1, purple as 2 and yellow as 3.

Factors are really just integer vectors with an attached set of levels.

``` r
as.numeric(vec.f)
```

```
## [1] 1 3 2 3 3 1
```
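This matters when the labels themselves look like numbers. A small sketch with a hypothetical factor `f`: `as.numeric()` returns the integer codes, so convert via `as.character()` to recover the original values.

``` r
f <- factor(c('10', '20', '10', '5'))
levels(f)                   # "10" "20" "5" (sorted as text)
as.numeric(f)               # 1 2 1 3: the integer codes
as.numeric(as.character(f)) # 10 20 10 5: the original values
```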

---
name: factors2

## Factors
You can manually control the coding/mapping of factors and their labels:

``` r
vec <- c('blue', 'yellow', 'purple', 
 'yellow', 'yellow', 'blue')
vec.f <- factor(vec, levels=c('blue', 'purple', 'yellow', 'white'), 
 labels=c('sea','flower','sun','snow'))
summary(vec.f)
```

```
##    sea flower    sun   snow 
##      2      1      3      0
```

---
name: ordered_fac

## Ordered factors
To work with ordinal scale (ordered) variables, one can also use factors:

``` r
vec <- c('small', 'tiny', 'large', 'medium')
factor(vec) # levels are sorted alphabetically
```

```
## [1] small  tiny   large  medium
## Levels: large medium small tiny
```
--
We can control the order:

``` r
factor(vec, levels = c('tiny', 'small', 'medium', 'large'),
       ordered=TRUE) # ordered as provided in the levels argument
```

```
## [1] small  tiny   large  medium
## Levels: tiny < small < medium < large
```
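Once the levels are ordered, comparisons follow that order. A minimal sketch, where `sizes` is an illustrative name for the ordered factor built above:

``` r
vec <- c('small', 'tiny', 'large', 'medium')
sizes <- factor(vec, levels = c('tiny', 'small', 'medium', 'large'),
                ordered = TRUE)
sizes < 'medium'          # TRUE TRUE FALSE FALSE
max(sizes)                # large
as.character(sort(sizes)) # "tiny" "small" "medium" "large"
```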

---
name: end_slide
class: end-slide, middle
count: false

# We will talk about matrices in the next lecture!

.end-text[

Graphics from <img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"> 
Created: 31-Oct-2024 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a> 

]