Tidy work in Tidyverse

class: center, middle, inverse, title-slide

.title[
# Tidy work in Tidyverse
]
.subtitle[
## R Foundation for Life Scientists
]
.author[
### Marcin Kierczak
]

---

exclude: true
count: false

---
name: setup_livecode
# Livecode Setup

By typing:

`http://livecode.kierczak.net:7777`

in your browser, you can access the livecode server.

---
name: learning_outcomes
# Learning Outcomes

<br>

Upon completing this module, you will:

* know what `tidyverse` is and a bit about its history

* be aware of useful packages within `tidyverse`

* be able to use basic pipes (including the native R pipe)

* know whether the data you are working with are tidy

* be able to do basic tidying of your data

---
name: tidyverse_overview
# Tidyverse -- What is it all About?

* [tidyverse](http://www.tidyverse.org) is a collection of &nbsp; 📦 📦
* created by [Hadley Wickham](http://hadley.nz)
* has become a *de facto* standard in data analyses
* a philosophy of programming or a **programming paradigm**: everything is about the &nbsp;🌊  &nbsp; flow of &nbsp; 🧹 &nbsp; tidy data

.center[
<img src="data/slide_tidyverse/hex-tidyverse.png", style="height:200px;">
<img src="data/slide_tidyverse/Hadley-wickham2016-02-04.jpeg", style="height:200px;">
<img src="data/slide_tidyverse/RforDataScience.jpeg", style="height:200px;">
]
.vsmall[sources of images: www.tidyverse.org, Wikipedia, www.tidyverse.org]

---
name: tidyverse_curse
# ?(Tidyverse OR !Tidyverse)

> ☠️ &nbsp;There are still some people out there talking about the tidyverse curse though...&nbsp; ☠️<br>

> Navigating the balance between base R and the tidyverse is a challenge to learn.<br>[-Robert A. Muenchen](http://r4stats.com/articles/why-r-is-hard-to-learn/)

.center[<img src="data/slide_tidyverse/tidyverse-flow.png", style="height:400px;">]

.vsmall[source: http://www.storybench.org/getting-started-with-tidyverse-in-r/]

---
name: intro_to_pipes
# Pipes or Let my Data Flow &nbsp; 🌊

.pull-left-50[

.center[<img src="data/slide_tidyverse/pipe_magritte.jpg", style="width:300px;">]

.vsmall[René Magritte, *La trahison des images*, [Wikimedia Commons](https://en.wikipedia.org/wiki/The_Treachery_of_Images#/media/File:MagrittePipe.jpg)]

.center[<img src="data/slide_tidyverse/magrittr.png", style="width:150px;">]
]

.pull-right-50[

* Let the data flow.
* *Ceci n'est pas une pipe* -- `magrittr`
* The `%>%` pipe:
  + `x %>% f` `$\equiv$` `f(x)`
  + `x %>% f(y)` `$\equiv$` `f(x, y)`
  + `x %>% f %>% g %>% h` `$\equiv$` `h(g(f(x)))`

]

.pull-right-50[

instead of writing this:

``` r
data <- iris
data <- head(data, n=3)
```

]

.pull-right-50[

write this:

``` r
iris %>% head(n=3)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
```

]
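
The three equivalences above can be checked directly. This is a minimal sketch, assuming only that the `magrittr` package is installed; the vector `x` is an arbitrary illustration.

``` r
library(magrittr)

x <- c(4, 9, 16)   # arbitrary example values

# x %>% f  is  f(x)
same_single <- identical(x %>% sqrt, sqrt(x))

# x %>% f(y)  is  f(x, y)
same_extra_arg <- identical(x %>% head(2), head(x, 2))

# x %>% f %>% g %>% h  is  h(g(f(x)))
same_chain <- identical(x %>% sqrt %>% sum %>% round, round(sum(sqrt(x))))

c(same_single, same_extra_arg, same_chain)
```

Reading the piped version left to right follows the order in which the functions are applied, whereas the nested version has to be read inside out.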

---
name: native_r_pipe
# Native R Pipe

Since R 4.1.0, there is a native pipe operator `|>`, which is a bit faster than the `magrittr` pipe `%>%`.
It differs from the `magrittr` pipe in some respects, however: e.g., it does not accept the dot `.` as a placeholder (since R 4.2.0 it has a simpler `_` placeholder, which must be passed as a named argument).

``` r
c(1:5) |> mean()
```

```
## [1] 3
```

``` r
c(1:5) %>% mean()
```

```
## [1] 3
```
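
The placeholder difference is easiest to see when the piped object is not the first argument. The sketch below assumes R 4.2.0 or newer (for the `_` placeholder) and uses the built-in `mtcars` data purely as an illustration.

``` r
library(magrittr)

# magrittr: the dot marks where the left-hand side should go
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

# native pipe: an underscore, and only as a *named* argument
fit_native <- mtcars |> lm(mpg ~ wt, data = _)

# both calls fit the same model
all.equal(coef(fit_magrittr), coef(fit_native))
```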

---
name: tibble_intro

# Tibbles

.pull-left-50[

.center[<img src="data/slide_tidyverse/tibble_tweet.jpg">]
]

.pull-right-50[

* `tibble` is one of the unifying features of tidyverse,
* it is a *better* `data.frame` realization,
* `data.frame` objects can be coerced to `tibble` using `as_tibble()`
]

---
name: convert_to_tibble
# Convert `data.frame` to `tibble`

``` r
as_tibble(iris)
```

```
## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## 11          5.4         3.7          1.5         0.2 setosa 
## 12          4.8         3.4          1.6         0.2 setosa 
## 13          4.8         3            1.4         0.1 setosa 
## 14          4.3         3            1.1         0.1 setosa 
## 15          5.8         4            1.2         0.2 setosa 
## # ℹ 135 more rows
```

---
name: tibble_from_scratch
# Tibbles from scratch with `tibble()`

``` r
tibble(
  x = 1,          # recycling
  y = runif(4),
  z = x + y^2,
  outcome = rnorm(4)
)
```

```
## # A tibble: 4 × 4
##       x      y     z outcome
##   <dbl>  <dbl> <dbl>   <dbl>
## 1     1 0.0861  1.01   2.04 
## 2     1 0.897   1.80  -0.527
## 3     1 0.179   1.03   1.14 
## 4     1 0.694   1.48   0.485
```
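
Two details of `tibble()` are worth trying out: only length-one vectors are recycled, and a column can refer to columns defined earlier in the same call. A small sketch, with an arbitrary seed added only to make the random column reproducible:

``` r
library(tibble)

set.seed(1)   # arbitrary seed, just for reproducibility

tb <- tibble(
  x = 1,          # length 1: recycled to 4 rows
  y = runif(4),
  z = x + y^2     # refers to x and y defined just above
)

# Vectors of other mismatched lengths are not recycled: this is an error
recycling_error <- tryCatch(
  tibble(a = 1:2, b = 1:4),
  error = function(e) conditionMessage(e)
)
recycling_error
```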

---
name: more_on_tibbles
# More on Tibbles

* When you print a `tibble`:
  + all columns that fit the screen are shown,
  + by default, only the first 10 rows are shown,
  + data type for each column is shown.

``` r
as_tibble(cars)
```

```
## # A tibble: 50 × 2
##    speed  dist
##    <dbl> <dbl>
##  1     4     2
##  2     4    10
##  3     7     4
##  4     7    22
##  5     8    16
##  6     9    10
##  7    10    18
##  8    10    26
##  9    10    34
## 10    11    17
## 11    11    28
## 12    12    14
## 13    12    20
## 14    12    24
## 15    12    28
## # ℹ 35 more rows
```

---
name: tibble_printing_options
# Tibble Printing Options

* `my_tibble %>% print(n = 50, width = Inf)`,
* `options(tibble.print_min = 15, tibble.print_max = 25)`,
* `options(dplyr.print_min = Inf)`,
* `options(tibble.width = Inf)`
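
A short sketch of how these could be used in a session. `my_tibble` is a hypothetical object built from the built-in `mtcars` data, and the option values are arbitrary.

``` r
library(tibble)

my_tibble <- as_tibble(mtcars)   # 32 rows, 11 columns

# One-off: show 3 rows and all columns, regardless of console width
print(my_tibble, n = 3, width = Inf)

# Session-wide: set the options, remembering the previous values
old_opts <- options(tibble.print_min = 5, tibble.print_max = 10)
my_tibble                        # printed using the new options

# Restore the previous settings
options(old_opts)
```

Saving the return value of `options()` and passing it back is a convenient way to undo temporary changes.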

---
name: subsetting_tibbles
# Subsetting Tibbles

``` r
vehicles <- as_tibble(cars[1:5,])
vehicles %>% print(n = 5)
```

```
## # A tibble: 5 × 2
##   speed  dist
##   <dbl> <dbl>
## 1     4     2
## 2     4    10
## 3     7     4
## 4     7    22
## 5     8    16
```
  
  
--

We can subset tibbles in a number of ways:

``` r
vehicles[['speed']] # try also vehicles['speed']
vehicles[[1]]
vehicles$speed
```

```
## [1] 4 4 7 7 8
## [1] 4 4 7 7 8
## [1] 4 4 7 7 8
```
  
  
--

> **Note!** Not all old R functions work with tibbles; in that case you have to use `as.data.frame(my_tibble)`.
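
The difference between single and double brackets, and the fallback to a plain `data.frame`, can be sketched as follows (same `vehicles` tibble as above, rebuilt here so the example is self-contained):

``` r
library(tibble)

vehicles <- as_tibble(cars[1:5, ])

speed_col <- vehicles["speed"]     # single brackets: still a tibble
speed_vec <- vehicles[["speed"]]   # double brackets: a plain vector

# Selecting one column with [ , j]: a data.frame drops to a vector,
# a tibble stays a tibble
class(cars[1:5, "speed"])
class(vehicles[, "speed"])

# Fallback for functions that insist on a plain data.frame
vehicles_df <- as.data.frame(vehicles)
class(vehicles_df)
```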

---
name: tibbles_partial_matching

# Tibbles are Stricter than `data.frames`

``` r
cars <- cars[1:5,]
```

``` r
cars$spe      # partial matching
```

```
## [1] 4 4 7 7 8
```

``` r
vehicles$spe  # no partial matching
```

```
## Warning: Unknown or uninitialised column: `spe`.
```

```
## NULL
```

``` r
cars$gear
```

```
## NULL
```

``` r
vehicles$gear
```

```
## Warning: Unknown or uninitialised column: `gear`.
```

```
## NULL
```

---
name: loading_data

# Loading Data

In `tidyverse`, you import data using the `readr` package, which provides a number of useful data import functions:
* `read_delim()` -- a generic function for reading delimited files. There are a number of convenience wrappers:
  + `read_csv()` reads comma-delimited files,
  + `read_csv2()` reads semicolon-delimited files,
  + `read_tsv()` reads tab-delimited files.
* `read_fwf()` for reading fixed-width files, with helpers to specify the fields:
  + `fwf_widths()` for width-based reading,
  + `fwf_positions()` for position-based reading.
* `read_table()` for reading whitespace-delimited fixed-width files.
* `read_log()` for reading Apache-style logs.

>The most commonly used `read_csv()` has some familiar arguments like:
* `skip` -- to specify the number of rows to skip (headers),
* `col_names` -- to supply a vector of column names,
* `comment` -- to specify what character designates a comment,
* `na` -- to specify how missing values are represented.
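
A minimal sketch of some of these arguments in action. It assumes `readr` 2.0 or newer, where `I()` marks a string as literal data rather than a file path; the text and the column names `id` and `value` are made up for illustration.

``` r
library(readr)

# Literal data instead of a file: one junk line followed by three records
raw_text <- I("junk line to skip\nA,10\nB,NA\nC,-\n")

samples <- read_csv(
  raw_text,
  skip = 1,                       # drop the first line
  col_names = c("id", "value"),   # the data have no header row
  na = c("NA", "-"),              # both strings mean "missing"
  show_col_types = FALSE          # silence the column-type message
)
samples
```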

---
name: readr_writing

# Writing to a File

The `readr` package also provides functions useful for writing tibbled data into a file:

* `write_csv()`
* `write_tsv()`
* `write_excel_csv()`

They **always** save:

* text in UTF-8,
* dates in ISO8601

But saving in csv (or tsv) does mean you lose information about the type of data in particular columns. You can avoid this by using:

* `write_rds()` and `read_rds()` to write/read objects in R binary rds format,
* `write_feather()` and `read_feather()` from package `feather` to write/read objects in a fast binary format that other programming languages can access.
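
The loss of type information can be seen in a quick round trip. This sketch uses a small hypothetical tibble `stones` and temporary files; it does not cover `feather`.

``` r
library(readr)
library(tibble)

stones <- tibble(
  cut   = factor(c("Good", "Ideal"), levels = c("Good", "Ideal"), ordered = TRUE),
  price = c(327L, 326L)
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

write_csv(stones, csv_file)
write_rds(stones, rds_file)

from_csv <- read_csv(csv_file, show_col_types = FALSE)
from_rds <- read_rds(rds_file)

class(from_csv$cut)   # the factor comes back as plain text
class(from_rds$cut)   # the rds copy keeps the ordered factor
```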

---
name: basic_data_transformations

# Basic Data Transformations with `dplyr`

Let us create a tibble:

``` r
bijou <- as_tibble(diamonds) %>% head()
bijou[1:5, ]
```

```
## # A tibble: 5 × 10
##   carat cut     color clarity depth table price     x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good    E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good    J     SI2      63.3    58   335  4.34  4.35  2.75
```

.center[ <img src="data/slide_tidyverse/diamonds.png", style="height:200px"> ]

---
name: filter

# Picking Observations using `filter()`

``` r
bijou %>% filter(cut == 'Ideal' | cut == 'Premium', carat >= 0.23) %>% head(n = 4)
```

```
## # A tibble: 2 × 10
##   carat cut     color clarity depth table price     x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.29 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
```

>⛵ &nbsp; Be careful with floating point comparisons! <br>
🦜 &nbsp; Also, rows with comparison resulting in `NA` are skipped by default!

``` r
bijou %>% filter(near(0.23, carat) | is.na(carat)) %>% head(n = 4)
```

```
## # A tibble: 2 × 10
##   carat cut   color clarity depth table price     x     y     z
##   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.23 Good  E     VS1      56.9    65   327  4.05  4.07  2.31
```
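
Both warnings can be reproduced on a tiny example. The tibble `stones` below is hypothetical; `near()` comes from `dplyr`.

``` r
library(dplyr)

exact_match <- sqrt(2)^2 == 2       # FALSE: off by a tiny rounding error
close_match <- near(sqrt(2)^2, 2)   # TRUE: equal within a small tolerance

stones <- tibble(carat = c(0.23, NA, 0.31))

without_na <- stones %>% filter(carat > 0.2)                  # NA row dropped
with_na    <- stones %>% filter(carat > 0.2 | is.na(carat))   # NA row kept

c(nrow(without_na), nrow(with_na))
```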
  
---
name: arrange

# Rearranging Observations using `arrange()`

``` r
bijou %>% arrange(cut, carat, desc(price))
```
  
--

```
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 2  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 3  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
## 4  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 5  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 6  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
```
  
--

>The `NA`s always end up at the end of the rearranged `tibble`!
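
This holds for descending order too, as a small sketch with a hypothetical `stones` tibble shows:

``` r
library(dplyr)

stones <- tibble(price = c(334, NA, 326))

ascending  <- stones %>% arrange(price)         # 326, 334, NA
descending <- stones %>% arrange(desc(price))   # 334, 326, NA

ascending$price
descending$price
```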

---
name: select

# Selecting Variables with `select()`

Simple `select` with a range:

``` r
bijou %>% select(color, clarity, x:z) %>% head(n = 4)
```

```
## # A tibble: 4 × 5
##   color clarity     x     y     z
##   <ord> <ord>   <dbl> <dbl> <dbl>
## 1 E     SI2      3.95  3.98  2.43
## 2 E     SI1      3.89  3.84  2.31
## 3 E     VS1      4.05  4.07  2.31
## 4 I     VS2      4.2   4.23  2.63
```

Exclusive `select`:

``` r
bijou %>% select(-(x:z)) %>% head(n = 4)
```

```
## # A tibble: 4 × 7
##   carat cut     color clarity depth table price
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int>
## 1  0.23 Ideal   E     SI2      61.5    55   326
## 2  0.21 Premium E     SI1      59.8    61   326
## 3  0.23 Good    E     VS1      56.9    65   327
## 4  0.29 Premium I     VS2      62.4    58   334
```

---
name: rename
# Renaming Variables

>`rename` is a variant of `select` that keeps all the columns; here it renames `x` to `var_x`.

``` r
bijou %>% rename(var_x = x) %>% head(n = 5)
```
  
--

```
## # A tibble: 5 × 10
##   carat cut     color clarity depth table price var_x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good    E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good    J     SI2      63.3    58   335  4.34  4.35  2.75
```
  
---
name: bring_to_front
# Bring columns to front

>Use `select()` with `everything()` to bring some columns to the front:

``` r
bijou %>% select(x:z, everything()) %>% head(n = 4)
```
  
--

```
## # A tibble: 4 × 10
##       x     y     z carat cut     color clarity depth table price
##   <dbl> <dbl> <dbl> <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int>
## 1  3.95  3.98  2.43  0.23 Ideal   E     SI2      61.5    55   326
## 2  3.89  3.84  2.31  0.21 Premium E     SI1      59.8    61   326
## 3  4.05  4.07  2.31  0.23 Good    E     VS1      56.9    65   327
## 4  4.2   4.23  2.63  0.29 Premium I     VS2      62.4    58   334
```

---
name: mutate

# Create/alter new Variables with `mutate`

``` r
bijou %>% mutate(p = x + z, q = p + y) %>% 
  select(-(depth:price)) %>% 
  head(n = 5)
```

```
## # A tibble: 5 × 9
##   carat cut     color clarity     x     y     z     p     q
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      3.95  3.98  2.43  6.38  10.4
## 2  0.21 Premium E     SI1      3.89  3.84  2.31  6.2   10.0
## 3  0.23 Good    E     VS1      4.05  4.07  2.31  6.36  10.4
## 4  0.29 Premium I     VS2      4.2   4.23  2.63  6.83  11.1
## 5  0.31 Good    J     SI2      4.34  4.35  2.75  7.09  11.4
```
  
  

---
name: transmute
# Create/alter new Variables with `transmute` 🧙‍♂️

>Only the transformed variables will be retained.

``` r
bijou %>% transmute(carat, cut, sum = x + y + z) %>% head(n = 5)
```

```
## # A tibble: 5 × 3
##   carat cut       sum
##   <dbl> <ord>   <dbl>
## 1  0.23 Ideal    10.4
## 2  0.21 Premium  10.0
## 3  0.23 Good     10.4
## 4  0.29 Premium  11.1
## 5  0.31 Good     11.4
```
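
One way to think about `transmute` is as `mutate` followed by `select` of the listed columns. A minimal sketch on a hypothetical two-row tibble:

``` r
library(dplyr)

stones <- tibble(
  carat = c(0.23, 0.21),
  x = c(3.95, 3.89),
  y = c(3.98, 3.84),
  z = c(2.43, 2.31)
)

via_transmute <- stones %>% transmute(carat, sum = x + y + z)
via_mutate    <- stones %>% mutate(sum = x + y + z) %>% select(carat, sum)

all.equal(via_transmute, via_mutate)
```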

---
name: grouped_summaries
# Group and Summarize

``` r
bijou %>% group_by(cut) %>% summarize(max_price = max(price),
                                      mean_price = mean(price),
                                      min_price = min(price))
```

```
## # A tibble: 4 × 4
##   cut       max_price mean_price min_price
##   <ord>         <int>      <dbl>     <int>
## 1 Good            335        331       327
## 2 Very Good       336        336       336
## 3 Premium         334        330       326
## 4 Ideal           326        326       326
```

``` r
bijou %>% group_by(cut, color) %>%
  summarize(max_price = max(price),
            mean_price = mean(price),
            min_price = min(price)) %>% head(n = 4)
```

```
## # A tibble: 4 × 5
## # Groups:   cut [3]
##   cut       color max_price mean_price min_price
##   <ord>     <ord>     <int>      <dbl>     <int>
## 1 Good      E           327        327       327
## 2 Good      J           335        335       335
## 3 Very Good J           336        336       336
## 4 Premium   E           326        326       326
```

---
name: other_data_manipulations

# Other data manipulation tips

``` r
bijou %>% group_by(cut) %>% summarize(count = n())
```

```
## # A tibble: 4 × 2
##   cut       count
##   <ord>     <int>
## 1 Good          2
## 2 Very Good     1
## 3 Premium       2
## 4 Ideal         1
```

When you need to regroup within the same pipe, use `ungroup()`.
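
For instance, a per-group mean and an overall mean can be computed in one pipe. The tibble `stones` and the new column names are hypothetical.

``` r
library(dplyr)

stones <- tibble(
  cut   = c("Good", "Good", "Ideal"),
  price = c(327, 335, 326)
)

result <- stones %>%
  group_by(cut) %>%
  mutate(cut_mean = mean(price)) %>%    # computed within each cut
  ungroup() %>%
  mutate(overall_mean = mean(price))    # computed over all rows
result
```

Without `ungroup()`, the second `mutate` would still work group by group and `overall_mean` would simply repeat `cut_mean`.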

---
name: concept_of_tidy_data

# The Concept of Tidy Data

Data are tidy *sensu Wickham* if:
* each and every observation is represented as exactly one row,
* each and every variable is represented by exactly one column,
* thus each data table cell contains only one value.
<img src="data/slide_tidyverse/tidy_data.png" width="2560" style="display: block; margin: auto auto auto 0;" />

Usually data are untidy in only one way. However, if you are unlucky, they are really untidy and thus a pain to work with...

---
name: tidy_data

# Tidy Data

.center[**Are these data tidy?**]

.pull-left-70[
<table class="table table-striped table-hover table-responsive table-condensed" style="">
 <thead>
  <tr>
   <th style="text-align:center;"> Sepal.Length </th>
   <th style="text-align:center;"> Sepal.Width </th>
   <th style="text-align:center;"> Petal.Length </th>
   <th style="text-align:center;"> Petal.Width </th>
   <th style="text-align:center;"> Species </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 5.1 </td>
   <td style="text-align:center;"> 3.5 </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.9 </td>
   <td style="text-align:center;"> 3.0 </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.7 </td>
   <td style="text-align:center;"> 3.2 </td>
   <td style="text-align:center;"> 1.3 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
</tbody>
</table>
]

.pull-right-30[

<table class="table table-striped table-hover table-responsive table-condensed" style="">
 <thead>
  <tr>
   <th style="text-align:center;"> Species </th>
   <th style="text-align:center;"> variable </th>
   <th style="text-align:center;"> value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> Sepal.Length </td>
   <td style="text-align:center;"> 5.1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> Sepal.Length </td>
   <td style="text-align:center;"> 4.9 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> Sepal.Length </td>
   <td style="text-align:center;"> 4.7 </td>
  </tr>
</tbody>
</table>

]

--
.pull-left-50[

<table class="table table-striped table-hover table-responsive table-condensed" style="">
 <thead>
  <tr>
   <th style="text-align:center;"> Sepal.L.W </th>
   <th style="text-align:center;"> Petal.L.W </th>
   <th style="text-align:center;"> Species </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 5.1/3.5 </td>
   <td style="text-align:center;"> 1.4/0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.9/3 </td>
   <td style="text-align:center;"> 1.4/0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.7/3.2 </td>
   <td style="text-align:center;"> 1.3/0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
</tbody>
</table>

]

.pull-right-50[
<table class="table table-striped table-hover table-responsive table-condensed" style="">
<tbody>
  <tr>
   <td style="text-align:left;"> Sepal.Length </td>
   <td style="text-align:center;"> 5.1 </td>
   <td style="text-align:center;"> 4.9 </td>
   <td style="text-align:center;"> 4.7 </td>
   <td style="text-align:center;"> 4.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sepal.Width </td>
   <td style="text-align:center;"> 3.5 </td>
   <td style="text-align:center;"> 3.0 </td>
   <td style="text-align:center;"> 3.2 </td>
   <td style="text-align:center;"> 3.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Petal.Length </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 1.3 </td>
   <td style="text-align:center;"> 1.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Petal.Width </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> 0.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Species </td>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
</tbody>
</table>

]

---
name: tidying_data_pivot_longer

# Tidying Data with `tidyr::pivot_longer`

If some of your column names are actually values of a variable, use `pivot_longer` (replaces `gather`):

``` r
bijou2 %>% head(n = 5)
```

```
## # A tibble: 5 × 3
##   cut     `2008` `2009`
##   <ord>    <int>  <dbl>
## 1 Ideal      326   330.
## 2 Premium    326   330.
## 3 Good       327   331.
## 4 Premium    334   338.
## 5 Good       335   339.
```

``` r
bijou2 %>%
  pivot_longer(c(`2008`, `2009`), names_to = 'year', values_to = 'price') %>%
  head(n = 5)
```

```
## # A tibble: 5 × 3
##   cut     year  price
##   <ord>   <chr> <dbl>
## 1 Ideal   2008   326 
## 2 Ideal   2009   330.
## 3 Premium 2008   326 
## 4 Premium 2009   330.
## 5 Good    2008   327
```
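
In the output above `year` is a character column, because it comes from column names. A sketch of how it could be converted on the fly with the `names_transform` argument, using a hypothetical table `prices_wide` shaped like `bijou2`:

``` r
library(dplyr)
library(tidyr)

prices_wide <- tibble(
  cut    = c("Ideal", "Premium"),
  `2008` = c(326, 326),
  `2009` = c(330, 330)
)

prices_long <- prices_wide %>%
  pivot_longer(
    cols = -cut,                                # every column except cut
    names_to = "year",
    values_to = "price",
    names_transform = list(year = as.integer)   # "2008" becomes 2008L
  )
prices_long
```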

---
name: tidying_data_pivot_wider

# Tidying Data with `tidyr::pivot_wider`

If some of your observations are scattered across many rows, use `pivot_wider` (replaces `spread`):

``` r
bijou3
```

```
## # A tibble: 9 × 5
##   cut     price clarity dimension measurement
##   <ord>   <int> <ord>   <chr>           <dbl>
## 1 Ideal     326 SI2     x                3.95
## 2 Premium   326 SI1     x                3.89
## 3 Good      327 VS1     x                4.05
## 4 Ideal     326 SI2     y                3.98
## 5 Premium   326 SI1     y                3.84
## 6 Good      327 VS1     y                4.07
## 7 Ideal     326 SI2     z                2.43
## 8 Premium   326 SI1     z                2.31
## 9 Good      327 VS1     z                2.31
```

``` r
bijou3 %>%
  pivot_wider(names_from=dimension, values_from=measurement) %>%
  head(n = 4)
```

```
## # A tibble: 3 × 6
##   cut     price clarity     x     y     z
##   <ord>   <int> <ord>   <dbl> <dbl> <dbl>
## 1 Ideal     326 SI2      3.95  3.98  2.43
## 2 Premium   326 SI1      3.89  3.84  2.31
## 3 Good      327 VS1      4.05  4.07  2.31
```
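
`pivot_wider` and `pivot_longer` are inverses of each other, which a round trip makes visible. The table `dims_long` is a hypothetical, cut-down version of `bijou3`.

``` r
library(dplyr)
library(tidyr)

dims_long <- tibble(
  cut         = c("Ideal", "Ideal", "Good", "Good"),
  dimension   = c("x", "y", "x", "y"),
  measurement = c(3.95, 3.98, 4.05, 4.07)
)

# One row per cut, one column per dimension
dims_wide <- dims_long %>%
  pivot_wider(names_from = dimension, values_from = measurement)

# pivot_longer() takes us back to the starting layout
dims_back <- dims_wide %>%
  pivot_longer(c(x, y), names_to = "dimension", values_to = "measurement")

all.equal(as.data.frame(dims_back), as.data.frame(dims_long))
```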

---
name: tidying_data_separate

# Tidying Data with `separate`

If some of your columns contain more than one value, use `separate`:

``` r
bijou4
```

```
## # A tibble: 5 × 4
##   cut     price clarity dim           
##   <ord>   <int> <ord>   <chr>         
## 1 Ideal     326 SI2     3.95/3.98/2.43
## 2 Premium   326 SI1     3.89/3.84/2.31
## 3 Good      327 VS1     4.05/4.07/2.31
## 4 Premium   334 VS2     4.2/4.23/2.63 
## 5 Good      335 SI2     4.34/4.35/2.75
```

``` r
bijou4 %>%
  separate(dim, into = c("x", "y", "z"), sep = "/", convert = TRUE)
```

```
## # A tibble: 5 × 6
##   cut     price clarity     x     y     z
##   <ord>   <int> <ord>   <dbl> <dbl> <dbl>
## 1 Ideal     326 SI2      3.95  3.98  2.43
## 2 Premium   326 SI1      3.89  3.84  2.31
## 3 Good      327 VS1      4.05  4.07  2.31
## 4 Premium   334 VS2      4.2   4.23  2.63
## 5 Good      335 SI2      4.34  4.35  2.75
```

---
name: tidying_data_unite
# Tidying Data with `unite`

If a single value is split across several columns, use `unite`:

``` r
bijou5
```

```
## # A tibble: 5 × 7
##   cut     price clarity_prefix clarity_suffix     x     y     z
##   <ord>   <int> <chr>          <chr>          <dbl> <dbl> <dbl>
## 1 Ideal     326 SI             2               3.95  3.98  2.43
## 2 Premium   326 SI             1               3.89  3.84  2.31
## 3 Good      327 VS             1               4.05  4.07  2.31
## 4 Premium   334 VS             2               4.2   4.23  2.63
## 5 Good      335 SI             2               4.34  4.35  2.75
```

``` r
bijou5 %>% unite(clarity, clarity_prefix, clarity_suffix, sep='')
```

```
## # A tibble: 5 × 6
##   cut     price clarity     x     y     z
##   <ord>   <int> <chr>   <dbl> <dbl> <dbl>
## 1 Ideal     326 SI2      3.95  3.98  2.43
## 2 Premium   326 SI1      3.89  3.84  2.31
## 3 Good      327 VS1      4.05  4.07  2.31
## 4 Premium   334 VS2      4.2   4.23  2.63
## 5 Good      335 SI2      4.34  4.35  2.75
```
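
`separate` and `unite` undo each other. A minimal round-trip sketch on a hypothetical two-row tibble:

``` r
library(dplyr)
library(tidyr)

stones <- tibble(
  cut = c("Ideal", "Good"),
  dim = c("3.95/3.98/2.43", "4.05/4.07/2.31")
)

# One column with three values becomes three numeric columns
split_up <- stones %>%
  separate(dim, into = c("x", "y", "z"), sep = "/", convert = TRUE)

# ... and three columns are pasted back into one
glued <- split_up %>%
  unite(dim, x, y, z, sep = "/")

identical(glued$dim, stones$dim)
```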

---
name: missing_complete

# Completing Missing Values Using `complete`

``` r
bijou %>% head(n = 6) %>%
  select(cut, clarity, price) %>%
  mutate(continent = sample(c('AusOce', 'Eur'),
                            size = 6,
                            replace = TRUE)) -> missing_stones
```

``` r
missing_stones %>% complete(cut, continent)
```

```
## # A tibble: 12 × 4
##    cut       continent clarity price
##    <ord>     <chr>     <ord>   <int>
##  1 Fair      AusOce    <NA>       NA
##  2 Fair      Eur       <NA>       NA
##  3 Good      AusOce    <NA>       NA
##  4 Good      Eur       VS1       327
##  5 Good      Eur       SI2       335
##  6 Very Good AusOce    <NA>       NA
##  7 Very Good Eur       VVS2      336
##  8 Premium   AusOce    <NA>       NA
##  9 Premium   Eur       SI1       326
## 10 Premium   Eur       VS2       334
## 11 Ideal     AusOce    SI2       326
## 12 Ideal     Eur       <NA>       NA
```
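
Because the example above relies on random sampling, a deterministic sketch may be easier to follow. The tibble `sales` is hypothetical; the `fill` argument replaces the `NA`s in the newly created rows.

``` r
library(dplyr)
library(tidyr)

sales <- tibble(
  cut       = c("Good", "Good", "Ideal"),
  continent = c("Eur", "AusOce", "Eur"),
  price     = c(327, 335, 326)
)

# The missing Ideal/AusOce combination is added with price = NA
completed <- sales %>% complete(cut, continent)

# Same, but with a chosen value instead of NA
filled <- sales %>% complete(cut, continent, fill = list(price = 0))
filled
```

For factor columns, `complete` also expands levels that never occur in the data, which is presumably why `Fair` shows up in the output above.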

---
name: joins

# Joining Data with `_join`

.pull-left-50[
    
    ```
    ## # A tibble: 5 × 2
    ##     key value1
    ##   <dbl> <chr> 
    ## 1     1 a     
    ## 2     2 b     
    ## 3     3 c     
    ## 4     4 d     
    ## 5     5 e
    ```
]

.pull-right-50[
    
    ```
    ## # A tibble: 5 × 2
    ##     key value2
    ##   <dbl> <chr> 
    ## 1     1 A     
    ## 2     2 B     
    ## 3     3 C     
    ## 4     6 F     
    ## 5     7 G
    ```
]

**Example:**

``` r
inner_join(tibble1, tibble2, by = 'key')
```

```
## # A tibble: 3 × 3
##     key value1 value2
##   <dbl> <chr>  <chr> 
## 1     1 a      A     
## 2     2 b      B     
## 3     3 c      C
```
    
`[inner, left, right, full]_join` are available. Try these!
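
A sketch for trying the other joins, with `tibble1` and `tibble2` rebuilt from the tables shown above:

``` r
library(dplyr)

tibble1 <- tibble(key = c(1, 2, 3, 4, 5), value1 = c("a", "b", "c", "d", "e"))
tibble2 <- tibble(key = c(1, 2, 3, 6, 7), value2 = c("A", "B", "C", "F", "G"))

left  <- left_join(tibble1, tibble2, by = "key")    # all keys of tibble1
right <- right_join(tibble1, tibble2, by = "key")   # all keys of tibble2
full  <- full_join(tibble1, tibble2, by = "key")    # keys from either table

c(left = nrow(left), right = nrow(right), full = nrow(full))
```

Keys without a match in the other table get `NA` in the columns coming from that table.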

---
name: more_tidyverse

# Some Other Friends

* `stringr` for string manipulation and regular expressions,
* `forcats` for working with factors,
* `lubridate` for working with dates.

---
name: end-slide
class: end-slide

# Thank you. Questions? [More?](https://nbisweden.github.io/raukr-2024/)

.end-text[
<p class="smaller">
<span class="small" style="line-height: 1.2;">Graphics from </span><img src="./assets/freepik.jpg" style="max-height:20px; vertical-align:middle;"><br>
Created: 31-Oct-2024 • <a href="https://www.scilifelab.se/">SciLifeLab</a> • <a href="https://nbis.se/">NBIS</a> 
</p>
]