Tidy Work in Tidyverse

class: center, middle, inverse, title-slide

# Tidy Work in Tidyverse
## Advanced R for Bioinformatics. Visby, 2018.
### Marcin Kierczak
### 04 June, 2018

---

class: spaced

## Tidyverse -- What is it all About?

* [Tidyverse](http://www.tidyverse.org) is a collection of packages. 
* Created by [Hadley Wickham](http://hadley.nz).
* It is gaining popularity and is on its way to becoming a *de facto* standard in data analysis.
* Knowing how to use it can increase your salary :-)
* A philosophy of programming or a programming paradigm.
* Everything is about the flow of *tidy data*.
.center[
<img src="assets/hex-tidyverse.png", style="height:200px;">
<img src="assets/Hadley-wickham2016-02-04.jpeg", style="height:200px;">
<img src="assets/RforDataScience.jpeg", style="height:200px;">
]
.vsmall[sources of images: www.tidyverse.org, Wikipedia, www.tidyverse.org]

---
name: tidyverse_workflow

## Typical Tidyverse Workflow
The tidyverse curse?<br><br>
--
*Navigating the balance between base R and the tidyverse is a challenge to learn.*
.right[.small[-- [Robert A. Muenchen](http://r4stats.com/articles/why-r-is-hard-to-learn/)]]
<br><br>
--
<img src="assets/tidyverse-flow.png", style="height:400px;">
.vsmall[source: http://www.storybench.org/getting-started-with-tidyverse-in-r/]

---
name: intro_to_pipes
## Introduction to Pipes
.pull-left-50[
  .center[
    <img src="assets/MagrittePipe.jpg" width="300" style="display: block; margin: auto auto auto 0;" />
  ]
  .vsmall[
    René Magritte, *La trahison des images*, [Wikimedia Commons](https://en.wikipedia.org/wiki/The_Treachery_of_Images#/media/File:MagrittePipe.jpg)
  ]
  <br>&nbsp;
  .center[
    <img src="assets/magrittr.png" width="150" style="display: block; margin: auto auto auto 0;" />
  ]
]
--
.pull-right-50[
* Let the data flow.
* *Ceci n'est pas une pipe* -- `magrittr`
* The `%>%` pipe:
  + `x %>% f` `$\equiv$` `f(x)`
  + `x %>% f(y)` `$\equiv$` `f(x, y)`
  + `x %>% f %>% g %>% h` `$\equiv$` `h(g(f(x)))`
]
--
.pull-right-50[
instead of writing this:

```r
data <- iris
data <- head(data, n=3)
```
]
--
.pull-right-50[
write this:

```r
iris %>% head(n=3)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
```
]
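
The equivalences above can be checked directly. A minimal sketch, assuming `magrittr` is installed; the vector `x` and the chosen functions are only illustrative:

```r
library(magrittr)

x <- c(4, 9, 16)

# x %>% f is equivalent to f(x)
same_unary <- identical(x %>% sqrt, sqrt(x))

# x %>% f(y) is equivalent to f(x, y)
same_binary <- identical(x %>% round(1), round(x, 1))

# x %>% f %>% g %>% h is equivalent to h(g(f(x)))
same_chain <- identical(x %>% sqrt %>% sum %>% log, log(sum(sqrt(x))))

c(same_unary, same_binary, same_chain)
```

All three comparisons should come out `TRUE`.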

---
name: other_pipes_T
## Other Types of `magrittr` Pipes -- `%T>%`

.pull-left-50[
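The `%T>%` (tee) pipe is useful when you call a function for its side effects, such as `plot()`. With a plain `%>%`, `summary()` receives the `NULL` returned by `plot()`: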

```r
rnorm(50) %>%
  matrix(ncol = 2) %>%
  plot() %>%
  summary()
```

```
## Length  Class   Mode 
##      0   NULL   NULL
```

<img src="tidyverse_presentation_files/figure-html/magrittr2a-1.png" style="display: block; margin: auto auto auto 0;" />
]
--
.pull-right-50[

```r
rnorm(50) %>%
  matrix(ncol = 2) %T>%
  plot() %>%
  summary()
```

```
##        V1                  V2          
##  Min.   :-1.886251   Min.   :-3.01041  
##  1st Qu.:-0.759696   1st Qu.:-0.69188  
##  Median : 0.008827   Median :-0.03033  
##  Mean   :-0.279981   Mean   : 0.16094  
##  3rd Qu.: 0.143669   3rd Qu.: 1.03628  
##  Max.   : 1.461389   Max.   : 2.61304
```

<img src="tidyverse_presentation_files/figure-html/magrittr2b-1.png" style="display: block; margin: auto auto auto 0;" />
]

---
name: the_splitting_pipe

## Other Types of `magrittr` Pipes -- `%$%`

```r
iris %>% cor(Sepal.Length, Sepal.Width)
```

```
## Error in pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs", : object 'Sepal.Width' not found
```

We need the `%$%` pipe with exposition of variables:

```r
iris %$% cor(Sepal.Length, Sepal.Width)
```

```
## [1] -0.1175698
```
This is because `cor()` has no `data` argument: `%>%` passes the whole data frame as the first argument, so the column names are not visible inside the call. (In a pipe-friendly function, `data` should also be the first argument.)

### The `%<>%` Pipe
It exists but can lead to somewhat confusing code.  
`x %<>% f` `$\equiv$` `x <- f(x)`

```r
M <- matrix(rnorm(16), nrow=4)
M %<>% colSums()
M
```

```
## [1] -2.3928896 -0.4712869 -1.2343871 -0.4547427
```
---
name: magrittr_placeholder

## Placeholders in `magrittr` Pipes
Sometimes we want to pass the resulting data to an argument *other than the first* of the next function in the chain. `magrittr` provides a placeholder mechanism for this:
* `x %>% f(y, .)` `$\equiv$` `f(y, x)`,
* `x %>% f(y, z = .)` `$\equiv$` `f(y, z = x)`.

But for nested expressions:
* `x %>% f(a = p(.), b = q(.))` `$\equiv$` `f(x, a = p(x), b = q(x))`,
* `x %>% {f(a = p(.), b = q(.))}` `$\equiv$` `f(a = p(x), b = q(x))`.

Examples:

```r
M <- rnorm(4) %>% matrix(nrow = 2)
M %>% `%*%`(., .)
```

```
##           [,1]       [,2]
## [1,] 1.3229968 -0.4043394
## [2,] 0.3027784  0.2510555
```

```r
print_M_summ <- function(nrow, ncol) {
  paste0('Matrix M has: ', nrow, ' rows and ', ncol, ' columns.')
}
M %>% {print_M_summ(nrow(.), ncol(.))}
```

```
## [1] "Matrix M has: 2 rows and 2 columns."
```
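
A common use of the placeholder is a modelling function whose data argument is not the first one. The sketch below uses the built-in `mtcars` data and `lm()`, chosen here only for illustration, and also contrasts the nested-placeholder rule with and without braces:

```r
library(magrittr)

# `.` as a direct argument: the data go to `data =`, not to the first argument
fit_piped  <- mtcars %>% lm(mpg ~ wt, data = .)
fit_direct <- lm(mpg ~ wt, data = mtcars)
same_fit <- isTRUE(all.equal(coef(fit_piped), coef(fit_direct)))

# `.` only inside a nested call: x is still inserted as the first argument
nested <- 1:3 %>% paste(rev(.))      # same as paste(1:3, rev(1:3))

# braces suppress the first-argument insertion
braced <- 1:3 %>% {paste(rev(.))}    # same as paste(rev(1:3))
```

Comparing `nested` with `braced` shows why the braces matter whenever the placeholder appears only inside another call.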

---
name: tibble_intro

## Tibbles
.pull-left-50[
  <img src="assets/hex-tibble.png" width="160" style="display: block; margin: auto;" />
  
  ```r
  as_tibble(iris)
  ```
  
  ```
  ## # A tibble: 150 x 5
  ##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
  ##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
  ##  1          5.1         3.5          1.4         0.2 setosa 
  ##  2          4.9         3            1.4         0.2 setosa 
  ##  3          4.7         3.2          1.3         0.2 setosa 
  ##  4          4.6         3.1          1.5         0.2 setosa 
  ##  5          5           3.6          1.4         0.2 setosa 
  ##  6          5.4         3.9          1.7         0.4 setosa 
  ##  7          4.6         3.4          1.4         0.3 setosa 
  ##  8          5           3.4          1.5         0.2 setosa 
  ##  9          4.4         2.9          1.4         0.2 setosa 
  ## 10          4.9         3.1          1.5         0.1 setosa 
  ## # ... with 140 more rows
  ```
]

.pull-right-50[
* `tibble` is one of the unifying features of tidyverse,
* it is a *better* realization of `data.frame`,
* `data.frame` objects can be coerced to `tibble` using `as_tibble()` (`as.tibble()` is an older, now deprecated alias).

```r
  tibble(
    x = 1,          # recycling
    y = runif(50), 
    z = x + y^2,
    outcome = rnorm(50)
  )
```

```
## # A tibble: 50 x 4
##        x      y     z outcome
##    <dbl>  <dbl> <dbl>   <dbl>
##  1     1 0.579   1.34 -0.705 
##  2     1 0.381   1.15 -1.50  
##  3     1 0.575   1.33 -0.185 
##  4     1 0.0603  1.00  0.795 
##  5     1 0.763   1.58  1.11  
##  6     1 0.759   1.58 -0.989 
##  7     1 0.732   1.54 -0.0796
##  8     1 0.231   1.05 -0.611 
##  9     1 0.958   1.92  0.626 
## 10     1 0.640   1.41  0.622 
## # ... with 40 more rows
```
]

---
name: tibble2

## More on Tibbles

* When you print a `tibble`:
  + all columns that fit the screen are shown,
  + first 10 rows are shown,
  + data type for each column is shown.

```r
as_tibble(cars)
```

```
## # A tibble: 50 x 2
##    speed  dist
##    <dbl> <dbl>
##  1     4     2
##  2     4    10
##  3     7     4
##  4     7    22
##  5     8    16
##  6     9    10
##  7    10    18
##  8    10    26
##  9    10    34
## 10    11    17
## # ... with 40 more rows
```

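To change the printing defaults:
* `my_tibble %>% print(n = 50, width = Inf)` -- print 50 rows and all columns of this one tibble,
* `options(tibble.print_min = 15, tibble.print_max = 25)` -- if a tibble has more than 25 rows, print only the first 15,
* `options(dplyr.print_min = Inf)` -- always print all rows,
* `options(tibble.width = Inf)` -- always print all columns.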

---
name: tibble_subsetting

## Subsetting Tibbles

```r
vehicles <- as_tibble(cars[1:5,])

vehicles[['speed']]
vehicles[[1]]
vehicles$speed

# Using placeholders

vehicles %>% .$dist
vehicles %>% .[['dist']]
vehicles %>% .[[2]]
```

```
## [1] 4 4 7 7 8
## [1] 4 4 7 7 8
## [1] 4 4 7 7 8
## [1]  2 10  4 22 16
## [1]  2 10  4 22 16
## [1]  2 10  4 22 16
```
--
**Note!** Not all older R functions work with tibbles; in that case, convert the tibble with `as.data.frame(my_tibble)`.

---
name: tibbles_partial_matching

## Tibbles are Stricter than `data.frames`

```r
cars$spe      # partial matching
```

```
## [1] 4 4 7 7 8
```

```r
vehicles$spe  # no partial matching
```

```
## Warning: Unknown or uninitialised column: 'spe'.
```

```
## NULL
```

```r
cars$gear
```

```
## NULL
```

```r
vehicles$gear
```

```
## Warning: Unknown or uninitialised column: 'gear'.
```

```
## NULL
```
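
Tibbles are also stricter about subsetting with `[`: a tibble never drops to a vector. A small sketch with a made-up two-column table (the object names are illustrative):

```r
library(tibble)

df <- data.frame(speed = c(4, 7), dist = c(2, 4))
tb <- as_tibble(df)

# data.frame: selecting a single column with [ drops to a plain vector
df_col <- df[, "speed"]
is.data.frame(df_col)

# tibble: [ always returns another tibble
tb_col <- tb[, "speed"]
is_tibble(tb_col)
```

Use `[[` or `$` when a plain vector is what you need.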

---
name: loading_data

## Loading Data
In `tidyverse`, you import data using the `readr` package, which provides a number of useful data import functions:
* `read_delim()` -- a generic function for reading *-delimited files. There are a number of convenience wrappers:
  + `read_csv()` reads comma-delimited files,
  + `read_csv2()` reads semicolon-delimited files,
  + `read_tsv()` reads tab-delimited files.
* `read_fwf()` -- for reading fixed-width files; column positions are specified with the helpers:
  + `fwf_widths()` for a width-based specification,
  + `fwf_positions()` for a position-based specification.
* `read_table()` -- for reading fixed-width files whose columns are separated by white space.
* `read_log()` for reading Apache-style logs.

The most commonly used `read_csv()` has some familiar arguments like:
* `skip` -- to specify the number of rows to skip (headers),
* `col_names` -- to supply a vector of column names,
* `comment` -- to specify what character designates a comment,
* `na` -- to specify how missing values are represented.
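
A minimal sketch of three of these arguments (`skip`, `col_names`, `na`) in use. The inline text stands in for a file (`readr` treats a string containing newlines as literal data), and the content and column names are made up for illustration:

```r
library(readr)

# Hypothetical export: a title line, a header line, then the data
raw_text <- "Export from a hypothetical plate reader
sample,value
a,1.5
b,-
c,3
"

measurements <- read_csv(
  raw_text,
  skip = 2,                                # skip the title and the original header
  col_names = c("sample_id", "reading"),   # supply our own column names
  na = "-",                                # "-" marks a missing value
  col_types = cols(sample_id = col_character(),
                   reading = col_double())
)
measurements
```

Stating `col_types` explicitly is optional here, but it keeps the example independent of type guessing.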

---
name: parse_functions

## Under the Hood -- `parse_*` Functions 
Under the hood, data-reading functions use `parse_*` functions:

```r
parse_double("42.24")
```

```
## [1] 42.24
```

```r
parse_number("272'555'849,55", 
             locale = locale(decimal_mark = ",", 
                             grouping_mark = "'"
                            )
             )
```

```
## [1] 272555850
```

```r
parse_number(c('100%', 'price: 500$', '21sek', '42F'))
```

```
## [1] 100 500  21  42
```

---
name: parsing_strings

## Parsing Strings

* Strings can be represented in different encodings:

```r
text1 <- 'På en ö är en å'
text2 <- 'Zażółć gęślą jaźń'
```

```r
text1
charToRaw(text2)
parse_character(charToRaw(text1), locale = locale(encoding = 'UTF-8'))
guess_encoding(charToRaw("Test"))
```

```
## [1] "På en ö är en å"
##  [1] 5a 61 c5 bc c3 b3 c5 82 c4 87 20 67 c4 99 c5 9b 6c c4 85 20 6a 61 c5
## [24] ba c5 84
##  [1] "50" "c3" "a5" "20" "65" "6e" "20" "c3" "b6" "20" "c3" "a4" "72" "20"
## [15] "65" "6e" "20" "c3" "a5"
## # A tibble: 1 x 2
##   encoding confidence
##   <chr>         <dbl>
## 1 ASCII             1
```

---
name: parsing_factors

## Parsing Factors

* R uses factors to represent categorical variables.
* Supply known levels to `parse_factor` so that it warns you when an unknown level is present in the data:

```r
landscapes <- c('mountains', 'swamps', 'seaside')
parse_factor(c('mountains', 'plains', 'seaside', 'swamps'), 
             levels = landscapes)
```

```
## Warning: 1 parsing failure.
## row col           expected actual
##   2  NA value in level set plains
```

```
## [1] mountains <NA>      seaside   swamps   
## attr(,"problems")
## # A tibble: 1 x 4
##     row   col expected           actual
##   <int> <int> <chr>              <chr> 
## 1     2    NA value in level set plains
## Levels: mountains swamps seaside
```
---
name: parsing_other_functions

## Other Parsing Functions

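The remaining parsers follow the same `parse_*` naming scheme:
* `parse_vector()`, `parse_time()`, `parse_number()`, `parse_logical()`, `parse_integer()`, `parse_double()`, `parse_character()`, `parse_date()`, `parse_datetime()`,
* `parse_guess()`, which applies the parser reported by `guess_parser()`: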

```r
guess_parser("2018-06-11 09:00:00")
parse_guess("2018-06-11 09:00:00")

guess_parser(c(1, 2.3, "23$", "54%"))
parse_guess(c(1, 2.3, "23$", "54%"))
```

```
## [1] "datetime"
## [1] "2018-06-11 09:00:00 UTC"
## [1] "character"
## [1] "1"   "2.3" "23$" "54%"
```

---
name: readr

## Importing Data Using `readr`

When reading and parsing a file, `readr` attempts to guess the proper parser for each column by looking at the first 1000 rows.

```r
tricky_dataset <- read_csv(readr_example('challenge.csv'))
```

```
## Parsed with column specification:
## cols(
##   x = col_integer(),
##   y = col_character()
## )
```

```
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
```

```
## Warning: 1000 parsing failures.
##  row col               expected             actual file
## 1001   x no trailing characters .23837975086644292 '/usr/local/lib/R~
## 1002   x no trailing characters .41167997173033655 '/usr/local/lib/R~
## 1003   x no trailing characters .7460716762579978  '/usr/local/lib/R~
## 1004   x no trailing characters .723450553836301   '/usr/local/lib/R~
## 1005   x no trailing characters .614524137461558   '/usr/local/lib/R~
## .... ... ...................... .................. .................
## See problems(...) for more details.
```
OK, so there are some parsing failures. We can examine them more closely using `problems()` as suggested in the above output.

---
name: readr_problems
## Looking at Problematic Columns

```r
p <- problems(tricky_dataset)
p
```

```
## # A tibble: 1,000 x 5
##      row col   expected               actual             file             
##    <int> <chr> <chr>                  <chr>              <chr>            
##  1  1001 x     no trailing characters .23837975086644292 '/usr/local/lib/~
##  2  1002 x     no trailing characters .41167997173033655 '/usr/local/lib/~
##  3  1003 x     no trailing characters .7460716762579978  '/usr/local/lib/~
##  4  1004 x     no trailing characters .723450553836301   '/usr/local/lib/~
##  5  1005 x     no trailing characters .614524137461558   '/usr/local/lib/~
##  6  1006 x     no trailing characters .473980569280684   '/usr/local/lib/~
##  7  1007 x     no trailing characters .5784610391128808  '/usr/local/lib/~
##  8  1008 x     no trailing characters .2415937229525298  '/usr/local/lib/~
##  9  1009 x     no trailing characters .11437866208143532 '/usr/local/lib/~
## 10  1010 x     no trailing characters .2983446326106787  '/usr/local/lib/~
## # ... with 990 more rows
```
OK, let's see which columns cause trouble:

```r
p %$% table(col)
```

```
## col
##    x 
## 1000
```
Looks like the problem occurs only in the `x` column.

---
name: readr_problems_fixing
## Fixing Problematic Columns
So, how can we fix the problematic columns?

1. We can explicitly tell `read_csv()` which parser to use for each column:

```r
tricky_dataset <- read_csv(readr_example('challenge.csv'),
                           col_types = cols(x = col_double(),
                                            y = col_character()))
tricky_dataset %>% tail(n = 5)
```

```
## # A tibble: 5 x 2
##       x y         
##   <dbl> <chr>     
## 1 0.164 2018-03-29
## 2 0.472 2014-08-04
## 3 0.718 2015-08-16
## 4 0.270 2020-02-04
## 5 0.608 2019-01-06
```
As you can see, we can still do better by parsing the `y` column as *date*, not as *character*.
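
One way to do that, sketched under the assumption that the dates in `y` are in the default ISO 8601 format, is to request `col_date()` for that column:

```r
library(readr)

tricky_dataset <- read_csv(readr_example("challenge.csv"),
                           col_types = cols(x = col_double(),
                                            y = col_date()))
class(tricky_dataset$y)
```

With both column types stated explicitly, nothing is left to guessing.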

---
name: readr_problems_fixing2
## Fixing Problematic Columns (ctd.)
Since the parser is guessed from the first 1000 rows, it is worth looking at what lies around row 1000 in the data:

```r
tricky_dataset %>% head(n = 1002) %>% tail(n = 4)
```

```
## # A tibble: 4 x 2
##          x y         
##      <dbl> <chr>     
## 1 4569     <NA>      
## 2 4548     <NA>      
## 3    0.238 2015-01-16
## 4    0.412 2018-05-18
```
It seems we were very unlucky: up to row 1000 there are only integers in the `x` column and `NA`s in the `y` column, so the parsers cannot be guessed correctly. To fix this, let `readr` look at one more row:

```r
tricky_dataset <- read_csv(readr_example('challenge.csv'),
                           guess_max = 1001)
```

```
## Parsed with column specification:
## cols(
##   x = col_double(),
##   y = col_date(format = "")
## )
```

---
name: readr_writing
## Writing to a File
The `readr` package also provides functions for writing tibbles to a file:

* `write_csv()`
* `write_tsv()`
* `write_excel_csv()`

They **always** save:

* text in UTF-8,
* dates and date-times in ISO 8601 format.

However, saving to csv (or tsv) means you lose information about the data type of each column. You can avoid this by using:

* `write_rds()` and `read_rds()` to write/read objects in R's binary rds format,
* `write_feather()` and `read_feather()` from the `feather` package to write/read objects in a fast binary format that other programming languages can access.
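
The loss of type information is easy to see in a round trip. A minimal sketch with a made-up tibble and temporary files (all names are illustrative):

```r
library(readr)
library(tibble)

samples <- tibble(id = 1:3,
                  tissue = factor(c("liver", "brain", "liver")))

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write_csv(samples, csv_path)
write_rds(samples, rds_path)

from_csv <- read_csv(csv_path)   # column types are guessed again
from_rds <- read_rds(rds_path)   # column types are restored as saved

class(from_csv$tissue)
class(from_rds$tissue)
```

After the csv round trip `tissue` comes back as a character column, while the rds round trip keeps it as a factor.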

---
name: basic_data_transformations
## Basic Data Transformations with `dplyr`

Let us create a tibble:

```r
bijou <- as_tibble(diamonds) %>% head(n = 100)
bijou
```

```
## # A tibble: 100 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # ... with 90 more rows
```

.center[
  <img src="assets/diamonds.png", style="height:200px">
]
---
name: filter
## Picking Observations using `filter()`

```r
bijou %>% filter(cut == 'Ideal' | cut == 'Premium', carat >= 0.23) %>%
  head(n = 5)
```

```
## # A tibble: 5 x 10
##   carat cut     color clarity depth table price     x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.290 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
## 3 0.23  Ideal   J     VS1      62.8    56   340  3.93  3.9   2.46
## 4 0.31  Ideal   J     SI2      62.2    54   344  4.35  4.37  2.71
## 5 0.32  Premium E     I1       60.9    58   345  4.38  4.42  2.68
```
Be careful with floating-point comparisons: use `near()` instead of `==`. Also, rows for which the condition evaluates to `NA` are dropped by default!

```r
bijou %>% filter(near(0.23, carat) | is.na(carat)) %>%
  head(n = 5)
```

```
## # A tibble: 5 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 3  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
## 4  0.23 Ideal     J     VS1      62.8    56   340  3.93  3.9   2.46
## 5  0.23 Very Good E     VS2      63.8    55   352  3.85  3.92  2.48
```
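
Both caveats can be reproduced on a tiny made-up tibble (the values are illustrative):

```r
library(dplyr)

stones <- tibble(carat = c(0.23, NA, 0.31))

# The row with NA is silently dropped
kept_default <- stones %>% filter(carat > 0.25)

# Ask for the missing values explicitly to keep them
kept_with_na <- stones %>% filter(carat > 0.25 | is.na(carat))

# Floating-point arithmetic is inexact, so == may fail where near() succeeds
exact_equal  <- sqrt(2)^2 == 2
nearly_equal <- near(sqrt(2)^2, 2)
```

Here `kept_default` has one row and `kept_with_na` has two; `exact_equal` is `FALSE`, whereas `nearly_equal` is `TRUE`.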

---
name: arrange
## Rearranging Observations using `arrange()`

```r
bijou %>% arrange(cut, carat, desc(price))
```

```
## # A tibble: 100 x 10
##    carat cut   color clarity depth table price     x     y     z
##    <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.22 Fair  E     VS2      65.1    61   337  3.87  3.78  2.49
##  2  0.86 Fair  E     SI2      55.1    69  2757  6.45  6.33  3.52
##  3  0.96 Fair  F     SI2      66.3    62  2759  6.27  5.95  4.07
##  4  0.23 Good  F     VS1      58.2    59   402  4.06  4.08  2.37
##  5  0.23 Good  E     VS1      64.1    59   402  3.83  3.85  2.46
##  6  0.23 Good  E     VS1      56.9    65   327  4.05  4.07  2.31
##  7  0.26 Good  E     VVS1     57.9    60   554  4.22  4.25  2.45
##  8  0.26 Good  D     VS2      65.2    56   403  3.99  4.02  2.61
##  9  0.26 Good  D     VS1      58.4    63   403  4.19  4.24  2.46
## 10  0.3  Good  H     SI1      63.7    57   554  4.28  4.26  2.72
## # ... with 90 more rows
```
The `NA`s always end up at the end of the rearranged tibble.

---
name: select
## Selecting Variables with `select()`
Simple `select` with a range:

```r
bijou %>% select(color, clarity, x:z) %>% head(n = 5)
```

```
## # A tibble: 5 x 5
##   color clarity     x     y     z
##   <ord> <ord>   <dbl> <dbl> <dbl>
## 1 E     SI2      3.95  3.98  2.43
## 2 E     SI1      3.89  3.84  2.31
## 3 E     VS1      4.05  4.07  2.31
## 4 I     VS2      4.2   4.23  2.63
## 5 J     SI2      4.34  4.35  2.75
```
--
Exclusive `select`:

```r
bijou %>% select(-(x:z)) %>% head(n = 5)
```

```
## # A tibble: 5 x 7
##   carat cut     color clarity depth table price
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int>
## 1 0.23  Ideal   E     SI2      61.5    55   326
## 2 0.21  Premium E     SI1      59.8    61   326
## 3 0.23  Good    E     VS1      56.9    65   327
## 4 0.290 Premium I     VS2      62.4    58   334
## 5 0.31  Good    J     SI2      63.3    58   335
```

---
name: select2
## Selecting Variables with `select()` (ctd.)
`rename()` is a variant of `select()` that keeps all the columns; here it renames `x` to `var_x`:

```r
bijou %>% rename(var_x = x) %>% head(n = 5)
```

```
## # A tibble: 5 x 10
##   carat cut     color clarity depth table price var_x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good    E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good    J     SI2      63.3    58   335  4.34  4.35  2.75
```
--
Use `everything()` to bring some columns to the front:

```r
bijou %>% select(x:z, everything()) %>% head(n = 5)
```

```
## # A tibble: 5 x 10
##       x     y     z carat cut     color clarity depth table price
##   <dbl> <dbl> <dbl> <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int>
## 1  3.95  3.98  2.43 0.23  Ideal   E     SI2      61.5    55   326
## 2  3.89  3.84  2.31 0.21  Premium E     SI1      59.8    61   326
## 3  4.05  4.07  2.31 0.23  Good    E     VS1      56.9    65   327
## 4  4.2   4.23  2.63 0.290 Premium I     VS2      62.4    58   334
## 5  4.34  4.35  2.75 0.31  Good    J     SI2      63.3    58   335
```

---
name: mutate
## Create or Alter Variables with `mutate()`

```r
bijou %>% mutate(p = x + z, q = p + y) %>% select(-(depth:price)) %>% head(n = 5)
```

```
## # A tibble: 5 x 9
##   carat cut     color clarity     x     y     z     p     q
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.23  Ideal   E     SI2      3.95  3.98  2.43  6.38  10.4
## 2 0.21  Premium E     SI1      3.89  3.84  2.31  6.2   10.0
## 3 0.23  Good    E     VS1      4.05  4.07  2.31  6.36  10.4
## 4 0.290 Premium I     VS2      4.2   4.23  2.63  6.83  11.1
## 5 0.31  Good    J     SI2      4.34  4.35  2.75  7.09  11.4
```
--
Or with `transmute()` (only the variables listed in the call are retained):

```r
bijou %>% transmute(carat, cut, sum = x + y + z) %>% head(n = 5)
```

```
## # A tibble: 5 x 3
##   carat cut       sum
##   <dbl> <ord>   <dbl>
## 1 0.23  Ideal    10.4
## 2 0.21  Premium  10.0
## 3 0.23  Good     10.4
## 4 0.290 Premium  11.1
## 5 0.31  Good     11.4
```

---
name: grouped_summaries
## Group and Summarize

```r
bijou %>% group_by(cut) %>% summarize(max_price = max(price),
                                      mean_price = mean(price),
                                      min_price = min(price))
```

```
## # A tibble: 5 x 4
##   cut       max_price mean_price min_price
##   <ord>         <dbl>      <dbl>     <dbl>
## 1 Fair           2759      1951        337
## 2 Good           2759       661.       327
## 3 Very Good      2760       610.       336
## 4 Premium        2760       569.       326
## 5 Ideal          2757       693.       326
```
--

```r
bijou %>% 
  group_by(cut, color) %>% 
  summarize(max_price = max(price), 
            mean_price = mean(price), 
            min_price = min(price)) %>% head(n = 5)
```

```
## # A tibble: 5 x 5
## # Groups:   cut [2]
##   cut   color max_price mean_price min_price
##   <ord> <ord>     <dbl>      <dbl>     <dbl>
## 1 Fair  E          2757      1547        337
## 2 Fair  F          2759      2759       2759
## 3 Good  D           403       403        403
## 4 Good  E          2759      1010.       327
## 5 Good  F          2759      1580.       402
```

---
name: other_data_manipulations
## Other Data Manipulation Tips

```r
bijou %>% group_by(cut) %>% summarize(count = n())
```

```
## # A tibble: 5 x 2
##   cut       count
##   <ord>     <int>
## 1 Fair          3
## 2 Good         18
## 3 Very Good    38
## 4 Premium      22
## 5 Ideal        19
```
--
When you need to regroup within the same pipe, use `ungroup()`.
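
A short sketch of the difference `ungroup()` makes, using a made-up three-row tibble (names and prices are illustrative only):

```r
library(dplyr)

stones <- tibble(cut = c("Fair", "Fair", "Good"),
                 price = c(300, 500, 700))

stones_means <- stones %>%
  group_by(cut) %>%
  mutate(cut_mean = mean(price)) %>%      # computed within each cut
  ungroup() %>%
  mutate(overall_mean = mean(price))      # computed over all rows
stones_means
```

Without the `ungroup()` call, `overall_mean` would simply repeat the per-cut means.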
---
name: concept_of_tidy_data

## The Concept of Tidy Data
Data are tidy *sensu Wickham* :-) if:
* each and every observation is represented as exactly one row,
* each and every variable is represented by exactly one column,
* thus each data table cell contains only one value.
<img src="assets/tidy_data.png" width="2560" style="display: block; margin: auto auto auto 0;" />

Usually data are untidy in only one way. However, if you are unlucky, they are really untidy and thus a pain to work with...
---
name: tidy_data

## Tidy Data
<img src="assets/tidy_data.png" width="2560" style="display: block; margin: auto auto auto 0;" />
--
.center[**Are these data tidy?**]

.pull-left-70[
<table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; ">
 <thead>
  <tr>
   <th style="text-align:center;"> Sepal.Length </th>
   <th style="text-align:center;"> Sepal.Width </th>
   <th style="text-align:center;"> Petal.Length </th>
   <th style="text-align:center;"> Petal.Width </th>
   <th style="text-align:center;"> Species </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 5.1 </td>
   <td style="text-align:center;"> 3.5 </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.9 </td>
   <td style="text-align:center;"> 3.0 </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.7 </td>
   <td style="text-align:center;"> 3.2 </td>
   <td style="text-align:center;"> 1.3 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
</tbody>
</table>
]
--
.pull-right-30[
<table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; ">
 <thead>
  <tr>
   <th style="text-align:center;"> Species </th>
   <th style="text-align:center;"> variable </th>
   <th style="text-align:center;"> value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> Sepal.Length </td>
   <td style="text-align:center;"> 5.1 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> Sepal.Length </td>
   <td style="text-align:center;"> 4.9 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> Sepal.Length </td>
   <td style="text-align:center;"> 4.7 </td>
  </tr>
</tbody>
</table>
]
<br>&nbsp;<hr><br>

--
.pull-left-50[
<table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; ">
 <thead>
  <tr>
   <th style="text-align:center;"> Sepal.L.W </th>
   <th style="text-align:center;"> Petal.L.W </th>
   <th style="text-align:center;"> Species </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 5.1/3.5 </td>
   <td style="text-align:center;"> 1.4/0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.9/3 </td>
   <td style="text-align:center;"> 1.4/0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4.7/3.2 </td>
   <td style="text-align:center;"> 1.3/0.2 </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
</tbody>
</table>
]
--
.pull-right-50[
<table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; ">
<tbody>
  <tr>
   <td style="text-align:left;"> Sepal.Length </td>
   <td style="text-align:center;"> 5.1 </td>
   <td style="text-align:center;"> 4.9 </td>
   <td style="text-align:center;"> 4.7 </td>
   <td style="text-align:center;"> 4.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sepal.Width </td>
   <td style="text-align:center;"> 3.5 </td>
   <td style="text-align:center;"> 3.0 </td>
   <td style="text-align:center;"> 3.2 </td>
   <td style="text-align:center;"> 3.1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Petal.Length </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 1.4 </td>
   <td style="text-align:center;"> 1.3 </td>
   <td style="text-align:center;"> 1.5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Petal.Width </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> 0.2 </td>
   <td style="text-align:center;"> 0.2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Species </td>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> setosa </td>
   <td style="text-align:center;"> setosa </td>
  </tr>
</tbody>
</table>
]

---
name: tidying_data_gather
## Tidying Data with `gather`

If some of your column names are actually values of a variable, use `gather`:

```r
bijou2 %>% head(n = 5)
```

```
## # A tibble: 5 x 3
##   cut     `2008` `2009`
##   <ord>    <int>  <dbl>
## 1 Ideal      326   332.
## 2 Premium    326   332.
## 3 Good       327   333.
## 4 Premium    334   340.
## 5 Good       335   341.
```

```r
bijou2 %>% 
  gather(`2008`, `2009`, key = 'year', value = 'price') %>% 
  head(n = 5)
```

```
## # A tibble: 5 x 3
##   cut     year  price
##   <ord>   <chr> <dbl>
## 1 Ideal   2008    326
## 2 Premium 2008    326
## 3 Good    2008    327
## 4 Premium 2008    334
## 5 Good    2008    335
```

---
name: tidying_data_spread
## Tidying Data with `spread`

If some of your observations are scattered across many rows, use `spread`:

```r
bijou3
```

```
## # A tibble: 9 x 5
##   cut     price clarity dimension measurement
##   <ord>   <int> <ord>   <chr>           <dbl>
## 1 Ideal     326 SI2     x                3.95
## 2 Premium   326 SI1     x                3.89
## 3 Good      327 VS1     x                4.05
## 4 Ideal     326 SI2     y                3.98
## 5 Premium   326 SI1     y                3.84
## 6 Good      327 VS1     y                4.07
## 7 Ideal     326 SI2     z                2.43
## 8 Premium   326 SI1     z                2.31
## 9 Good      327 VS1     z                2.31
```

```r
bijou3 %>% 
  spread(key=dimension, value=measurement) %>% 
  head(n = 5)
```

```
## # A tibble: 3 x 6
##   cut     price clarity     x     y     z
##   <ord>   <int> <ord>   <dbl> <dbl> <dbl>
## 1 Good      327 VS1      4.05  4.07  2.31
## 2 Premium   326 SI1      3.89  3.84  2.31
## 3 Ideal     326 SI2      3.95  3.98  2.43
```
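
`spread` is the inverse of `gather`. A small round-trip sketch, assuming a made-up tibble `toy` (not one of the `bijou` tables):

```r
library(tidyverse)

# Hypothetical toy data in wide form
toy <- tribble(
  ~cut,    ~`2008`, ~`2009`,
  "Ideal",     326,     332,
  "Good",      327,     333
)

# Wide -> long
long <- toy %>%
  gather(`2008`, `2009`, key = "year", value = "price")

# Long -> wide again; rows come back sorted by the remaining columns
wide <- long %>%
  spread(key = year, value = price)
```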

---
name: tidying_data_separate
## Tidying Data with `separate`

If some of your columns contain more than one value, use `separate`:

```r
bijou4
```

```
## # A tibble: 5 x 4
##   cut     price clarity dim           
##   <ord>   <int> <ord>   <chr>         
## 1 Ideal     326 SI2     3.95/3.98/2.43
## 2 Premium   326 SI1     3.89/3.84/2.31
## 3 Good      327 VS1     4.05/4.07/2.31
## 4 Premium   334 VS2     4.2/4.23/2.63 
## 5 Good      335 SI2     4.34/4.35/2.75
```

```r
bijou4 %>% 
  separate(dim, into = c("x", "y", "z"), sep = "/", convert = TRUE)
```

```
## # A tibble: 5 x 6
##   cut     price clarity     x     y     z
##   <ord>   <int> <ord>   <dbl> <dbl> <dbl>
## 1 Ideal     326 SI2      3.95  3.98  2.43
## 2 Premium   326 SI1      3.89  3.84  2.31
## 3 Good      327 VS1      4.05  4.07  2.31
## 4 Premium   334 VS2      4.2   4.23  2.63
## 5 Good      335 SI2      4.34  4.35  2.75
```

---
name: tidying_data_unite
## Tidying Data with `unite`

If a single value is split across several columns, use `unite`:

```r
bijou5
```

```
## # A tibble: 5 x 7
##   cut     price clarity_prefix clarity_suffix     x     y     z
##   <ord>   <int> <chr>          <chr>          <dbl> <dbl> <dbl>
## 1 Ideal     326 SI             2               3.95  3.98  2.43
## 2 Premium   326 SI             1               3.89  3.84  2.31
## 3 Good      327 VS             1               4.05  4.07  2.31
## 4 Premium   334 VS             2               4.2   4.23  2.63
## 5 Good      335 SI             2               4.34  4.35  2.75
```

```r
bijou5 %>% unite(clarity, clarity_prefix, clarity_suffix, sep='')
```

```
## # A tibble: 5 x 6
##   cut     price clarity     x     y     z
##   <ord>   <int> <chr>   <dbl> <dbl> <dbl>
## 1 Ideal     326 SI2      3.95  3.98  2.43
## 2 Premium   326 SI1      3.89  3.84  2.31
## 3 Good      327 VS1      4.05  4.07  2.31
## 4 Premium   334 VS2      4.2   4.23  2.63
## 5 Good      335 SI2      4.34  4.35  2.75
```
**Note:** in `unite`, `sep` is the string placed between the joined values (here, an empty string). In `separate`, `sep` can be a delimiting string or *regular expression*, or a number giving the position to split at. Pretty flexible approach!
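
A minimal sketch of a numeric `sep` in `separate`, assuming a made-up tibble `toy` whose clarity codes all have a two-letter prefix. A positive number counts characters from the left, so `sep = 2` splits after the second character; `unite` then reverses the operation.

```r
library(tidyverse)

# Hypothetical toy data: two-letter prefix plus one-digit suffix
toy <- tibble(clarity = c("SI2", "VS1", "VS2"))

# Numeric sep: split after the 2nd character
by_position <- toy %>%
  separate(clarity, into = c("prefix", "suffix"), sep = 2)

# unite glues the pieces back together with an empty separator
restored <- by_position %>%
  unite(clarity, prefix, suffix, sep = "")
```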

---
name: missing_complete
## Completing Missing Values Using `complete`

```r
bijou %>% head(n = 10) %>% 
  select(cut, clarity, price) %>% 
  mutate(continent = sample(c('AusOce', 'Eur'), 
                            size = 10, 
                            replace = TRUE)) -> missing_stones
```

```r
missing_stones %>% complete(cut, continent)
```

```
## # A tibble: 12 x 4
##    cut       continent clarity price
##    <ord>     <chr>     <ord>   <int>
##  1 Fair      AusOce    <NA>       NA
##  2 Fair      Eur       VS2       337
##  3 Good      AusOce    SI2       335
##  4 Good      Eur       VS1       327
##  5 Very Good AusOce    VVS1      336
##  6 Very Good AusOce    SI1       337
##  7 Very Good Eur       VVS2      336
##  8 Very Good Eur       VS1       338
##  9 Premium   AusOce    SI1       326
## 10 Premium   Eur       VS2       334
## 11 Ideal     AusOce    <NA>       NA
## 12 Ideal     Eur       SI2       326
```
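
`complete` adds a row for every combination of the listed columns that is absent from the data and fills the remaining columns with `NA`. The sketch below uses a made-up tibble `toy` with a hypothetical `n_stones` column and the `fill` argument to supply a value other than `NA`.

```r
library(tidyverse)

# Hypothetical toy data: the Ideal/AusOce combination is missing
toy <- tribble(
  ~cut,    ~continent, ~n_stones,
  "Good",  "Eur",      3,
  "Good",  "AusOce",   1,
  "Ideal", "Eur",      2
)

# Add the missing combination and record it as zero stones
completed <- toy %>%
  complete(cut, continent, fill = list(n_stones = 0))
```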

---
name: joins
## Combining Datasets
Often, we need to combine several data tables (relational data) to get the full picture. Three families of operations help here:
* *mutating joins* that add new variables to data table `A` based on matching observations (rows) from data table `B`,
* *filtering joins* that filter observations from data table `A` based on whether they match observations in data table `B`,
* *set operations* that treat observations in `A` and `B` as elements of a set.

Let us create two example tibbles that share a key:
.pull-left-50[

```r
A <- tribble(
  ~key, ~x,
  'a', 'A1',
  'b', 'A2',
  'c', 'A3',
  'e','A4'
)
```
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> key </th>
   <th style="text-align:left;"> x </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> a </td>
   <td style="text-align:left;"> A1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> b </td>
   <td style="text-align:left;"> A2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> c </td>
   <td style="text-align:left;"> A3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> e </td>
   <td style="text-align:left;"> A4 </td>
  </tr>
</tbody>
</table>
]
.pull-right-50[

```r
B <- tribble(
  ~key, ~y,
  'a', 'B1',
  'b', NA,
  'c', 'B3',
  'd','B4'
)
```
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> key </th>
   <th style="text-align:left;"> y </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> a </td>
   <td style="text-align:left;"> B1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> b </td>
   <td style="text-align:left;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> c </td>
   <td style="text-align:left;"> B3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> d </td>
   <td style="text-align:left;"> B4 </td>
  </tr>
</tbody>
</table>
]

---
name: inner_join
## The Joins Family

.pull-left-50[
The `inner_join`:

```r
A %>% inner_join(B, by = 'key')
# All non-matching rows are dropped!
```

```
## # A tibble: 3 x 3
##   key   x     y    
##   <chr> <chr> <chr>
## 1 a     A1    B1   
## 2 b     A2    <NA> 
## 3 c     A3    B3
```
]
--
.pull-right-50[
The `left_join`:

```r
A %>% left_join(B, by = 'key')
```

```
## # A tibble: 4 x 3
##   key   x     y    
##   <chr> <chr> <chr>
## 1 a     A1    B1   
## 2 b     A2    <NA> 
## 3 c     A3    B3   
## 4 e     A4    <NA>
```
]
--
<br>
.pull-left-50[
The `right_join`:

```r
A %>% right_join(B, by = 'key')
```

```
## # A tibble: 4 x 3
##   key   x     y    
##   <chr> <chr> <chr>
## 1 a     A1    B1   
## 2 b     A2    <NA> 
## 3 c     A3    B3   
## 4 d     <NA>  B4
```
]
--
.pull-right-50[
The `full_join`:

```r
A %>% full_join(B, by = 'key')
```

```
## # A tibble: 5 x 3
##   key   x     y    
##   <chr> <chr> <chr>
## 1 a     A1    B1   
## 2 b     A2    <NA> 
## 3 c     A3    B3   
## 4 e     A4    <NA> 
## 5 d     <NA>  B4
```
]
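
The four joins above are all mutating joins. As a hedged sketch of the other two families, the block below rebuilds `A` and `B` so that it runs on its own and applies `semi_join`, `anti_join` and the `dplyr` set operations. The set operations are applied to the `key` column only, because they expect both tables to have the same columns.

```r
library(tidyverse)

A <- tribble(
  ~key, ~x,
  'a', 'A1',
  'b', 'A2',
  'c', 'A3',
  'e', 'A4'
)
B <- tribble(
  ~key, ~y,
  'a', 'B1',
  'b', NA,
  'c', 'B3',
  'd', 'B4'
)

# Filtering joins: only rows of A are returned, no columns of B are added
in_both   <- A %>% semi_join(B, by = 'key')  # rows of A with a match in B
only_in_A <- A %>% anti_join(B, by = 'key')  # rows of A without a match in B

# Set operations on the shared column
shared_keys <- dplyr::intersect(select(A, key), select(B, key))
all_keys    <- dplyr::union(select(A, key), select(B, key))
```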

---
name: purrr_map
## Using `map` Functions from `purrr`

The base R `apply` family has `tidyverse` counterparts in the `purrr` package.

```r
cars <- as_tibble(mtcars)  # as.tibble() is deprecated in newer tibble versions
cars %>% head(n = 5)
```

```
## # A tibble: 5 x 11
##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
## 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
## 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
## 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
## 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
```
The `map` function:

```r
cars %>% select(disp, hp) %>% map(mean)
```

```
## $disp
## [1] 230.7219
## 
## $hp
## [1] 146.6875
```

---
name: purrr_map_family
## Different Members of the `map` Family
* `map()` -- returns a list,
* `map_lgl()` -- returns a logical vector,
* `map_int()` -- returns a vector of integers,
* `map_dbl()` -- returns a vector of doubles,
* `map_chr()` -- returns a character vector.
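
A short sketch of the typed variants on `mtcars` columns. The threshold `400` and the variable names are arbitrary choices for illustration.

```r
library(tidyverse)

cars <- as_tibble(mtcars)

# Double vector of column means
col_means <- cars %>% select(disp, hp) %>% map_dbl(mean)

# Character vector with the storage type of each column
col_types <- cars %>% select(disp, hp) %>% map_chr(typeof)

# Logical vector: does the column contain a value above 400?
has_big <- cars %>% select(disp, hp) %>% map_lgl(~ any(. > 400))

# Integer vector: number of distinct values per column
n_unique <- cars %>% select(cyl, gear) %>% map_int(n_distinct)
```

Each variant raises an error if the function returns a value of the wrong type, which makes type problems visible early.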

---
name: purrr_shortcut_anonymous
## Anonymous Functions in `purrr`
.pull-left-50[
Base-R style anonymous function

```r
models <- cars %>% 
  split(.$cyl) %>%
  map(function(dat) 
    lm(mpg ~ wt, data = dat))
```
Now, summarise each model:

```r
models %>% 
  map(summary) %>%
  map_dbl(~.$r.squared)
```

```
##         4         6         8 
## 0.5086326 0.4645102 0.4229655
```
]
.pull-right-50[
`purrr` formula shortcut

```r
models <- cars %>% 
  split(.$cyl) %>%
  map(~lm(mpg ~ wt, data = .))
```
Now, summarise each model using an even simpler syntax:

```r
models %>% 
  map(summary) %>%
  map_dbl("r.squared")
```

```
##         4         6         8 
## 0.5086326 0.4645102 0.4229655
```
]

---
name: purrr_safely
## Possibly Quiet and Safe
How to deal with errors in `purrr`:

* `safely()` -- result is a list with 2 elements:

.pull-left-50[
  + `result` contains `NULL` if an error occurred, the result otherwise,
  
  ```r
  safe_sqrt <- safely(sqrt)
  safe_sqrt(4) %>% str()
  ```
  
  ```
  ## List of 2
  ##  $ result: num 2
  ##  $ error : NULL
  ```
]
.pull-right-50[
  + `error` contains `NULL` if no error occurred, the error object otherwise.
  
  ```r
  safe_sqrt <- safely(sqrt)
  safe_sqrt('zebra') %>% str()
  ```
  
  ```
  ## List of 2
  ##  $ result: NULL
  ##  $ error :List of 2
  ##   ..$ message: chr "non-numeric argument to mathematical function"
  ##   ..$ call   : language sqrt(x = x)
  ##   ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
  ```
]
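
`safely()` is most useful together with `map`, where one failing element would otherwise abort the whole run. A minimal sketch, assuming a made-up input list `inputs`:

```r
library(tidyverse)

# Hypothetical inputs: the second element makes sqrt() fail
inputs <- list(4, 'zebra', 9)
safe_sqrt <- safely(sqrt)

# Every call returns a list with `result` and `error`, so map() never stops
results <- inputs %>% map(safe_sqrt)

# Which calls succeeded?
ok <- results %>% map_lgl(~ is.null(.$error))

# Collect the results of the successful calls only
values <- results[ok] %>% map_dbl("result")
```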

---
name: purrr_possibly
## Possibly Quiet and Safe ctd.

* `possibly()` lets you define what value to return upon error:

```r
tst <- list(4, 7, 'test')
tst %>% map_dbl(possibly(sqrt, NA_real_))
```

```
## [1] 2.000000 2.645751       NA
```
--
* `quietly()` -- captures printed output, messages and warnings and returns them in a list together with the result:

```r
x <- list(-1, 1)
x %>% map(quietly(log)) %>% str()
```

```
## List of 2
##  $ :List of 4
##   ..$ result  : num NaN
##   ..$ output  : chr ""
##   ..$ warnings: chr "NaNs produced"
##   ..$ messages: chr(0) 
##  $ :List of 4
##   ..$ result  : num 0
##   ..$ output  : chr ""
##   ..$ warnings: chr(0) 
##   ..$ messages: chr(0)
```

---
name: more_on_map
## More on `map` Functions
What if one wants to map over more than one argument?

```r
means <- c(22, 32, 42)
std_devs <- c(2.5, 5, 10)
my_rnorms <- map2(means, std_devs, rnorm, n = 100)
```
--
<img src="tidyverse_presentation_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" />
--

```r
my_rnorms %>% 
  setNames(LETTERS[1:length(means)]) %>% 
  as_tibble() %>% 
  gather(LETTERS[1]:LETTERS[length(means)], key='run', value='num') %>% 
  ggplot(mapping = aes(x = num)) + 
  geom_density() + 
  facet_grid(~ run) + 
  theme_bw()
```

---
name: more_on_pmap
## Mapping with Even More Arguments

```r
param_sets <- tribble(
  ~mean, ~sd, ~n,
  22, 2.5, 50,
  32, 5, 100,
  42, 10, 250
)
```
--
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
   <th style="text-align:right;"> n </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 22 </td>
   <td style="text-align:right;"> 2.5 </td>
   <td style="text-align:right;"> 50 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 32 </td>
   <td style="text-align:right;"> 5.0 </td>
   <td style="text-align:right;"> 100 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 42 </td>
   <td style="text-align:right;"> 10.0 </td>
   <td style="text-align:right;"> 250 </td>
  </tr>
</tbody>
</table>

```r
param_sets %>% pmap(rnorm) %>% str()
```

```
## List of 3
##  $ : num [1:50] 23.9 20.4 19.7 21.1 22 ...
##  $ : num [1:100] 25.3 34.3 34.6 34.2 33.9 ...
##  $ : num [1:250] 44.1 37.7 39.3 32.1 36.4 ...
```
---
name: invoke_map
## Invoking Different Functions

```r
param_sets <- tribble(
  ~f,       ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(mean = 32, sd = 2),
  "rpois", list(lambda = 10)
)
result <- param_sets %>% 
  mutate(call_result = invoke_map(f, params, n = 100))
```
--
<img src="tidyverse_presentation_files/figure-html/unnamed-chunk-60-1.png" style="display: block; margin: auto;" />
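
`invoke_map()` is deprecated as of `purrr` 1.0.0. One possible replacement, sketched here under the assumption that base R's `do.call()` is acceptable, is to pair `map2()` with `do.call()`. The table is the same as above, and `n = 100` is appended to each parameter list.

```r
library(tidyverse)

param_sets <- tribble(
  ~f,       ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(mean = 32, sd = 2),
  "rpois", list(lambda = 10)
)

# For each row, call the function named in `f` with its own parameters plus n
result <- param_sets %>%
  mutate(call_result = map2(f, params,
                            ~ do.call(.x, c(.y, list(n = 100)))))
```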

---
name: walk
## Let's Take a `walk` to the Printing House
What if you want to map a function for its side-effects?

```r
list(runif(10), rnorm(10)) %>% 
  walk(print) %>% 
  map(`*`,5)
```

```
##  [1] 0.19141448 0.73700574 0.87506734 0.29993970 0.38753262 0.85501264
##  [7] 0.48393766 0.47824787 0.95261771 0.07841113
##  [1] -1.5169800 -1.6235191  0.2819054  0.3771507 -0.3640784  0.1787987
##  [7] -0.9053670  1.2919439  0.2800154  1.8947971
## [[1]]
##  [1] 0.9570724 3.6850287 4.3753367 1.4996985 1.9376631 4.2750632 2.4196883
##  [8] 2.3912394 4.7630885 0.3920557
## 
## [[2]]
##  [1] -7.5848998 -8.1175957  1.4095268  1.8857537 -1.8203920  0.8939933
##  [7] -4.5268352  6.4597195  1.4000770  9.4739857
```
* `walk2()` and `pwalk()` are the side-effect counterparts of `map2()` and `pmap()`.
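
A minimal sketch of both, assuming made-up vectors `labels` and `values`. Like `walk()`, they return their input invisibly, so they can sit in the middle of a pipe.

```r
library(tidyverse)

# Hypothetical inputs
labels <- c("first", "second")
values <- list(1:3, 4:6)

# walk2: iterate over two inputs in parallel, print a line for each pair
returned <- walk2(labels, values, ~ cat(.x, ":", sum(.y), "\n"))

# pwalk: the same for any number of inputs, matched by name
pwalk(list(label = labels, value = values),
      function(label, value) cat(label, "has", length(value), "elements\n"))
```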

---
name: some_every
## Predicate Functions
* keep all elements fulfilling a condition:

```r
iris %>% keep(is.factor) %>% str()
```

```
## 'data.frame':	150 obs. of  1 variable:
##  $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```
* discard all elements fulfilling a condition:

```r
iris$Petal.Length %>% discard(~ . >= 2) %>% str()
```

```
##  num [1:50] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
```
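* check whether some or every element fulfils a condition with `some()` and `every()`,
* find the first matching element or its position with `detect()` and `detect_index()`: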

```r
10:1 %>% detect(~ . > 5)
10:1 %>% detect_index(~ . > 5)
```

```
## [1] 10
## [1] 1
```
* keep leading or trailing elements for as long as a condition holds with `head_while()` and `tail_while()`.
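
A short sketch of the predicate functions that have no example above, using made-up inputs:

```r
library(tidyverse)

# Hypothetical input
x <- list(1, 5, 10)

any_big <- x %>% some(~ . > 5)    # is at least one element greater than 5?
all_big <- x %>% every(~ . > 5)   # are all elements greater than 5?

# Elements from the start (or end) for as long as the condition holds
leading  <- 1:10 %>% head_while(~ . < 4)
trailing <- 1:10 %>% tail_while(~ . > 8)
```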

---
name: more_tidyverse
## Some Other Friends
* `stringr` for string manipulation and regular expressions,
* `forcats` for working with factors,
* `lubridate` for working with dates.
---
name: end-slide
class: end-slide

# Thank you

---
name: report

## Session

* This presentation was created in RStudio using the [`remarkjs`](https://github.com/gnab/remark) framework through the R package [`xaringan`](https://github.com/yihui/xaringan).
* For R Markdown, see <http://rmarkdown.rstudio.com>
* For R Markdown presentations, see <https://rmarkdown.rstudio.com/lesson-11.html>

```r
R.version
```

```
##                _                           
## platform       x86_64-apple-darwin17.3.0   
## arch           x86_64                      
## os             darwin17.3.0                
## system         x86_64, darwin17.3.0        
## status                                     
## major          3                           
## minor          4.3                         
## year           2017                        
## month          11                          
## day            30                          
## svn rev        73796                       
## language       R                           
## version.string R version 3.4.3 (2017-11-30)
## nickname       Kite-Eating Tree
```