class: center, middle, inverse, title-slide # Tidy Work in Tidyverse ## RaukR 2019 • Advanced R for Bioinformatics ###
Marcin Kierczak
### NBIS, SciLifeLab

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">

<!-- ----------------- Only edit title & author above this ----------------- -->

## Tidyverse -- What is it all About?

* [Tidyverse](http://www.tidyverse.org) is a collection of packages.
* Created by [Hadley Wickham](http://hadley.nz).
* It is gaining popularity and is on its way to becoming a *de facto* standard in data analysis.
* Knowing how to use it can increase your salary :-)
* A philosophy of programming or a programming paradigm.
* Everything is about the flow of *tidy data*.

.center[
<img src="assets_tidyverse/hex-tidyverse.png", style="height:200px;">
<img src="assets_tidyverse/Hadley-wickham2016-02-04.jpeg", style="height:200px;">
<img src="assets_tidyverse/RforDataScience.jpeg", style="height:200px;">
]

.vsmall[sources of images: www.tidyverse.org, Wikipedia, www.tidyverse.org]

---
name: tidyverse_workflow

## Typical Tidyverse Workflow

The tidyverse curse?<br><br>

--

*Navigating the balance between base R and the tidyverse is a challenge to learn.*
.right[.small[-- [Robert A. Muenchen](http://r4stats.com/articles/why-r-is-hard-to-learn/)]]
<br><br>

--

<img src="assets_tidyverse/tidyverse-flow.png", style="height:400px;"><br>
.vsmall[source: http://www.storybench.org/getting-started-with-tidyverse-in-r/]

---
name: intro_to_pipes

## Introduction to Pipes

.pull-left-50[
.center[
<img src="assets_tidyverse/MagrittePipe.jpg" width="300" style="display: block; margin: auto auto auto 0;" />
]
.vsmall[
René Magritte, *La trahison des images*, [Wikimedia Commons](https://en.wikipedia.org/wiki/The_Treachery_of_Images#/media/File:MagrittePipe.jpg)
]
<br>
.center[
<img src="assets_tidyverse/magrittr.png" width="150" style="display: block; margin: auto auto auto 0;" />
]
]

--

.pull-right-50[
* Let the data flow.
* *Ceci n'est pas une pipe* -- `magrittr` * The `%>%` pipe: + `x %>% f` `\(\equiv\)` `f(x)` + `x %>% f(y)` `\(\equiv\)` `f(x, y)` + `x %>% f %>% g %>% h` `\(\equiv\)` `h(g(f(x)))` ] -- .pull-right-50[ instead of writing this: ```r data <- iris data <- head(data, n=3) ``` ] -- .pull-right-50[ write this: ```r iris %>% head(n=3) ``` ``` ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ``` ] --- name: other_pipes_T ## Other Types of `magrittr` Pipes -- `%T>%` .pull-left-50[ The %T>% pipe is useful when you call a function for its side effects: ```r rnorm(50) %>% matrix(ncol = 2) %>% plot() %>% summary() ``` ``` ## Length Class Mode ## 0 NULL NULL ``` <img src="tidyverse_presentation_files/figure-html/magrittr2a-1.png" style="display: block; margin: auto auto auto 0;" /> ] -- .pull-right-50[ ```r rnorm(50) %>% matrix(ncol = 2) %T>% plot() %>% summary() ``` ``` ## V1 V2 ## Min. :-2.04917 Min. :-2.32503 ## 1st Qu.:-0.46937 1st Qu.:-0.58177 ## Median :-0.02379 Median :-0.01368 ## Mean :-0.04233 Mean :-0.09766 ## 3rd Qu.: 0.42620 3rd Qu.: 0.42888 ## Max. : 2.53309 Max. : 2.05622 ``` <img src="tidyverse_presentation_files/figure-html/magrittr2b-1.png" style="display: block; margin: auto auto auto 0;" /> ] --- name: the_splitting_pipe ## Other Types of `magrittr` Pipes -- `%$%` ```r iris %>% cor(Sepal.Length, Sepal.Width) ``` ``` ## Error in pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs", : object 'Sepal.Width' not found ``` We need the `%$%` pipe with exposition of variables: ```r iris %$% cor(Sepal.Length, Sepal.Width) ``` ``` ## [1] -0.1175698 ``` This is because the `cor` function does not have the `data` argument (which also should be the first argument of a pipe-friendly function). ### The %<>% Pipe It exists but can lead to somewhat confusing code. 
`x %<>% f` `\(\equiv\)` `x <- f(x)`

```r
M <- matrix(rnorm(16), nrow=4)
M %<>% colSums()
M
```

```
## [1] -4.228428 1.474212 -2.081135 -1.084846
```

---
name: magrittr_placeholder

## Placeholders in `magrittr` Pipes

Sometimes we want to pass the resulting data to an argument *other than the first* of the next function in the chain. `magrittr` provides a placeholder mechanism for this:

* `x %>% f(y, .)` `\(\equiv\)` `f(y, x)`,
* `x %>% f(y, z = .)` `\(\equiv\)` `f(y, z = x)`.

If the placeholder appears only in nested calls, `x` is still passed as the first argument unless the right-hand side is wrapped in braces:

* `x %>% f(a = p(.), b = q(.))` `\(\equiv\)` `f(x, a = p(x), b = q(x))`,
* `x %>% {f(a = p(.), b = q(.))}` `\(\equiv\)` `f(a = p(x), b = q(x))`.

Examples:

```r
M <- rnorm(4) %>% matrix(nrow = 2)
M %>% `%*%`(., .)
```

```
## [,1] [,2]
## [1,] 0.6760221 -0.1882214
## [2,] 1.8141353 -0.3358698
```

```r
print_M_summ <- function(nrow, ncol) {
  paste0('Matrix M has: ', nrow, ' rows and ', ncol, ' columns.')
}
M %>% {print_M_summ(nrow(.), ncol(.))}
```

```
## [1] "Matrix M has: 2 rows and 2 columns."
```

---
name: tibble_intro

## Tibbles

.pull-left-50[
<img src="assets_tidyverse/hex-tibble.png" width="160" style="display: block; margin: auto;" />

```r
as_tibble(iris)
```

```
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # … with 140 more rows
```
]

.pull-right-50[
* `tibble` is one of the unifying features of tidyverse,
* it is a *better* `data.frame` realization,
* `data.frame` objects can be coerced to `tibble` using `as_tibble()`

```r
tibble(
  x = 1, # recycling
  y = runif(50),
  z = x + y^2,
  outcome = rnorm(50)
)
```

```
## # A tibble: 50 x 4
## x y z outcome
## <dbl> <dbl> <dbl> <dbl>
## 1 1 0.749 1.56 -1.15
## 2 1 0.426 1.18 0.742
## 3 1
0.508 1.26 -0.123
## 4 1 0.813 1.66 2.08
## 5 1 0.825 1.68 -1.25
## 6 1 0.0240 1.00 -0.289
## 7 1 0.0968 1.01 0.621
## 8 1 0.0621 1.00 1.96
## 9 1 0.499 1.25 0.983
## 10 1 0.544 1.30 -0.540
## # … with 40 more rows
```
]

---
name: tibble2

## More on Tibbles

* When you print a `tibble`:
  + all columns that fit the screen are shown,
  + first 10 rows are shown,
  + data type for each column is shown.

```r
as_tibble(cars)
```

```
## # A tibble: 50 x 2
## speed dist
## <dbl> <dbl>
## 1 4 2
## 2 4 10
## 3 7 4
## 4 7 22
## 5 8 16
## 6 9 10
## 7 10 18
## 8 10 26
## 9 10 34
## 10 11 17
## # … with 40 more rows
```

Printing can be adjusted per call or globally:

* `my_tibble %>% print(n = 50, width = Inf)`,
* `options(tibble.print_min = 15, tibble.print_max = 25)`,
* `options(dplyr.print_min = Inf)`,
* `options(tibble.width = Inf)`

---
name: tibble2

## Subsetting Tibbles

```r
vehicles <- as_tibble(cars[1:5,])
vehicles[['speed']]
vehicles[[1]]
vehicles$speed

# Using placeholders
vehicles %>% .$dist
vehicles %>% .[['dist']]
vehicles %>% .[[2]]
```

```
## [1] 4 4 7 7 8
## [1] 4 4 7 7 8
## [1] 4 4 7 7 8
## [1] 2 10 4 22 16
## [1] 2 10 4 22 16
## [1] 2 10 4 22 16
```

--

**Note!** Not all older R functions work with tibbles. When one does not, convert the tibble with `as.data.frame(my_tibble)`.

---
name: tibbles_partial_matching

## Tibbles are Stricter than `data.frames`

```r
cars$spe # partial matching
```

```
## [1] 4 4 7 7 8
```

```r
vehicles$spe # no partial matching
```

```
## Warning: Unknown or uninitialised column: 'spe'.
```

```
## NULL
```

```r
cars$gear
```

```
## NULL
```

```r
vehicles$gear
```

```
## Warning: Unknown or uninitialised column: 'gear'.
```

```
## NULL
```

---
name: loading_data

## Loading Data

In the `tidyverse`, you import data using the `readr` package, which provides a number of useful data import functions:

* `read_delim()`, a generic function for reading *-delimited files.
There are a number of convenience wrappers:
  + `read_csv()` reads comma-delimited files,
  + `read_csv2()` reads semicolon-delimited files,
  + `read_tsv()` reads tab-delimited files.
* `read_fwf()` for reading fixed-width files, with helpers and a related reader:
  + `fwf_widths()` for width-based reading,
  + `fwf_positions()` for position-based reading and
  + `read_table()` for reading whitespace-delimited fixed-width files.
* `read_log()` for reading Apache-style logs.

The most commonly used `read_csv()` has some familiar arguments like:

* `skip` -- to specify the number of rows to skip (headers),
* `col_names` -- to supply a vector of column names,
* `comment` -- to specify what character designates a comment,
* `na` -- to specify how missing values are represented.

---
name: parse_functions

## Under the Hood -- `parse_*` Functions

Under the hood, data-reading functions use `parse_*` functions:

```r
parse_double("42.24")
```

```
## [1] 42.24
```

```r
parse_number("272'555'849,55",
             locale = locale(decimal_mark = ",",
                             grouping_mark = "'"
             )
)
```

```
## [1] 272555850
```

```r
parse_number(c('100%', 'price: 500$', '21sek', '42F'))
```

```
## [1] 100 500 21 42
```

---
name: parsing_strings

## Parsing Strings

* Strings can be represented in different encodings:

```r
text1 <- 'På en ö är en å'
text2 <- 'Zażółć gęślą jaźń'
```

```r
text1
charToRaw(text2)
parse_character(text1, locale = locale(encoding = 'UTF-8'))
guess_encoding(charToRaw("Test"))
guess_encoding(charToRaw(text1))
```

```
## [1] "På en ö är en å"
## [1] 5a 61 c5 bc c3 b3 c5 82 c4 87 20 67 c4 99 c5 9b 6c c4 85 20 6a 61 c5
## [24] ba c5 84
## [1] "På en ö är en å"
## # A tibble: 1 x 2
## encoding confidence
## <chr> <dbl>
## 1 ASCII 1
## # A tibble: 4 x 2
## encoding confidence
## <chr> <dbl>
## 1 UTF-8 1
## 2 ISO-8859-1 0.83
## 3 ISO-8859-9 0.33
## 4 ISO-8859-2 0.3
```

---
name: parsing_factors

## Parsing Factors

* R uses factors to represent categorical variables.
* Supply known levels to `parse_factor` so that it warns you when an unknown level is present in the data:

```r
landscapes <- c('mountains', 'swamps', 'seaside')
parse_factor(c('mountains', 'plains', 'seaside', 'swamps'), levels = landscapes)
```

```
## Warning: 1 parsing failure.
## row col expected actual
## 2 -- value in level set plains
```

```
## [1] mountains <NA> seaside swamps
## attr(,"problems")
## # A tibble: 1 x 4
## row col expected actual
## <int> <int> <chr> <chr>
## 1 2 NA value in level set plains
## Levels: mountains swamps seaside
```

---
name: parsing_other_functions

## Other Parsing Functions

`parse_` followed by:

* `vector`, `time`, `number`, `logical`, `integer`, `double`, `character`, `date`, `datetime`,
* `guess`

```r
guess_parser("2018-06-11 09:00:00")
parse_guess("2018-06-11 09:00:00")
guess_parser(c(1, 2.3, "23$", "54%"))
parse_guess(c(1, 2.3, "23$", "54%"))
```

```
## [1] "datetime"
## [1] "2018-06-11 09:00:00 UTC"
## [1] "character"
## [1] "1" "2.3" "23$" "54%"
```

---
name: readr

## Importing Data Using `readr`

When reading and parsing a file, `readr` attempts to guess the proper parser for each column by looking at the first 1000 rows.

```r
tricky_dataset <- read_csv(readr_example('challenge.csv'))
```

```
## Parsed with column specification:
## cols(
## x = col_double(),
## y = col_logical()
## )
```

```
## Warning: 1000 parsing failures.
## row col expected actual file ## 1001 y 1/0/T/F/TRUE/FALSE 2015-01-16 '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/readr/extdata/challenge.csv' ## 1002 y 1/0/T/F/TRUE/FALSE 2018-05-18 '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/readr/extdata/challenge.csv' ## 1003 y 1/0/T/F/TRUE/FALSE 2015-09-05 '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/readr/extdata/challenge.csv' ## 1004 y 1/0/T/F/TRUE/FALSE 2012-11-28 '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/readr/extdata/challenge.csv' ## 1005 y 1/0/T/F/TRUE/FALSE 2020-01-13 '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/readr/extdata/challenge.csv' ## .... ... .................. .......... ............................................................................................ ## See problems(...) for more details. ``` OK, so there are some parsing failures. We can examine them more closely using `problems()` as suggested in the above output. --- name: readr_problems ## Looking at Problematic Columns ```r p <- problems(tricky_dataset) p ``` ``` ## # A tibble: 1,000 x 5 ## row col expected actual file ## <int> <chr> <chr> <chr> <chr> ## 1 1001 y 1/0/T/F/TRUE… 2015-01… '/Library/Frameworks/R.framework/Ver… ## 2 1002 y 1/0/T/F/TRUE… 2018-05… '/Library/Frameworks/R.framework/Ver… ## 3 1003 y 1/0/T/F/TRUE… 2015-09… '/Library/Frameworks/R.framework/Ver… ## 4 1004 y 1/0/T/F/TRUE… 2012-11… '/Library/Frameworks/R.framework/Ver… ## 5 1005 y 1/0/T/F/TRUE… 2020-01… '/Library/Frameworks/R.framework/Ver… ## 6 1006 y 1/0/T/F/TRUE… 2016-04… '/Library/Frameworks/R.framework/Ver… ## 7 1007 y 1/0/T/F/TRUE… 2011-05… '/Library/Frameworks/R.framework/Ver… ## 8 1008 y 1/0/T/F/TRUE… 2020-07… '/Library/Frameworks/R.framework/Ver… ## 9 1009 y 1/0/T/F/TRUE… 2011-04… '/Library/Frameworks/R.framework/Ver… ## 10 1010 y 1/0/T/F/TRUE… 2010-05… '/Library/Frameworks/R.framework/Ver… ## # … with 990 more rows ``` OK, let's see which columns cause trouble: 
```r
p %$% table(col)
```

```
## col
## y
## 1000
```

Looks like the problem occurs only in the `y` column.

---
name: readr_problems_fixing

## Fixing Problematic Columns

So, how can we fix the problematic columns?

1. We can explicitly tell `readr` which parser to use:

```r
tricky_dataset <- read_csv(readr_example('challenge.csv'),
                           col_types = cols(x = col_double(),
                                            y = col_character()
                           )
)
tricky_dataset %>% tail(n = 5)
```

```
## # A tibble: 5 x 2
## x y
## <dbl> <chr>
## 1 0.164 2018-03-29
## 2 0.472 2014-08-04
## 3 0.718 2015-08-16
## 4 0.270 2020-02-04
## 5 0.608 2019-01-06
```

As you can see, we can still do better by parsing the `y` column as *date*, not as *character*.

---
name: readr_problems_fixing2

## Fixing Problematic Columns ctd.

But knowing that the parser is guessed based on the first 1000 lines, we can see what sits past the 1000th line in the data:

```r
tricky_dataset %>% head(n = 1002) %>% tail(n = 4)
```

```
## # A tibble: 4 x 2
## x y
## <dbl> <chr>
## 1 4569 <NA>
## 2 4548 <NA>
## 3 0.238 2015-01-16
## 4 0.412 2018-05-18
```

It seems we were very unlucky: up to the 1000th line there are only integers in the `x` column and only `NA`s in the `y` column, so the parser cannot be guessed correctly. To fix this:

```r
tricky_dataset <- read_csv(readr_example('challenge.csv'), guess_max = 1001)
```

```
## Parsed with column specification:
## cols(
## x = col_double(),
## y = col_date(format = "")
## )
```

---
name: readr_writing

## Writing to a File

The `readr` package also provides functions for writing tibbles to a file:

* `write_csv()`
* `write_tsv()`
* `write_excel_csv()`

They **always** save:

* text in UTF-8,
* dates in ISO8601

But saving in csv (or tsv) does mean you lose information about the type of data in particular columns.
You can avoid this by using: * `write_rds()` and `read_rds()` to read/write objects in R binary rds format, * use `write_feather()` and `read_feather()` from package `feather` to read/write objects in a fast binary format that other programming languages can access. --- name: basic_data_transformations ## Basic Data Transformations with `dplyr` Let us create a tibble: ```r bijou <- as_tibble(diamonds) %>% head(n = 100) bijou ``` ``` ## # A tibble: 100 x 10 ## carat cut color clarity depth table price x y z ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 ## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31 ## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 ## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 ## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75 ## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48 ## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47 ## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53 ## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 ## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 ## # … with 90 more rows ``` .center[ <img src="assets_tidyverse/diamonds.png", style="height:200px"> ] --- name: filter ## Picking Observations using `filter()` ```r bijou %>% filter(cut == 'Ideal' | cut == 'Premium', carat >= 0.23) %>% head(n = 5) ``` ``` ## # A tibble: 5 x 10 ## carat cut color clarity depth table price x y z ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 ## 2 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63 ## 3 0.23 Ideal J VS1 62.8 56 340 3.93 3.9 2.46 ## 4 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71 ## 5 0.32 Premium E I1 60.9 58 345 4.38 4.42 2.68 ``` Be careful with floating point comparisons! Also, rows with comparison resulting in `NA` are skipped by default! 
```r bijou %>% filter(near(0.23, carat) | is.na(carat)) %>% head(n = 5) ``` ``` ## # A tibble: 5 x 10 ## carat cut color clarity depth table price x y z ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43 ## 2 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 ## 3 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39 ## 4 0.23 Ideal J VS1 62.8 56 340 3.93 3.9 2.46 ## 5 0.23 Very Good E VS2 63.8 55 352 3.85 3.92 2.48 ``` --- name: arrange ## Rearranging Observations using `arrange()` ```r bijou %>% arrange(cut, carat, desc(price)) ``` ``` ## # A tibble: 100 x 10 ## carat cut color clarity depth table price x y z ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl> ## 1 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49 ## 2 0.86 Fair E SI2 55.1 69 2757 6.45 6.33 3.52 ## 3 0.96 Fair F SI2 66.3 62 2759 6.27 5.95 4.07 ## 4 0.23 Good F VS1 58.2 59 402 4.06 4.08 2.37 ## 5 0.23 Good E VS1 64.1 59 402 3.83 3.85 2.46 ## 6 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31 ## 7 0.26 Good E VVS1 57.9 60 554 4.22 4.25 2.45 ## 8 0.26 Good D VS2 65.2 56 403 3.99 4.02 2.61 ## 9 0.26 Good D VS1 58.4 63 403 4.19 4.24 2.46 ## 10 0.3 Good H SI1 63.7 57 554 4.28 4.26 2.72 ## # … with 90 more rows ``` The `NA`s always end up at the end of the rearranged tibble. 
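A minimal sketch of this `NA` behaviour, using a small hypothetical tibble `stones` (an illustration, not part of `bijou`). It assumes that `arrange()` on local data places missing values last both in ascending order and when the column is wrapped in `desc()`:

```r
library(dplyr)

# Hypothetical toy data with one missing price
stones <- tibble(id = c('a', 'b', 'c'), price = c(300, NA, 100))

# Ascending order: 100, 300, NA
asc <- stones %>% arrange(price)

# Descending order: 300, 100, NA (the NA still comes last)
dsc <- stones %>% arrange(desc(price))
```

If the missing values should come first instead, one option is to sort on `!is.na(price)` before sorting on `price`.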
---
name: select

## Selecting Variables with `select()`

Simple `select` with a range:

```r
bijou %>% select(color, clarity, x:z) %>% head(n = 5)
```

```
## # A tibble: 5 x 5
## color clarity x y z
## <ord> <ord> <dbl> <dbl> <dbl>
## 1 E SI2 3.95 3.98 2.43
## 2 E SI1 3.89 3.84 2.31
## 3 E VS1 4.05 4.07 2.31
## 4 I VS2 4.2 4.23 2.63
## 5 J SI2 4.34 4.35 2.75
```

--

Excluding columns with `select`:

```r
bijou %>% select(-(x:z)) %>% head(n = 5)
```

```
## # A tibble: 5 x 7
## carat cut color clarity depth table price
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int>
## 1 0.23 Ideal E SI2 61.5 55 326
## 2 0.21 Premium E SI1 59.8 61 326
## 3 0.23 Good E VS1 56.9 65 327
## 4 0.290 Premium I VS2 62.4 58 334
## 5 0.31 Good J SI2 63.3 58 335
```

---
name: select2

## Selecting Variables with `select()` ctd.

`rename` is a variant of `select` that keeps all columns. Here it renames `x` to `var_x`, and the column stays in its original position:

```r
bijou %>% rename(var_x = x) %>% head(n = 5)
```

```
## # A tibble: 5 x 10
## carat cut color clarity depth table price var_x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
```

--

Use `everything()` to bring some columns to the front:

```r
bijou %>% select(x:z, everything()) %>% head(n = 5)
```

```
## # A tibble: 5 x 10
## x y z carat cut color clarity depth table price
## <dbl> <dbl> <dbl> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int>
## 1 3.95 3.98 2.43 0.23 Ideal E SI2 61.5 55 326
## 2 3.89 3.84 2.31 0.21 Premium E SI1 59.8 61 326
## 3 4.05 4.07 2.31 0.23 Good E VS1 56.9 65 327
## 4 4.2 4.23 2.63 0.290 Premium I VS2 62.4 58 334
## 5 4.34 4.35 2.75 0.31 Good J SI2 63.3 58 335
```

---
name: mutate

## Create or Alter Variables with `mutate`

```r
bijou %>% mutate(p = x + z, q = p + y) %>% select(-(depth:price))
%>% head(n = 5) ``` ``` ## # A tibble: 5 x 9 ## carat cut color clarity x y z p q ## <dbl> <ord> <ord> <ord> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.23 Ideal E SI2 3.95 3.98 2.43 6.38 10.4 ## 2 0.21 Premium E SI1 3.89 3.84 2.31 6.2 10.0 ## 3 0.23 Good E VS1 4.05 4.07 2.31 6.36 10.4 ## 4 0.290 Premium I VS2 4.2 4.23 2.63 6.83 11.1 ## 5 0.31 Good J SI2 4.34 4.35 2.75 7.09 11.4 ``` -- or with `transmute` (only the transformed variables will be retained) ```r bijou %>% transmute(carat, cut, sum = x + y + z) %>% head(n = 5) ``` ``` ## # A tibble: 5 x 3 ## carat cut sum ## <dbl> <ord> <dbl> ## 1 0.23 Ideal 10.4 ## 2 0.21 Premium 10.0 ## 3 0.23 Good 10.4 ## 4 0.290 Premium 11.1 ## 5 0.31 Good 11.4 ``` --- name: grouped_summaries ## Group and Summarize ```r bijou %>% group_by(cut) %>% summarize(max_price = max(price), mean_price = mean(price), min_price = min(price)) ``` ``` ## # A tibble: 5 x 4 ## cut max_price mean_price min_price ## <ord> <int> <dbl> <int> ## 1 Fair 2759 1951 337 ## 2 Good 2759 661. 327 ## 3 Very Good 2760 610. 336 ## 4 Premium 2760 569. 326 ## 5 Ideal 2757 693. 326 ``` -- ```r bijou %>% group_by(cut, color) %>% summarize(max_price = max(price), mean_price = mean(price), min_price = min(price)) %>% head(n = 5) ``` ``` ## # A tibble: 5 x 5 ## # Groups: cut [2] ## cut color max_price mean_price min_price ## <ord> <ord> <int> <dbl> <int> ## 1 Fair E 2757 1547 337 ## 2 Fair F 2759 2759 2759 ## 3 Good D 403 403 403 ## 4 Good E 2759 1010. 327 ## 5 Good F 2759 1580. 402 ``` --- name: other_data_manipulations ## Other data manipulation tips ```r bijou %>% group_by(cut) %>% summarize(count = n()) ``` ``` ## # A tibble: 5 x 2 ## cut count ## <ord> <int> ## 1 Fair 3 ## 2 Good 18 ## 3 Very Good 38 ## 4 Premium 22 ## 5 Ideal 19 ``` -- When you need to regroup within the same pipe, use `ungroup()`. 
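As a sketch of why `ungroup()` matters, consider the hypothetical toy tibble `gems` below (a stand-in for `bijou`, with made-up prices). A `mutate()` on grouped data computes its summary within each group, while the same expression after `ungroup()` uses all rows:

```r
library(dplyr)

# Hypothetical toy data standing in for `bijou`
gems <- tibble(cut = c('Fair', 'Fair', 'Good', 'Good'),
               price = c(100, 300, 500, 700))

regrouped <- gems %>%
  group_by(cut) %>%
  mutate(mean_in_cut = mean(price)) %>%  # per group: 200, 200, 600, 600
  ungroup() %>%
  mutate(mean_overall = mean(price))     # all rows: 400 everywhere
```

Without the `ungroup()` call, `mean_overall` would simply repeat the per-group means.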
--- name: concept_of_tidy_data ## The Concept of Tidy Data Data are tidy *sensu Wickham* if: * each and every observation is represented as exactly one row, * each and every variable is represented by exactly one column, * thus each data table cell contains only one value. <img src="assets_tidyverse/tidy_data.png" width="2560" style="display: block; margin: auto auto auto 0;" /> Usually data are untidy in only one way. However, if you are unlucky, they are really untidy and thus a pain to work with... --- name: tidy_data ## Tidy Data <img src="assets_tidyverse/tidy_data.png" width="2560" style="display: block; margin: auto auto auto 0;" /> -- .center[**Are these data tidy?**] .pull-left-70[ <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> Sepal.Length </th> <th style="text-align:center;"> Sepal.Width </th> <th style="text-align:center;"> Petal.Length </th> <th style="text-align:center;"> Petal.Width </th> <th style="text-align:center;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5.1 </td> <td style="text-align:center;"> 3.5 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 3.0 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.7 </td> <td style="text-align:center;"> 3.2 </td> <td style="text-align:center;"> 1.3 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> setosa </td> </tr> </tbody> </table> ] -- .pull-right-30[ <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> Species </th> <th 
style="text-align:center;"> variable </th> <th style="text-align:center;"> value </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> Sepal.Length </td> <td style="text-align:center;"> 5.1 </td> </tr> <tr> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> Sepal.Length </td> <td style="text-align:center;"> 4.9 </td> </tr> <tr> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> Sepal.Length </td> <td style="text-align:center;"> 4.7 </td> </tr> </tbody> </table> ] <br> <hr><br> -- .pull-left-50[ <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <thead> <tr> <th style="text-align:center;"> Sepal.L.W </th> <th style="text-align:center;"> Petal.L.W </th> <th style="text-align:center;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5.1/3.5 </td> <td style="text-align:center;"> 1.4/0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.9/3 </td> <td style="text-align:center;"> 1.4/0.2 </td> <td style="text-align:center;"> setosa </td> </tr> <tr> <td style="text-align:center;"> 4.7/3.2 </td> <td style="text-align:center;"> 1.3/0.2 </td> <td style="text-align:center;"> setosa </td> </tr> </tbody> </table> ] -- .pull-right-50[ <table class="table table-striped table-hover table-responsive table-condensed" style="width: auto !important; "> <tbody> <tr> <td style="text-align:left;"> Sepal.Length </td> <td style="text-align:center;"> 5.1 </td> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 4.7 </td> <td style="text-align:center;"> 4.6 </td> </tr> <tr> <td style="text-align:left;"> Sepal.Width </td> <td style="text-align:center;"> 3.5 </td> <td style="text-align:center;"> 3.0 </td> <td style="text-align:center;"> 3.2 </td> <td style="text-align:center;"> 3.1 </td> </tr> <tr> <td style="text-align:left;"> 
Petal.Length </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> 1.3 </td> <td style="text-align:center;"> 1.5 </td> </tr> <tr> <td style="text-align:left;"> Petal.Width </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> 0.2 </td> <td style="text-align:center;"> 0.2 </td> </tr> <tr> <td style="text-align:left;"> Species </td> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> setosa </td> <td style="text-align:center;"> setosa </td> </tr> </tbody> </table> ] --- name: tidying_data_gather ## Tidying Data with `gather` If some of your column names are actually values of a variable, use `gather`: ```r bijou2 %>% head(n = 5) ``` ``` ## # A tibble: 5 x 3 ## cut `2008` `2009` ## <ord> <int> <dbl> ## 1 Ideal 326 332. ## 2 Premium 326 332. ## 3 Good 327 333. ## 4 Premium 334 340. ## 5 Good 335 341. 
```

```r
bijou2 %>% gather(`2008`, `2009`, key = 'year', value = 'price') %>% head(n = 5)
```

```
## # A tibble: 5 x 3
## cut year price
## <ord> <chr> <dbl>
## 1 Ideal 2008 326
## 2 Premium 2008 326
## 3 Good 2008 327
## 4 Premium 2008 334
## 5 Good 2008 335
```

---
name: tidying_data_spread

## Tidying Data with `spread`

If some of your observations are scattered across many rows, use `spread`:

```r
bijou3
```

```
## # A tibble: 9 x 5
## cut price clarity dimension measurement
## <ord> <int> <ord> <chr> <dbl>
## 1 Ideal 326 SI2 x 3.95
## 2 Premium 326 SI1 x 3.89
## 3 Good 327 VS1 x 4.05
## 4 Ideal 326 SI2 y 3.98
## 5 Premium 326 SI1 y 3.84
## 6 Good 327 VS1 y 4.07
## 7 Ideal 326 SI2 z 2.43
## 8 Premium 326 SI1 z 2.31
## 9 Good 327 VS1 z 2.31
```

```r
bijou3 %>% spread(key=dimension, value=measurement) %>% head(n = 5)
```

```
## # A tibble: 3 x 6
## cut price clarity x y z
## <ord> <int> <ord> <dbl> <dbl> <dbl>
## 1 Good 327 VS1 4.05 4.07 2.31
## 2 Premium 326 SI1 3.89 3.84 2.31
## 3 Ideal 326 SI2 3.95 3.98 2.43
```

---
name: tidying_data_separate

## Tidying Data with `separate`

If some of your columns contain more than one value, use `separate`:

```r
bijou4
```

```
## # A tibble: 5 x 4
## cut price clarity dim
## <ord> <int> <ord> <chr>
## 1 Ideal 326 SI2 3.95/3.98/2.43
## 2 Premium 326 SI1 3.89/3.84/2.31
## 3 Good 327 VS1 4.05/4.07/2.31
## 4 Premium 334 VS2 4.2/4.23/2.63
## 5 Good 335 SI2 4.34/4.35/2.75
```

```r
bijou4 %>% separate(dim, into = c("x", "y", "z"), sep = "/", convert = TRUE)
```

```
## # A tibble: 5 x 6
## cut price clarity x y z
## <ord> <int> <ord> <dbl> <dbl> <dbl>
## 1 Ideal 326 SI2 3.95 3.98 2.43
## 2 Premium 326 SI1 3.89 3.84 2.31
## 3 Good 327 VS1 4.05 4.07 2.31
## 4 Premium 334 VS2 4.2 4.23 2.63
## 5 Good 335 SI2 4.34 4.35 2.75
```

---
name: tidying_data_separate

## Tidying Data with `unite`

If a single value is split across several columns, use `unite`:

```r
bijou5
```

```
## # A tibble: 5 x 7
## cut price clarity_prefix clarity_suffix x y
z
## <ord> <int> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Ideal 326 SI 2 3.95 3.98 2.43
## 2 Premium 326 SI 1 3.89 3.84 2.31
## 3 Good 327 VS 1 4.05 4.07 2.31
## 4 Premium 334 VS 2 4.2 4.23 2.63
## 5 Good 335 SI 2 4.34 4.35 2.75
```

```r
bijou5 %>% unite(clarity, clarity_prefix, clarity_suffix, sep='')
```

```
## # A tibble: 5 x 6
## cut price clarity x y z
## <ord> <int> <chr> <dbl> <dbl> <dbl>
## 1 Ideal 326 SI2 3.95 3.98 2.43
## 2 Premium 326 SI1 3.89 3.84 2.31
## 3 Good 327 VS1 4.05 4.07 2.31
## 4 Premium 334 VS2 4.2 4.23 2.63
## 5 Good 335 SI2 4.34 4.35 2.75
```

**Note:** in `unite`, `sep` is the string placed between the united values (here an empty string). In `separate`, `sep` can be a *regular expression* matching the delimiter or a numeric position to split at. Pretty flexible approach!

---
name: missing_complete

## Completing Missing Values Using `complete`

```r
bijou %>% head(n = 10) %>%
  select(cut, clarity, price) %>%
  mutate(continent = sample(c('AusOce', 'Eur'),
                            size = 10,
                            replace = TRUE)) -> missing_stones
```

```r
missing_stones %>% complete(cut, continent)
```

```
## # A tibble: 13 x 4
## cut continent clarity price
## <ord> <chr> <ord> <int>
## 1 Fair AusOce <NA> NA
## 2 Fair Eur VS2 337
## 3 Good AusOce VS1 327
## 4 Good AusOce SI2 335
## 5 Good Eur <NA> NA
## 6 Very Good AusOce VVS2 336
## 7 Very Good AusOce VVS1 336
## 8 Very Good AusOce SI1 337
## 9 Very Good Eur VS1 338
## 10 Premium AusOce SI1 326
## 11 Premium Eur VS2 334
## 12 Ideal AusOce SI2 326
## 13 Ideal Eur <NA> NA
```

---
name: joins

## Combining Datasets

Often, we need to combine a number of data tables (relational data) to get the full picture of the data. This is where different types of *joins* come in:

* *mutating joins* that add new variables to data table `A` based on matching observations (rows) from data table `B`,
* *filtering joins* that filter observations from data table `A` based on whether they match observations in data table `B`,
* *set operations* that treat observations in `A` and `B` as elements of a set.
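
The mutating joins are illustrated on the following slides. For the other two families, here is a minimal sketch using two hypothetical toy tibbles, `P` and `Q` (names chosen only for this illustration), together with `dplyr`'s `semi_join()`, `anti_join()` and set operations:

```r
library(dplyr)

# Hypothetical toy tables sharing the column `key`
P <- tibble(key = c('a', 'b', 'c'), x = c('P1', 'P2', 'P3'))
Q <- tibble(key = c('a', 'b', 'd'), y = c('Q1', 'Q2', 'Q4'))

# Filtering joins keep or drop rows of P, adding no columns from Q
kept    <- P %>% semi_join(Q, by = 'key')  # rows of P with a match in Q
dropped <- P %>% anti_join(Q, by = 'key')  # rows of P without a match in Q

# Set operations compare whole rows, so both inputs need the same columns
common_keys <- intersect(select(P, key), select(Q, key))
all_keys    <- union(select(P, key), select(Q, key))
only_in_P   <- setdiff(select(P, key), select(Q, key))
```
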
-- Let us create two example tibbles that share a key: .pull-left-50[ ```r A <- tribble( ~key, ~x, 'a', 'A1', 'b', 'A2', 'c', 'A3', 'e','A4' ) ``` <table> <thead> <tr> <th style="text-align:left;"> key </th> <th style="text-align:left;"> x </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> a </td> <td style="text-align:left;"> A1 </td> </tr> <tr> <td style="text-align:left;"> b </td> <td style="text-align:left;"> A2 </td> </tr> <tr> <td style="text-align:left;"> c </td> <td style="text-align:left;"> A3 </td> </tr> <tr> <td style="text-align:left;"> e </td> <td style="text-align:left;"> A4 </td> </tr> </tbody> </table> ] .pull-right-50[ ```r B <- tribble( ~key, ~y, 'a', 'B1', 'b', NA, 'c', 'B3', 'd','B4' ) ``` <table> <thead> <tr> <th style="text-align:left;"> key </th> <th style="text-align:left;"> y </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> a </td> <td style="text-align:left;"> B1 </td> </tr> <tr> <td style="text-align:left;"> b </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> c </td> <td style="text-align:left;"> B3 </td> </tr> <tr> <td style="text-align:left;"> d </td> <td style="text-align:left;"> B4 </td> </tr> </tbody> </table> ] --- name: inner_join ## The Joins Family .pull-left-50[ The `inner join`: ```r A %>% inner_join(B, by = 'key') # All non-matching rows are dropped! 
```

```
## # A tibble: 3 x 3
## key x y
## <chr> <chr> <chr>
## 1 a A1 B1
## 2 b A2 <NA>
## 3 c A3 B3
```
]

--

.pull-right-50[
The `left_join`:

```r
A %>% left_join(B, by = 'key')
```

```
## # A tibble: 4 x 3
## key x y
## <chr> <chr> <chr>
## 1 a A1 B1
## 2 b A2 <NA>
## 3 c A3 B3
## 4 e A4 <NA>
```
]

--

<br>

.pull-left-50[
The `right_join`:

```r
A %>% right_join(B, by = 'key')
```

```
## # A tibble: 4 x 3
## key x y
## <chr> <chr> <chr>
## 1 a A1 B1
## 2 b A2 <NA>
## 3 c A3 B3
## 4 d <NA> B4
```
]

--

.pull-right-50[
The `full_join`:

```r
A %>% full_join(B, by = 'key')
```

```
## # A tibble: 5 x 3
## key x y
## <chr> <chr> <chr>
## 1 a A1 B1
## 2 b A2 <NA>
## 3 c A3 B3
## 4 e A4 <NA>
## 5 d <NA> B4
```
]

---
name: more_tidyverse

## Some Other Friends

* `stringr` for string manipulation and regular expressions,
* `forcats` for working with factors,
* `lubridate` for working with dates.

---
name: end-slide
class: end-slide

# Thank you

---
name: purrr_map

## Using `map` Functions from `purrr`

Base R `apply` functions have their `tidyverse` counterparts.

```r
cars <- as_tibble(mtcars)
cars %>% head(n = 5)
```

```
## # A tibble: 5 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
```

The `map` function:

```r
cars %>%
  select(disp, hp) %>%
  map(mean)
```

```
## $disp
## [1] 230.7219
##
## $hp
## [1] 146.6875
```

---
name: purrr_map_family

## Different Members of the `map` Family

* `map()` -- returns a list,
* `map_lgl()` -- returns a logical vector,
* `map_int()` -- returns a vector of integers,
* `map_dbl()` -- returns a vector of doubles,
* `map_chr()` -- returns a vector of characters.
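To make the difference in return types concrete, here is a small sketch that applies each variant listed above to the same two `mtcars` columns used in the `map` example. The helper names (`cols`, `means_list`, and so on) are chosen for illustration only.

```r
library(purrr)

# The same two columns as in the `map` example; `cols` is an illustrative name
cols <- mtcars[c('disp', 'hp')]

means_list <- cols %>% map(mean)                     # list
means_dbl  <- cols %>% map_dbl(mean)                 # named vector of doubles
n_unique   <- cols %>% map_int(~ length(unique(.)))  # vector of integers
is_num     <- cols %>% map_lgl(is.numeric)           # logical vector
types      <- cols %>% map_chr(typeof)               # character vector
```

The typed variants throw an error when the function returns something of the wrong type or length, which makes them a safer choice than `map()` whenever an atomic vector is expected.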
---
name: purrr_shortcut_anonymous

## Anonymous Functions in `purrr`

.pull-left-50[
base-R

```r
models <- cars %>%
  split(.$cyl) %>%
  map(function(dat) lm(mpg ~ wt, data = dat))
```

Now, make a summary for each model:

```r
models %>%
  map(summary) %>%
  map_dbl(~.$r.squared)
```

```
## 4 6 8
## 0.5086326 0.4645102 0.4229655
```
]

.pull-right-50[
`purrr`

```r
models <- cars %>%
  split(.$cyl) %>%
  map(~lm(mpg ~ wt, data = .))
```

Now, make a summary for each model using even simpler syntax:

```r
models %>%
  map(summary) %>%
  map_dbl("r.squared")
```

```
## 4 6 8
## 0.5086326 0.4645102 0.4229655
```
]

---
name: purrr_safely

## Possibly Quiet and Safe

How to deal with errors in `purrr`:

* `safely()` -- result is a list with 2 elements:

.pull-left-50[
+ `result` contains `NULL` if an error occurred, the result otherwise,

```r
safe_sqrt <- safely(sqrt)
safe_sqrt(4) %>% str()
```

```
## List of 2
## $ result: num 2
## $ error : NULL
```
]

.pull-right-50[
+ `error` contains `NULL` if no error occurred, the error object otherwise.

```r
safe_sqrt <- safely(sqrt)
safe_sqrt('zebra') %>% str()
```

```
## List of 2
## $ result: NULL
## $ error :List of 2
## ..$ message: chr "non-numeric argument to mathematical function"
## ..$ call : language .Primitive("sqrt")(x)
## ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
```
]

---
name: purrr_possibly

## Possibly Quiet and Safe
* `possibly()` lets you define what value to return upon error:

```r
tst <- list(4, 7, 'test')
tst %>% map_dbl(possibly(sqrt, NA_real_))
```

```
## [1] 2.000000 2.645751 NA
```

--

* `quietly()` -- captures output, messages and warnings and returns them, together with the result, as a list:

```r
x <- list(-1, 1)
x %>% map(quietly(log)) %>% str()
```

```
## List of 2
## $ :List of 4
## ..$ result : num NaN
## ..$ output : chr ""
## ..$ warnings: chr "NaNs produced"
## ..$ messages: chr(0)
## $ :List of 4
## ..$ result : num 0
## ..$ output : chr ""
## ..$ warnings: chr(0)
## ..$ messages: chr(0)
```

---
name: more_on_map

## More on `map` Functions

What if one wants to map over more than one argument?

```r
means <- c(22, 32, 42)
std_devs <- c(2.5, 5, 10)
my_rnorms <- map2(means, std_devs, rnorm, n = 100)
```

--

<img src="tidyverse_presentation_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" />

--

```r
my_rnorms %>%
  setNames(LETTERS[1:length(means)]) %>%
  as_tibble() %>%
  gather(LETTERS[1]:LETTERS[length(means)], key='run', value='num') %>%
  ggplot(mapping = aes(x = num)) +
  geom_density() +
  facet_grid(~ run) +
  theme_bw()
```

---
name: more_on_map

## Mapping with Even More Arguments

```r
param_sets <- tribble(
  ~mean, ~sd, ~n,
  22, 2.5, 50,
  32, 5, 100,
  42, 10, 250
)
```

--

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> mean </th>
   <th style="text-align:right;"> sd </th>
   <th style="text-align:right;"> n </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 22 </td>
   <td style="text-align:right;"> 2.5 </td>
   <td style="text-align:right;"> 50 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 32 </td>
   <td style="text-align:right;"> 5.0 </td>
   <td style="text-align:right;"> 100 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 42 </td>
   <td style="text-align:right;"> 10.0 </td>
   <td style="text-align:right;"> 250 </td>
  </tr>
</tbody>
</table>

--

```r
param_sets %>%
  pmap(rnorm) %>%
  str()
```

```
## List of 3
## $ : num [1:50] 21.4 21.2 23.2 22.3 24 ...
## $ : num [1:100] 38.3 27.6 32.8 40.1 30.2 ...
## $ : num [1:250] 41 43 32.9 48.1 35.9 ...
```

---
name: invoke_map

## Invoking Different Functions

```r
param_sets <- tribble(
  ~f, ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(mean = 32, sd = 2),
  "rpois", list(lambda = 10)
)
result <- param_sets %>%
  mutate(call_result = invoke_map(f, params, n = 100))
```

--

<img src="tidyverse_presentation_files/figure-html/unnamed-chunk-60-1.png" style="display: block; margin: auto;" />

---
name: walk

## Let's Take a `walk` to the Printing House

What if you want to map a function for its side-effects?

```r
list(runif(10), rnorm(10)) %>%
  walk(print) %>%
  map(`*`,5)
```

```
## [1] 0.4606656 0.9514453 0.8717355 0.2559456 0.9140694 0.5627143 0.9219289
## [8] 0.2239606 0.2163161 0.9222574
## [1] -0.63771629 -0.11583061 1.51152238 -1.40222191 0.50117579
## [6] 0.66319232 -0.01953877 -1.37352133 -0.24149150 0.31510760
## [[1]]
## [1] 2.303328 4.757227 4.358677 1.279728 4.570347 2.813571 4.609645
## [8] 1.119803 1.081581 4.611287
##
## [[2]]
## [1] -3.18858145 -0.57915306 7.55761191 -7.01110957 2.50587894
## [6] 3.31596158 -0.09769385 -6.86760666 -1.20745751 1.57553802
```

* `walk2()`
* `pwalk()`

---
name: some_every

## Predicate Functions

* keep all elements fulfilling a condition:

```r
iris %>%
  keep(is.factor) %>%
  str()
```

```
## 'data.frame': 150 obs. of 1 variable:
## $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```

* discard all elements fulfilling a condition:

```r
iris$Petal.Length %>%
  discard(~ . >= 2) %>%
  str()
```

```
## num [1:50] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
```

* `some()` and `every()`
* `detect()` and `detect_index()`

```r
10:1 %>% detect(~ . > 5)
10:1 %>% detect_index(~ .
> 5)
```

```
## [1] 10
## [1] 1
```

* `head_while()`, `tail_while()`

---
name: report

## Session

* This presentation was created in RStudio using the [`remarkjs`](https://github.com/gnab/remark) framework through the R package [`xaringan`](https://github.com/yihui/xaringan).
* For R Markdown, see <http://rmarkdown.rstudio.com>
* For R Markdown presentations, see <https://rmarkdown.rstudio.com/lesson-11.html>

```r
R.version
```

```
## _
## platform x86_64-apple-darwin15.6.0
## arch x86_64
## os darwin15.6.0
## system x86_64, darwin15.6.0
## status
## major 3
## minor 5.0
## year 2018
## month 04
## day 23
## svn rev 74626
## language R
## version.string R version 3.5.0 (2018-04-23)
## nickname Joy in Playing
```