This is the Tidyverse coursework material for the Introduction to R Programming for Life Scientists course, Uppsala, fall 2018.
Welcome to the hands-on workshop “Tidy Work in Tidyverse”. Most of what you need to complete the tutorials and challenges was covered in the lecture, but some tasks require that you check the documentation or search online. Not all our solutions are optimal; let us know if you can do better or solve things in a different way. If you get stuck, look at the hints first, then search online, and if you are still stuck, ask a TA. There is a lot of material, so do not feel bad if you do not solve all the tasks. Good luck!
Rewrite the following code chunk as one pipe (magrittr):
my_cars <- mtcars[, c(1:4, 7)]
my_cars <- my_cars[my_cars$disp > mean(my_cars$disp), ]
print(my_cars)
my_cars <- colMeans(my_cars)
my_cars <- mtcars %>%
select(c(1:4, 7)) %>%
filter(disp > mean(disp)) %T>%
print() %>%
colMeans()
Rewrite the correlations below using pipes.
cor(mtcars$gear, mtcars$mpg)
mtcars %$% cor(gear, mpg)
cor(mtcars)
mtcars %>% cor()
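The solutions above rely on three magrittr operators. The following is a minimal sketch of how they differ, using only the built-in mtcars data; the variable names piped, exposed and teed are illustrative and not part of the exercise.

```r
library(magrittr)

# %>% passes the left-hand side as the first argument of the right-hand side.
piped <- mtcars %>% cor()

# %$% exposes the columns of the left-hand side, so they can be used by name.
exposed <- mtcars %$% cor(gear, mpg)

# %T>% evaluates the right-hand side for its side effect (here: printing the
# dimensions) and then passes the original left-hand side on unchanged.
teed <- mtcars %T>% {print(dim(.))} %>% colMeans()
```

Note that without the tee operator the last pipe would hand the return value of print() to colMeans(), which works only because print() returns its input invisibly; %T>% makes the intent explicit.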
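Convert the mtcars dataset to a tibble vehicles.
Select the number of cylinders (cyl) variable using: the [[index]] accessor, the [[string]] accessor, the $ accessor.
Do the same selection using pipes.
Print the tibble.
Print the first 30 rows of the tibble.
Change the default tibble printing options (tibble.print_min and tibble.print_max).
Convert vehicles back to a data.frame called automobiles.
# 1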
vehicles <- mtcars %>% as_tibble()
# 2
vehicles[['cyl']]
vehicles[[2]]
vehicles$cyl
# 3
vehicles %T>%
{print(.[['cyl']])} %T>%
{print(.[[2]])} %>%
.$cyl
# 4
vehicles
# 5
vehicles %>% head(n = 30)
# 6
options(tibble.print_min = 15, tibble.print_max = 30)
# 7
automobiles <- as.data.frame(vehicles)
Do you think tibbles are lazy? Try to create a tibble that tests whether lazy evaluation applies to tibbles too.
tibble(x = sample(1:10, size = 10, replace = T), y = log10(x))
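One way to read the result: tibble() builds its columns one after another, so a later column can refer to an earlier one, whereas data.frame() evaluates all of its arguments before any column exists. The sketch below illustrates this contrast; the column names col_a and col_b are made up for the example, and it assumes no object called col_a exists in your workspace.

```r
library(tibble)

# In tibble(), y can be computed from x because x is created first.
tb <- tibble(x = 1:5, y = x^2)

# In data.frame(), col_b cannot see col_a, so the call fails with an
# "object not found" error, which is caught here and replaced by a marker.
df_attempt <- tryCatch(
  data.frame(col_a = 1:5, col_b = col_a^2),
  error = function(e) "error"
)
```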
The nycflights13 package contains information about all flights that departed from NYC (i.e., EWR, JFK and LGA) in 2013: 336,776 flights with 19 variables. To help understand what causes delays, it also includes a number of other useful datasets: weather, planes, airports, airlines. We will use it to practice working with tibbles and dplyr.
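Load the nycflights13 package (install if necessary),
read about the data in the package documentation,
inspect the flights tibble.
Select all columns but carrier and arr_time,
select carrier, tailnum and origin,
hide columns from day through carrier,
select all columns that have to do with arrival (hint: ?tidyselect),
select columns based on a vector v <- c("arr_time", "sched_arr_time", "arr_delay"),
rename column dest to destination using: select() and rename(). What is the difference between the two approaches?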
install.packages('nycflights13')
library('nycflights13')
?nycflights13
flights
flights %>% select(-carrier, -arr_time)
flights %>% select(carrier, tailnum, origin)
flights %>% select(-(day:carrier))
flights %>% select(contains('arr_')) # or
v <- c("arr_time", "sched_arr_time", "arr_delay")
flights %>% select(v) # or
flights %>% select(one_of(v))
flights %>% select(destination = dest)
flights %>% rename(destination = dest)
# select keeps only the renamed column while rename returns the whole dataset
# with the column renamed
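In more recent releases of tidyselect (1.0.0 and later, which is an assumption about your installed version; the session info on this page lists 0.2.5), passing a bare external vector as in select(v) is ambiguous, and all_of() or any_of() are the documented helpers for selecting by a character vector. A minimal sketch, assuming a dplyr version that re-exports these helpers:

```r
library(dplyr)
library(nycflights13)

v <- c("arr_time", "sched_arr_time", "arr_delay")

# all_of() is strict: it raises an error if any name in v is missing.
arrival_cols <- flights %>% select(all_of(v))

# any_of() is lenient: names that do not exist are silently skipped.
# "no_such_column" is a made-up name used only for illustration.
lenient_cols <- flights %>% select(any_of(c(v, "no_such_column")))
```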
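Select all flights that arrived early (arr_delay < 0),
select flights with a departure delay between 10 and 33 minutes,
select flights with an unknown arrival time,
retrieve rows 1234 to 1258 (hint: ?slice),
sample (?sample_n()) 3 random flights per day in March,
show the 5 most departure-delayed flights in January per carrier,
retrieve all unique() routes and sort them by origin,
retrieve all distinct() routes and sort them by origin,
is unique() more efficient than distinct()?
flights %>% filter(arr_delay < 0)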
flights %>% filter(dep_delay >= 10, dep_delay <= 33) # or
flights %>% filter(between(dep_delay, 10, 33))
flights %>% filter(is.na(arr_time))
flights %>% slice(1234:1258)
flights %>% filter(month == 3) %>%
group_by(day) %>%
sample_n(3)
flights %>%
filter(month == 1) %>%
group_by(carrier) %>%
top_n(5, dep_delay)
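The solutions above do not cover the unique() and distinct() tasks. One possible approach is sketched below, under the assumption that a "route" means an origin-destination pair; the names routes_unique, routes_distinct, t_unique and t_distinct are illustrative only.

```r
library(dplyr)
library(nycflights13)

# Routes via base unique(): select the two columns, drop duplicated rows.
routes_unique <- flights %>%
  select(origin, dest) %>%
  unique() %>%
  arrange(origin)

# Routes via dplyr distinct(): it can pick the columns by itself.
routes_distinct <- flights %>%
  distinct(origin, dest) %>%
  arrange(origin)

# A rough timing comparison; results depend on machine and package versions.
t_unique   <- system.time(flights %>% select(origin, dest) %>% unique())
t_distinct <- system.time(flights %>% distinct(origin, dest))
```

Comparing t_unique and t_distinct on your own machine answers the efficiency question; for a more reliable comparison, repeat the timing several times.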
air_time is the amount of time in minutes spent in the air. Add a new column air_spd that will contain aircraft’s airspeed in mph,
as above, but keep only the new air_spd variable,
use rownames_to_column() on mtcars to add car model as an extra column,
flights %>% mutate(air_spd = distance/(air_time / 60))
flights %>% transmute(air_spd = distance/(air_time / 60))
mtcars %>% rownames_to_column('model')
Use group_by(), summarise() and n() to see how many planes were delayed (departure) every month,
flights %>%
filter(dep_delay > 0) %>%
group_by(month) %>%
summarise(num_dep_delayed = n())
What was the mean dep_delay per month?
flights %>%
group_by(month) %>%
summarise(mean_dep_delay = mean(dep_delay, na.rm = T))
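Count the flights that arrived delayed for each origin and sort the origins by this count (descending),
flights %>%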
filter(arr_delay > 0) %>%
group_by(origin) %>%
summarise(cnt = n()) %>%
arrange(desc(cnt))
Use summarise() to sum total dep_delay per month in hours,
flights %>%
group_by(month) %>%
summarize(tot_dep_delay = sum(dep_delay/60, na.rm = T))
Run group_size() on carrier. What does it return?
flights %>%
group_by(carrier) %>%
group_size()
Use n_groups() to check the number of unique origin-carrier pairs,
flights %>%
group_by(origin, carrier) %>%
n_groups()
Note on ungroup
Depending on the version of dplyr, you may or may not need to call ungroup() before grouping your data on some other variables. In the newer versions, summarise() drops one grouping level (the last one), while mutate() keeps the grouping unchanged.
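To check how grouping behaves in your installed dplyr version, you can inspect the grouping variables with group_vars() after each step. The sketch below uses the built-in mtcars data; the object names are illustrative.

```r
library(dplyr)

# Group by two variables.
by_cyl_gear <- mtcars %>% group_by(cyl, gear)

# summarise() collapses each group to one row and peels off one grouping level.
counts <- by_cyl_gear %>% summarise(n = n())

# mutate() adds a column but leaves the grouping as it was.
flagged <- by_cyl_gear %>% mutate(heavy = wt > mean(wt))

group_vars(by_cyl_gear)       # grouping before any verb
group_vars(counts)            # grouping left after summarise()
group_vars(flagged)           # grouping left after mutate()
group_vars(ungroup(counts))   # ungroup() removes all grouping
```

If the output of group_vars() is not what the next step expects, call ungroup() explicitly before regrouping.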
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] magrittr_1.5 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
## [5] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2 tibble_1.4.2
## [9] ggplot2_3.1.0 tidyverse_1.2.1 captioner_2.2.3 bookdown_0.7
## [13] knitr_1.20
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 cellranger_1.1.0 plyr_1.8.4 compiler_3.5.0
## [5] pillar_1.3.0 bindr_0.1.1 tools_3.5.0 digest_0.6.18
## [9] lubridate_1.7.4 jsonlite_1.5 evaluate_0.12 nlme_3.1-137
## [13] gtable_0.2.0 lattice_0.20-38 pkgconfig_2.0.2 rlang_0.3.0.1
## [17] cli_1.0.1 rstudioapi_0.8 yaml_2.2.0 haven_1.1.2
## [21] xfun_0.4 bindrcpp_0.2.2 withr_2.1.2 xml2_1.2.0
## [25] httr_1.3.1 hms_0.4.2 rprojroot_1.3-2 grid_3.5.0
## [29] tidyselect_0.2.5 glue_1.3.0 R6_2.3.0 readxl_1.1.0
## [33] rmarkdown_1.10 modelr_0.1.2 backports_1.1.2 scales_1.0.0
## [37] htmltools_0.3.6 rvest_0.3.2 assertthat_0.2.0 colorspace_1.3-2
## [41] stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0 broom_0.5.0
## [45] crayon_1.3.4
Page built on: 14-Nov-2018 at 15:06:50.
2018 | SciLifeLab > NBIS