class: center, middle, inverse, title-slide

# Reticulate
## RaukR 2022 • Advanced R for Bioinformatics
### Nina Norgren
### NBIS, SciLifeLab

---
exclude: true
count: false

<link href="https://fonts.googleapis.com/css?family=Roboto|Source+Sans+Pro:300,400,600|Ubuntu+Mono&subset=latin-ext" rel="stylesheet">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.3.1/css/all.css" integrity="sha384-mzrmE5qonljUremFsqc01SB46JvROS7bZs3IO2EmfFsd15uHvIt+Y8vEf7N7fWAU" crossorigin="anonymous">

<!-- ----------------- Only edit title & author above this ----------------- -->

---
name: contents

## Contents

* [Learning outcomes](#LOs)
* [Introduction](#intro)
* [Introducing reticulate](#reticulate)
* [Importing modules](#importing_modules)
* [Sourcing scripts](#sourcing_scripts)
* [Executing Python code](#execute_code)
* [Python in R Markdown](#python_rmarkdown)
* [Type conversions](#type_conversions)
* [Examples](#examples)

---
name: LOs

## Learning outcomes

<br>

In this session we will learn to:

* Understand the concepts needed for running Python in R
* Understand the different object classes in Python and their equivalents in R
* Apply our knowledge to:
  + Import Python functions into R
  + Use R objects as input to Python functions
  + Translate between Python and R objects

---
name: intro

## Introduction

<br><br>

.largest.center[**R versus Python**]

###.center[The ultimate fight!]

--
name: intro2

<br><br><br>

###.center[Not anymore!]
---
name: reticulate

## Introducing reticulate

* Combine Python and R code
* Use R classes in Python functions and vice versa
* Import Python functions into R code and run them from R
* Add Python code chunks to R Markdown documents

```r
library(reticulate)
```

---
name: importing_modules

## Importing Python modules

```r
datetime <- import("datetime")
todays_r_date <- datetime$datetime$now()
```

--

```r
todays_r_date
class(todays_r_date)
```

```
## [1] "2022-06-13 19:12:35 UTC"
## [1] "POSIXct" "POSIXt"
```

--

Objects are automatically converted to R types, unless otherwise specified

--

```r
datetime <- import("datetime", convert = FALSE)
todays_py_date <- datetime$datetime$now()
```

--

```r
todays_py_date
class(todays_py_date)
```

```
## datetime.datetime(2022, 6, 13, 21, 12, 35, 109047)
## [1] "datetime.datetime" "datetime.date" "python.builtin.object"
```

---

## Importing built-in Python functions

Access Python's built-in functions directly in R

```r
builtins <- import_builtins()
r_vec <- c(1, 5, 3, 4, 2, 2, 3, 2)
str(r_vec)
```

```
## num [1:8] 1 5 3 4 2 2 3 2
```

`r_vec` is an R object.

--

```r
builtins$len(r_vec)
builtins$max(r_vec)
```

```
## [1] 8
## [1] 5
```

Python built-in functions also work on R objects

--

```r
max(r_vec)
```

```
## [1] 5
```

The usual R way

---
name: sourcing_scripts

## Sourcing scripts

Import your own Python functions for use in R.

File `python_functions.py`:

```python
def add(x, y):
    return x + y
```

--

R code:

```r
source_python("python_functions.py")
class(4)
res <- add(4,5)
res
class(res)
```

```
## [1] "numeric"
## [1] 9
## [1] "numeric"
```

--

Type `numeric` in and type `numeric` out. But what happens in between?

---

## Sourcing scripts

But what happens in between?
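One way to reason about this is to run the same function in plain Python with the values Python would receive. The sketch below is illustrative only: it assumes that an R `numeric` such as `4` arrives in Python as a `float` and an R integer such as `4L` arrives as an `int`, and the helper `describe` is a hypothetical name that is not part of `python_functions.py`.

```python
def add(x, y):
    # Same function as in python_functions.py
    return x + y


def describe(value):
    # Hypothetical helper: return the name of the Python type of a value
    return type(value).__name__


# Assumption: R's 4 and 5 are doubles, so Python would see floats
from_r_numeric = add(4.0, 5.0)

# Assumption: R's 4L and 5L are integers, so Python would see ints
from_r_integer = add(4, 5)

print(from_r_numeric, describe(from_r_numeric))  # 9.0 float
print(from_r_integer, describe(from_r_integer))  # 9 int
```

Under these assumptions the result of `add` stays a float when the inputs are floats, which is consistent with `class(res)` reporting `numeric` back in R.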
File `python_functions.py`:

```python
def add_with_print(x, y):
    print(x, 'is of the python type ', type(x))
    return x + y
```

```r
res2 <- add_with_print(4,5)
py_capture_output(add_with_print(4,5))
str(res2)
```

```
## [1] "4.0 is of the python type <class 'float'>\n\n"
## num 9
```

---
name: execute_code

## Execute Python code

Run a Python string:

```r
py_run_string("result = [1,2,3]*2")
py$result
```

```
## [1] 1 2 3 1 2 3
```

All objects created by Python are accessible from R through the `py` object exported by reticulate

---

## Execute Python code

Run the Python script `my_python_script.py`:

```python
def add(x, y):
    return x + y

def multiply_by_3(x):
    return x*3

def run_all():
    x = 5
    y = 8
    added = add(x, y)
    final = multiply_by_3(added)
    return final

final = run_all()
```

```r
py_run_file("my_python_script.py")
py$final
```

```
## [1] 39
```

---
name: python_rmarkdown

## Python in R Markdown

In R Markdown it is possible to mix in Python chunks:

````
```{python}
import pandas as pd
movies = get_all_movies()
print(type(movies))
```
````

```
## <class 'pandas.core.frame.DataFrame'>
```

---

## Python in R Markdown

Access the `movies` object using the `py` object, which converts it to an R object:

```r
library(dplyr)  # provides as_tibble(), select() and %>%
movies_r <- py$movies
movies_r <- as_tibble(movies_r)
subset <- movies_r %>% select(5:6, 8:10)
```

---

## Python in R Markdown

Access the `movies` object using the `py` object, which converts it to an R object:

```r
library(dplyr)  # provides as_tibble(), select() and %>%
movies_r <- py$movies
movies_r <- as_tibble(movies_r)
subset <- movies_r %>% select(5:6, 8:10)
knitr::kable(subset[1:7,],'html')
```

<table>
 <thead>
  <tr> <th style="text-align:left;"> originalTitle </th> <th style="text-align:left;"> startYear </th> <th style="text-align:right;"> runtimeMinutes </th> <th style="text-align:left;"> genres </th> <th style="text-align:right;"> averageRating </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> Kate & Leopold </td> <td style="text-align:left;"> 2001 </td> <td style="text-align:right;"> 118 </td> <td style="text-align:left;"> Comedy,Fantasy,Romance </td> <td style="text-align:right;"> 6.4 </td> </tr>
  <tr> <td style="text-align:left;"> The Brain That Wouldn't Die </td> <td style="text-align:left;"> 1962 </td> <td style="text-align:right;"> 82 </td> <td style="text-align:left;"> Horror,Sci-Fi </td> <td style="text-align:right;"> 4.4 </td> </tr>
  <tr> <td style="text-align:left;"> The Fugitive Kind </td> <td style="text-align:left;"> 1960 </td> <td style="text-align:right;"> 119 </td> <td style="text-align:left;"> Drama,Romance </td> <td style="text-align:right;"> 7.1 </td> </tr>
  <tr> <td style="text-align:left;"> Les yeux sans visage </td> <td style="text-align:left;"> 1960 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:left;"> Drama,Horror </td> <td style="text-align:right;"> 7.7 </td> </tr>
  <tr> <td style="text-align:left;"> À bout de souffle </td> <td style="text-align:left;"> 1960 </td> <td style="text-align:right;"> 90 </td> <td style="text-align:left;"> Crime,Drama </td> <td style="text-align:right;"> 7.8 </td> </tr>
  <tr> <td style="text-align:left;"> 13 Ghosts </td> <td style="text-align:left;"> 1960 </td> <td style="text-align:right;"> 85 </td> <td style="text-align:left;"> Horror,Mystery </td> <td style="text-align:right;"> 6.1 </td> </tr>
  <tr> <td style="text-align:left;"> The Alamo </td> <td style="text-align:left;"> 1960 </td> <td style="text-align:right;"> 162 </td> <td style="text-align:left;"> Adventure,Drama,History </td> <td style="text-align:right;"> 6.8 </td> </tr>
 </tbody>
</table>

---

## Python in R Markdown

Continue working with the now converted R object in R

```r
library(ggplot2)
ggplot(movies_r, aes(x=startYear)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle('Number of movies per year')
```

---

## Python in R Markdown

Continue working with the now converted R object in R

```r
library(ggplot2)
ggplot(movies_r, aes(x=startYear)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle('Number of movies per year')
```

<img src="presentation_reticulate_files/figure-html/unnamed-chunk-23-1.png" width="576" style="display: block; margin: auto auto auto 0;" />

---
name: type_conversions

## Type conversions

When calling Python code from R, R data types are automatically converted to their equivalent Python types. When values are returned from Python to R, they are converted back to R types.

####.center[**Conversion table**]

<table>
 <thead>
  <tr> <th style="text-align:left;"> R </th> <th style="text-align:left;"> Python </th> <th style="text-align:left;"> Examples </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> Single-element vector </td> <td style="text-align:left;"> Scalar </td> <td style="text-align:left;"> 1, 1L, TRUE, "foo" </td> </tr>
  <tr> <td style="text-align:left;"> Multi-element vector </td> <td style="text-align:left;"> List </td> <td style="text-align:left;"> c(1.0, 2.0, 3.0), c(1L, 2L, 3L) </td> </tr>
  <tr> <td style="text-align:left;"> List of multiple types </td> <td style="text-align:left;"> Tuple </td> <td style="text-align:left;"> list(1L, TRUE, "foo") </td> </tr>
  <tr> <td style="text-align:left;"> Named list </td> <td style="text-align:left;"> Dict </td> <td style="text-align:left;"> list(a = 1L, b = 2.0), dict(x = x_data) </td> </tr>
  <tr> <td style="text-align:left;"> Matrix/Array </td> <td style="text-align:left;"> NumPy ndarray </td> <td style="text-align:left;"> matrix(c(1,2,3,4), nrow=2, ncol=2) </td> </tr>
  <tr> <td style="text-align:left;"> Data Frame </td> <td style="text-align:left;"> Pandas DataFrame </td> <td style="text-align:left;"> data.frame(x = c(1,2,3), y = c("a","b","c")) </td> </tr>
  <tr> <td style="text-align:left;"> Function </td> <td style="text-align:left;"> Python function </td> <td style="text-align:left;"> function(x) x + 1 </td> </tr>
  <tr> <td style="text-align:left;"> Raw </td> <td style="text-align:left;"> Python bytearray </td> <td style="text-align:left;"> as.raw(c(1:10)) </td> </tr>
  <tr> <td style="text-align:left;"> NULL, TRUE, FALSE </td> <td style="text-align:left;"> None, True, False </td> <td style="text-align:left;"> NULL, TRUE, FALSE </td> </tr>
 </tbody>
</table>

---

## Type conversions

`python_functions.py`:

```python
def check_python_type(x):
    print(type(x))
    return x
```

--

```r
source_python("python_functions.py")
r_var <- matrix(c(1,2,3,4),nrow=2, ncol=2)
class(r_var)
py_capture_output(check_python_type(r_var))
r_var2 <- check_python_type(r_var)
class(r_var2)
```

```
## [1] "matrix" "array"
## [1] "<class 'numpy.ndarray'>\n\n"
## [1] "matrix" "array"
```

---

## Type conversions

```r
source_python("python_functions.py", convert=FALSE)
r_var <- matrix(c(1,2,3,4),nrow=2, ncol=2)
class(r_var)
py_capture_output(check_python_type(r_var))
r_var2 <- check_python_type(r_var)
class(r_var2)
r_var3 <- py_to_r(r_var2)
class(r_var3)
```

```
## [1] "matrix" "array"
## [1] "<class 'numpy.ndarray'>\n\n"
## [1] "numpy.ndarray" "python.builtin.object"
## [1] "matrix" "array"
```

---

## Type conversions

- In R, `42` is a floating point number; in Python, the literal `42` is an integer. Use `42L` to pass an integer from R.

```r
str(42)
check_python_type(42)
py_capture_output(check_python_type(42))
```

```
## num 42
## 42.0
## [1] "<class 'float'>\n\n"
```

--

```r
str(42L)
check_python_type(42L)
py_capture_output(check_python_type(42L))
```

```
## int 42
## 42
## [1] "<class 'int'>\n\n"
```

---

## Type conversions

- Single-element vectors are automatically translated to Python scalars. Wrap the value in `list()` to get a Python list instead.

```r
str(c(24))
check_python_type(c(24))
py_capture_output(check_python_type(c(24)))
```

```
## num 24
## 24.0
## [1] "<class 'float'>\n\n"
```

--

```r
str(list(24))
check_python_type(list(24))
py_capture_output(check_python_type(list(24)))
```

```
## List of 1
##  $ : num 24
## [24.0]
## [1] "<class 'list'>\n\n"
```

<!-- --------------------- Do not edit this and below --------------------- -->

---
name: end-slide
class: end-slide, middle
count: false

# Thank you. Questions?
<p>R version 4.1.0 (2021-05-18)<br><p>Platform: x86_64-w64-mingw32/x64 (64-bit)</p><p>OS: Windows 10 x64 (build 19044)</p><br> Built on : <i class='fa fa-calendar' aria-hidden='true'></i> 13-Jun-2022 at <i class='fa fa-clock-o' aria-hidden='true'></i> 21:12:39 <b>2022</b> • [SciLifeLab](https://www.scilifelab.se/) • [NBIS](https://nbis.se/) • [RaukR](https://nbisweden.github.io/workshop-RaukR-2206/)