RaukR 2025 • R Beyond the Basics
Nina Norgren
09-Jun-2025
In this session we will learn to:
R versus Python: The ultimate fight!
Not anymore!
Objects are automatically converted to R types, unless otherwise specified
Access Python’s built-in functions directly in R
num [1:8] 1 5 3 4 2 2 3 2
r_vec is an R object.
Python's built-in functions still work on R objects.
Import your own Python functions for use in R. File python_functions.py:
R code:
Type numeric in and type numeric out. But what happens in between?
File python_functions.py:
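What happens in between can be made visible from the Python side. The sketch below is a hypothetical stand-in for python_functions.py: the file's actual contents are not shown here, and the name `add_and_report` is an assumption. It prints the Python type each argument arrives with before returning the sum. The last lines simulate, in plain Python, what reticulate would pass for R numerics (doubles), which arrive as Python floats.

```python
# Hypothetical contents for python_functions.py; the function name is an assumption.
def add_and_report(x, y):
    """Print the Python type of each argument, then return their sum."""
    print("x arrived as", type(x).__name__)
    print("y arrived as", type(y).__name__)
    return x + y


# For illustration only: simulate the values an R call such as
# add_and_report(5, 10) would hand over (R doubles become Python floats).
result = add_and_report(5.0, 10.0)
print(result, type(result).__name__)
```

The float result is what gets converted back to an R numeric on return, so the round trip looks like "numeric in, numeric out" from the R side.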
Run a Python string:
All objects created by Python are accessible using the py object exported by reticulate.
Run the Python script my_python_script.py:
In R Markdown, it is possible to mix in Python chunks:
<class 'pandas.core.frame.DataFrame'>
Access the movies object using the py object, which will convert movies to an R object:
movies_r <- py$movies
movies_r <- as_tibble(movies_r)
subset <- movies_r %>% select(5:6, 8:10)
knitr::kable(subset[1:7,],'html')
originalTitle | startYear | runtimeMinutes | genres | averageRating |
---|---|---|---|---|
Kate & Leopold | 2001 | 118 | Comedy,Fantasy,Romance | 6.4 |
The Brain That Wouldn't Die | 1962 | 82 | Horror,Sci-Fi | 4.4 |
The Fugitive Kind | 1960 | 119 | Drama,Romance | 7.1 |
Les yeux sans visage | 1960 | 90 | Drama,Horror | 7.7 |
À bout de souffle | 1960 | 90 | Crime,Drama | 7.8 |
13 Ghosts | 1960 | 85 | Horror,Mystery | 6.1 |
The Alamo | 1960 | 162 | Adventure,Drama,History | 6.8 |
Continue working with the now converted R object in R
When calling Python code from R, R data types are converted to their equivalent Python types. When values are returned from Python to R, they are converted back to R types.
R | Python | Examples |
---|---|---|
Single-element vector | Scalar | 1, 1L, TRUE, "foo" |
Multi-element vector | List | c(1.0, 2.0, 3.0), c(1L, 2L, 3L) |
List of multiple types | Tuple | list(1L, TRUE, "foo") |
Named list | Dict | list(a = 1L, b = 2.0), dict(x = x_data) |
Matrix/Array | NumPy ndarray | matrix(c(1,2,3,4), nrow=2, ncol=2) |
Data Frame | Pandas DataFrame | data.frame(x = c(1,2,3), y = c("a","b","c")) |
Function | Python function | function(x) x +1 |
Raw | Python bytearray | as.raw(c(1:10)) |
NULL, TRUE, FALSE | None, True, False | NULL, TRUE, FALSE |
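To make the table concrete, here is a minimal sketch of the Python-side values that correspond to some of the R examples above. The values are written by hand to mirror the table; they are not produced by reticulate. The matrix and data frame rows are left out because they would need NumPy and pandas, and the dictionary name `examples` is only illustrative.

```python
# Python-side counterparts of the R examples in the table (standard library only).
examples = {
    "Single-element vector": 1.0,                # R: 1
    "Multi-element vector": [1.0, 2.0, 3.0],     # R: c(1.0, 2.0, 3.0)
    "List of multiple types": (1, True, "foo"),  # R: list(1L, TRUE, "foo")
    "Named list": {"a": 1, "b": 2.0},            # R: list(a = 1L, b = 2.0)
    "Raw": bytearray(range(1, 11)),              # R: as.raw(c(1:10))
    "NULL": None,                                # R: NULL
}

# Print the Python type that each R construct corresponds to.
for r_type, value in examples.items():
    print(f"{r_type:24s} -> {type(value).__name__}")
```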
python_functions.py:
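The contents of the file are not reproduced here. Judging only from the R code and the printed `<class 'numpy.ndarray'>` that follow, a plausible sketch is a function that prints the Python type of its argument and returns it unchanged. Only the name `check_python_type` comes from the R code; the body is an assumption.

```python
# Hypothetical sketch of python_functions.py; only the function name
# check_python_type appears in the R code, the body is an assumption.
def check_python_type(x):
    """Print the Python type of x and return x unchanged."""
    print(type(x))
    return x


# Plain-Python check that the same object comes back.
# A nested list stands in for the NumPy array to avoid a NumPy dependency.
value = [[1.0, 3.0], [2.0, 4.0]]
returned = check_python_type(value)
```

Because the file is sourced with convert=FALSE in the R code, the returned value stays a Python object on the R side until py_to_r() is called on it.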
source_python("python_functions.py", convert=FALSE)
r_var <- matrix(c(1,2,3,4),nrow=2, ncol=2)
class(r_var)
r_var2 <- check_python_type(r_var)
class(r_var2)
r_var3 <- py_to_r(r_var2)
class(r_var3)
[1] "matrix" "array"
<class 'numpy.ndarray'>
[1] "numpy.ndarray" "python.builtin.object"
[1] "matrix" "array"
42 in R is a floating point number. In Python it is an integer.
# Import scikit-learn's random forest classifier
sklearn <- import("sklearn.ensemble")
RandomForestClassifier <- sklearn$RandomForestClassifier
# Create a random forest classifier
clf <- RandomForestClassifier(n_estimators=100L)
# Training data (example)
X_train <- matrix(runif(1000), ncol=10)
y_train <- sample(c(0, 1), 100, replace=TRUE)
# Train the model
clf$fit(X_train, y_train)
# Predict on new data
X_test <- matrix(runif(200), ncol=10)
predictions <- clf$predict(X_test)
predictions
RandomForestClassifier()
[1] 1 1 1 1 1 0 1 0 1 1 0 1 0 0 1 0 1 0 1 1
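The `L` suffix in `n_estimators=100L` matters because a plain `100` in R is a double and reaches Python as a float, while many Python APIs require a true integer. The sketch below illustrates the difference using the built-in `range` as a stand-in for any function that needs an integer. The helper `count_steps` is hypothetical, and scikit-learn is not needed to run it.

```python
# Illustrates why an R integer (100L) and an R double (100) behave differently
# once they reach Python: the first arrives as int, the second as float.
def count_steps(n):
    """Return how many iterations range(n) performs; requires an int."""
    return len(range(n))


print(count_steps(100))   # int, as sent by R's 100L

try:
    count_steps(100.0)    # float, as sent by R's 100
except TypeError as err:
    print("TypeError:", err)
```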
# Load the ensembl_rest library
ensembl_rest <- import("ensembl_rest")
# Fetch gene information for a given gene ID
gene_info <- ensembl_rest$symbol_lookup(species='homo sapiens', symbol='BRCA2')
# Print gene information
gene_info$description
[1] "BRCA2 DNA repair associated [Source:HGNC Symbol;Acc:HGNC:1101]"
# Import Biopython's SeqIO module
SeqIO <- import("Bio.SeqIO")
# Parse a FASTA file
records <- SeqIO$parse("example.fasta", "fasta")
# Translate each sequence to a protein
translated_proteins <- list()
for (record in reticulate::iterate(records)) {
  translated_proteins[[record$id]] <- record$seq$translate()
}
translated_proteins
$GeneA
Seq('MAIVMGR*KGAR*')
$GeneB
Seq('MRMT*LTSIVAS*')