Throughout the course, we have seen steps that are common in machine learning workflows, such as data cleaning, feature selection, data splitting, model training, tuning, and evaluation.
It is valuable to know how to code each step manually, using basic functions or selected R packages.
An alternative approach is to follow a structured pipeline using established frameworks.
✅ Pros of coding each step manually
- Full control over each step
- Deep understanding of the process
- Flexible for non-standard workflows
- Easier to debug and customize
❌ Cons of coding each step manually
- More code to write
- Harder to maintain
- Manual error checking
- Less reproducible
✅ Pros of using a framework
- Faster prototyping
- Cleaner, modular syntax
- Consistent and reproducible pipelines
- Easier collaboration and sharing
❌ Cons of using a framework
- Less transparency (black-box risk)
- Steeper learning curve at first
- May feel restrictive for custom tasks
The caret package (2007), led by Max Kuhn, unified many modeling tools and was a widely used framework that provided tools for preprocessing, resampling, and cross-validation. Tidymodels was launched in 2020 as a modern, modular collection of R packages that applies tidyverse principles to make machine learning workflows more intuitive, readable, and consistent. mlr3 offers a highly modular, object-oriented design suited for advanced tasks like benchmarking and custom pipelines. For deep learning, torch and its high-level interface luz bring native PyTorch support to R.
Tidymodels is a collection of packages for modeling and statistical analysis in R.
- Unified framework: a suite of packages that share an underlying design philosophy and are built to streamline ML tasks.
- Extensible and flexible: integrates easily with other R packages and frameworks, and supports a wide range of methods.
- Emphasis on tidy data principles: the framework adheres to the principles of “tidy data” set by the tidyverse, so that data manipulation and analysis tasks stay approachable and intuitive.
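To make the shared design concrete, here is a minimal sketch of how the tidymodels packages can fit together in one pipeline. It assumes the tidymodels package is installed. The data frame `diabetes_sim` and its columns `age`, `glucose`, and `bmi` are simulated placeholders, not the course's actual diabetes data, and the choice of a linear regression model is only for illustration.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, yardstick, dplyr

# Simulated stand-in data; column names are hypothetical, not the course data set
set.seed(42)
n <- 200
diabetes_sim <- data.frame(
  age     = round(runif(n, 20, 80)),
  glucose = rnorm(n, mean = 120, sd = 25)
)
diabetes_sim$bmi <- 18 + 0.05 * diabetes_sim$age +
  0.06 * diabetes_sim$glucose + rnorm(n, sd = 3)

# rsample: split the data into training (80%) and testing (20%) sets
data_split <- initial_split(diabetes_sim, prop = 0.8)
train <- training(data_split)
test  <- testing(data_split)

# recipes: declare preprocessing (here, centering and scaling the predictors)
rec <- recipe(bmi ~ age + glucose, data = train) |>
  step_normalize(all_numeric_predictors())

# parsnip: specify the model independently of the fitting engine
spec <- linear_reg() |>
  set_engine("lm")

# workflows: bundle preprocessing and model, then fit on the training set
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(spec) |>
  fit(data = train)

# yardstick: evaluate predictions on the held-out test set
preds <- predict(wf_fit, new_data = test) |>
  bind_cols(test)
rmse(preds, truth = bmi, estimate = .pred)
```

Each step comes from a different package, but all of them accept and return data frames and chain with the pipe, which is what the shared design philosophy looks like in practice.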
Let’s try to build a predictive model for BMI from our diabetes data set, using a basic R approach and/or the tidymodels framework.
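As a starting point, the basic R route might look like the sketch below, which uses only base R functions. Since the structure of the course's diabetes data is not shown here, the example simulates a small data frame. The name `diabetes_sim` and the columns `age`, `glucose`, and `bmi` are assumptions for illustration only, as is the choice of linear regression.

```r
# Simulated stand-in data; column names are hypothetical, not the course data set
set.seed(42)
n <- 200
diabetes_sim <- data.frame(
  age     = round(runif(n, 20, 80)),
  glucose = rnorm(n, mean = 120, sd = 25)
)
diabetes_sim$bmi <- 18 + 0.05 * diabetes_sim$age +
  0.06 * diabetes_sim$glucose + rnorm(n, sd = 3)

# 1. Data splitting: 80% of rows for training, the rest for testing
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- diabetes_sim[train_idx, ]
test  <- diabetes_sim[-train_idx, ]

# 2. Model training: ordinary least squares on the training set
fit <- lm(bmi ~ age + glucose, data = train)

# 3. Evaluation: root mean squared error on the held-out test set
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$bmi - pred)^2))
rmse
```

Every step is explicit here, which makes the process easy to inspect, but the splitting, fitting, and scoring code has to be rewritten or adapted by hand for each new model.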