1 Introduction
Throughout the course, we have seen steps that are common in machine learning workflows, such as data cleaning, feature selection, data splitting, model training, tuning, and evaluation.
It is valuable to know how to code each step manually, using basic functions or selected R packages. The advantage of this approach is that it gives a deep understanding of the process and full control over each step.
An alternative approach is to follow a structured pipeline using established frameworks. The advantages here include faster setup, easier experimentation with different algorithms, and better collaboration. A structured pipeline also reduces the risk of data leakage and model overfitting.
To help streamline this process, several frameworks have been developed in R, such as the caret package and, more recently, the tidymodels framework. While tidymodels is the most widely used and tidyverse-friendly ML framework in R, other modern options exist. For instance, mlr3 offers a highly modular, object-oriented design suited for advanced tasks like benchmarking and custom pipelines. For deep learning, torch and its high-level interface luz bring native PyTorch support to R.
Here, we will see how to build a predictive model, including all the steps, to predict BMI based on the features from the diabetes dataset. We will first try to code things ourselves and then see how to put everything together using tidymodels.
1.1 Tidymodels
One of the earlier initiatives to create a framework for ML tasks in R was the caret package, led by Max Kuhn, which unified many modeling tools and provided support for preprocessing, resampling, cross-validation, and parameter tuning. Building on this foundation, Kuhn partnered with Hadley Wickham, the creator of the tidyverse, to introduce the tidymodels ecosystem in 2020: a modern, modular collection of R packages that applies tidyverse principles to make machine learning workflows more intuitive, readable, and consistent.
core package | function |
---|---|
rsample | provides infrastructure for efficient data splitting and resampling |
parsnip | a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages |
recipes | a tidy interface to data pre-processing tools for feature engineering |
workflows | bundles your pre-processing, modeling, and post-processing together |
tune | helps you optimize the hyperparameters of your model and pre-processing steps |
yardstick | measures the effectiveness of models using performance metrics |
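The six core packages above combine into one short pipeline. The sketch below is a minimal example, again assuming a data frame named diabetes with a numeric BMI outcome; the column names are placeholders for the course dataset.

```r
# A minimal tidymodels pipeline sketch, assuming a data frame `diabetes`
# with a numeric outcome column `BMI` (names are placeholders).
library(tidymodels)

set.seed(123)

# rsample: split the data into training and test sets
split <- initial_split(diabetes, prop = 0.8)
train <- training(split)
test  <- testing(split)

# recipes: declare pre-processing (here, normalizing numeric predictors)
rec <- recipe(BMI ~ ., data = train) |>
  step_normalize(all_numeric_predictors())

# parsnip: specify the model independently of the engine's syntax
mod <- linear_reg() |>
  set_engine("lm")

# workflows: bundle the recipe and the model together
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(mod)

# fit on the training set, then evaluate on the test set with yardstick
fitted_wf <- fit(wf, data = train)
preds <- predict(fitted_wf, new_data = test) |>
  bind_cols(test)

rmse(preds, truth = BMI, estimate = .pred)
```

Note how each package handles one stage: swapping the linear model for, say, a random forest only requires changing the parsnip specification, while the split, recipe, and evaluation code stay the same.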