When a date column is used in its native format, most R modeling functions convert it to an integer (the number of days since 1970-01-01).
It can be re-engineered as:
Days since a reference date
Day of the week
Month
Year
Indicators for holidays
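As a rough sketch of these re-encodings, the base R snippet below derives each feature from a small vector of dates. The dates, the reference date, and the holidays vector are made-up assumptions for illustration only; within a recipe, steps such as step_date() and step_holiday() cover similar ground.

```r
# Illustrative dates; every value here is an assumption, not the example data
dates    <- as.Date(c("2024-01-01", "2024-03-15", "2024-07-04", "2024-12-25"))
ref_date <- as.Date("2024-01-01")  # assumed reference date
holidays <- as.Date(c("2024-01-01", "2024-07-04", "2024-12-25"))  # hypothetical holiday list

lt <- as.POSIXlt(dates)  # exposes calendar components of each date

date_features <- data.frame(
  days_since_ref = as.numeric(dates - ref_date),  # days since the reference date
  day_of_week    = lt$wday,                       # 0 = Sunday, ..., 6 = Saturday
  month          = lt$mon + 1,                    # $mon is 0-based
  year           = lt$year + 1900,                # $year counts from 1900
  is_holiday     = as.integer(dates %in% holidays)
)
date_features
```

Each derived column gives the model a different view of the same date, so it no longer has to recover weekday or seasonal effects from a single integer.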
General definitions
Data preprocessing steps allow your model to fit.
Feature engineering steps help the model do the least work to predict the outcome as well as possible.
The recipes package can handle both!
In a little bit, we’ll see successful (and unsuccessful) feature engineering methods for our example data.
Prepare your data for modeling
The recipes package is an extensible framework for building pipeable sequences of feature engineering and preprocessing steps that are applied to data.
Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets.
The resulting processed output can be used as inputs for statistical or machine learning models.
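To make the "estimate once, apply elsewhere" idea concrete, here is a minimal sketch using the recipes functions prep() and bake(). The two data frames and the object names (train, new, norm_rec) are hypothetical stand-ins, and the native pipe assumes R 4.1 or later.

```r
library(recipes)

# Hypothetical toy data standing in for an initial (training) set and a new set
train <- data.frame(y = c(1, 2, 3, 4), x = c(10, 20, 30, 40))
new   <- data.frame(y = c(5, 6),       x = c(25, 50))

# Define the steps; nothing is estimated yet
norm_rec <- recipe(y ~ x, data = train) |>
  step_normalize(all_numeric_predictors())

# Estimate the mean and standard deviation of x from the training data only
norm_trained <- prep(norm_rec, training = train)

# Apply those training-set statistics to the new data
new_processed <- bake(norm_trained, new_data = new)
new_processed
```

Because the training mean of x is 25, the first new row is centered to 0. The new data never influences the estimated statistics.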
A first recipe
cell_rec <- recipe(class ~ ., data = cell_tr)
The recipe() function assigns columns to roles of “outcome” or “predictor” using the formula.
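One way to check the assigned roles is to call summary() on the recipe. The sketch below uses a small made-up data frame (toy_tr, with invented column names and values) rather than the real cell_tr, so only the pattern carries over.

```r
library(recipes)

# Hypothetical stand-in for the training data (not the real cell_tr)
toy_tr <- data.frame(
  class = factor(c("PS", "WS", "PS", "WS")),
  area  = c(185, 819, 431, 298),
  width = c(15.2, 31.9, 28.0, 19.0)
)

# Left side of the formula becomes the outcome; `.` means all other columns
toy_rec <- recipe(class ~ ., data = toy_tr)

# One row per column, with the role the formula assigned to it
roles <- summary(toy_rec)
roles[, c("variable", "role")]
```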
Typically, you will want to use a workflow to estimate and apply a recipe.
If you have an error and need to debug your recipe, the original recipe object (e.g., pca_rec) can be estimated manually with a function called prep(), which is analogous to fit(). See TMwR section 16.4.
Another function, bake(), is analogous to predict() and gives you the processed data back.
The tidy() function can be used to get specific results from the recipe.
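A minimal debugging sketch of the three functions might look like the following. The recipe here is a hypothetical one built on the built-in mtcars data; the name pca_rec is borrowed from the text, but its steps are assumptions, and the native pipe assumes R 4.1 or later.

```r
library(recipes)

# Hypothetical recipe on the built-in mtcars data (not the example data)
pca_rec <- recipe(mpg ~ disp + hp + wt, data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca(all_numeric_predictors(), num_comp = 2)

# prep() estimates every step from the training data, much like fit()
pca_prepped <- prep(pca_rec)

# bake() returns processed data; new_data = NULL gives the processed training set
pca_baked <- bake(pca_prepped, new_data = NULL)
head(pca_baked)

# tidy() with no step number lists the steps in the recipe
tidy(pca_prepped)

# number = 1 returns the means and standard deviations from step_normalize()
norm_stats <- tidy(pca_prepped, number = 1)
norm_stats
```

Running the steps by hand like this makes it easier to see which step fails and what each one estimated, which a fitted workflow otherwise hides.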
More on recipes
Once fit() is called on a workflow, changing the model does not re-fit the recipe.