set.seed(853)
validation_split(cell_tr, strata = class)
#> # Validation Set Split (0.75/0.25)  using stratification 
#> # A tibble: 1 × 2
#>   splits             id        
#>   <list>             <chr>     
#> 1 <split [1211/404]> validation
A validation set is just another type of resample.
This function will not go away, but the next rsample release will have a better interface for validation.
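Because a validation set is a single resample, it comes down to one pair of row-index sets: an analysis (training) set and an assessment set. The base R sketch below assumes simple per-class sampling. The helper `stratified_validation_split()` and the toy labels are hypothetical, and rsample's own stratification does more (it bins numeric strata and pools sparse ones).

```r
# Hypothetical helper: a stratified validation split in base R.
# Returns row indices for the analysis (training) and assessment sets.
stratified_validation_split <- function(strata, prop = 0.75) {
  rows_by_class <- split(seq_along(strata), strata)
  analysis <- unlist(lapply(rows_by_class, function(rows) {
    # Sample the same proportion of rows within every class
    rows[sample.int(length(rows), size = round(prop * length(rows)))]
  }), use.names = FALSE)
  list(
    analysis = sort(analysis),
    assessment = setdiff(seq_along(strata), analysis)
  )
}

set.seed(853)
cls <- factor(rep(c("PS", "WS"), times = c(60, 40)))  # toy labels, 60/40 mix
split_idx <- stratified_validation_split(cls)
table(cls[split_idx$analysis])    # 45 PS, 30 WS: the 60/40 mix is kept
table(cls[split_idx$assessment])  # 15 PS, 10 WS
```

Stratifying keeps the class mix of both sets close to that of the full data, which matters most when one class is rare.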
Decision tree 🌳
Random forest 🌳🌲🌴🌵🌴🌳🌳🌴🌲🌵🌴🌲🌳🌴🌳🌵🌵🌴🌲🌲🌳🌴🌳🌴🌲🌴🌵🌴🌲🌴🌵🌲🌵🌴🌲🌳🌴🌵🌳🌴🌳
Ensemble many decision tree models
All the trees vote! 🗳️
Bootstrap aggregating + random predictor sampling
Often works well without tuning hyperparameters (more on this later), as long as there are enough trees
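The three ingredients above (bootstrap samples, random predictor sampling, and a vote) can be sketched in base R. To keep it short, this toy forest uses depth-1 trees ("stumps"). For a stump, sampling `mtry` predictors once per tree is the same as sampling them at its single split. Real engines such as ranger grow deep trees and resample predictors at every split. All function names here are hypothetical, and `iris` is relabelled as a two-class problem purely for illustration.

```r
# Toy random forest of depth-1 trees ("stumps"), base R only.
# Hypothetical helper names; not how ranger is implemented.

# Gini impurity of a set of class labels
gini <- function(y) {
  if (length(y) == 0) return(0)
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Fit one stump, considering only `mtry` randomly chosen predictors
fit_stump <- function(x, y, mtry) {
  vars <- sample(names(x), mtry)              # random predictor sampling
  best <- list(score = Inf, var = NULL)
  for (v in vars) {
    vals <- sort(unique(x[[v]]))
    if (length(vals) < 2) next
    cuts <- (head(vals, -1) + tail(vals, -1)) / 2  # candidate midpoints
    for (cut in cuts) {
      left <- x[[v]] <= cut
      # Weighted Gini impurity of the two child nodes
      score <- mean(left) * gini(y[left]) + mean(!left) * gini(y[!left])
      if (score < best$score) {
        best <- list(
          score = score, var = v, cut = cut,
          left = names(which.max(table(y[left]))),
          right = names(which.max(table(y[!left])))
        )
      }
    }
  }
  # Fallback when no split is possible: always predict the majority class
  if (is.null(best$var)) {
    majority <- names(which.max(table(y)))
    best <- list(var = names(x)[1], cut = Inf, left = majority, right = majority)
  }
  best
}

predict_stump <- function(stump, x) {
  ifelse(x[[stump$var]] <= stump$cut, stump$left, stump$right)
}

fit_forest <- function(x, y, trees = 100, mtry = floor(sqrt(ncol(x)))) {
  lapply(seq_len(trees), function(i) {
    rows <- sample(nrow(x), replace = TRUE)   # bootstrap sample of rows
    fit_stump(x[rows, , drop = FALSE], y[rows], mtry)
  })
}

predict_forest <- function(forest, x) {
  # One column of votes per tree
  votes <- matrix(
    vapply(forest, predict_stump, character(nrow(x)), x = x),
    nrow = nrow(x)
  )
  # All the trees vote: majority class for each row
  apply(votes, 1, function(v) names(which.max(table(v))))
}

set.seed(1)
x <- iris[, 1:4]
y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
forest <- fit_forest(x, y, trees = 100)
pred <- predict_forest(forest, x)
mean(pred == y)  # training-set accuracy of the vote
```

Accuracy on the training rows is only a sanity check here. A real evaluation would use resampling.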
Create a random forest model
rf_spec <- rand_forest(trees = 1000, mode = "classification")
rf_spec
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1000
#> 
#> Computational engine: ranger
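One idealized way to see why "enough trees" matters is a majority vote among independent voters. If every tree were correct with probability `p > 0.5` (here an arbitrary `p = 0.6`) independently of the others, the vote's accuracy would follow a binomial tail and climb toward 1 as trees are added. Real trees are correlated, and random predictor sampling exists partly to reduce that correlation, so in practice the gains are smaller and flatten out. The helper `majority_accuracy()` is hypothetical.

```r
# Probability that a strict majority of n independent voters is correct,
# when each voter is correct with probability p (odd n avoids ties)
majority_accuracy <- function(n, p) {
  1 - pbinom(floor(n / 2), size = n, prob = p)
}

n_trees <- c(1, 11, 101, 1001)
acc <- majority_accuracy(n_trees, p = 0.6)
round(acc, 3)  # n = 1 gives exactly 0.6; larger ensembles get closer to 1
```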
Create a random forest workflow
rf_wflow <- workflow(class ~ ., rf_spec)
rf_wflow
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> class ~ .
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = 1000
#> 
#> Computational engine: ranger
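In `workflow(class ~ ., rf_spec)`, the formula makes `class` the outcome and every other column a predictor. Base R's `terms()` shows what the `.` expands to for a given data frame. The toy data frame and its column names below are hypothetical.

```r
# Toy data frame with hypothetical column names
df <- data.frame(
  class = factor(c("PS", "WS", "PS")),
  area = c(10, 12, 9),
  perimeter = c(30, 35, 28)
)

# Expand the dot using the columns of `df`
tt <- terms(class ~ ., data = df)
attr(tt, "term.labels")  # returns c("area", "perimeter")
```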
Evaluating model performance
ctrl_rs <- control_resamples(save_pred = TRUE)

# Random forest uses random numbers, so set the seed first
set.seed(2)
rf_res <- fit_resamples(rf_wflow, cell_rs, control = ctrl_rs, metrics = cls_metrics)
collect_metrics(rf_res)
#> # A tibble: 3 × 6
#>   .metric     .estimator  mean     n std_err .config             
#>   <chr>       <chr>      <dbl> <int>   <dbl> <chr>               
#> 1 brier_class binary     0.120    10 0.00283 Preprocessor1_Model1
#> 2 kap         binary     0.625    10 0.0163  Preprocessor1_Model1
#> 3 roc_auc     binary     0.903    10 0.00495 Preprocessor1_Model1
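`collect_metrics()` summarizes each metric over the `n` resamples. As we understand tune's summary, `mean` is the average of the per-resample estimates and `std_err` is their standard deviation divided by `sqrt(n)`. The per-fold values below are made up for illustration and are not the results shown above.

```r
# Made-up ROC AUC values for 10 resamples (illustration only)
fold_auc <- c(0.91, 0.89, 0.90, 0.92, 0.88, 0.91, 0.90, 0.89, 0.93, 0.90)

n <- length(fold_auc)
mean_auc <- mean(fold_auc)          # the `mean` column
std_err <- sd(fold_auc) / sqrt(n)   # the `std_err` column
c(mean = mean_auc, n = n, std_err = std_err)
```

A small `std_err` relative to the differences between models suggests those differences are not just resampling noise.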