First of all, this tutorial is amazing. Content, pace, level of detail. I love it.
I ran into one issue with the random forest model when running the Chapter 1 code locally. I'm on tidymodels 0.1.2 and randomForest 4.6-14, running on Windows.
I found a workaround by converting the character columns in car_vars to factors, but I have no idea why the code that works on Netlify does not work locally.
Running predict(fit_rf, car_train) returned:
Error in predict.randomForest(object = object$fit, newdata = new_data) : 
New factor levels not present in the training data
To reproduce:
install.packages(c("tidymodels","randomForest"))
library(tidymodels)
csv_url <- "https://raw.githubusercontent.com/juliasilge/supervised-ML-case-studies-course/master/data/cars2018.csv"
download.file(csv_url,"cars.csv")
cars <- readr::read_csv("cars.csv")
set.seed(1234)
car_vars <- cars %>%
  select(-model, -model_index)
car_split <- car_vars %>%
  initial_split(prop = 0.8,
                strata = aspiration)
car_train <- training(car_split)
rf_mod <- rand_forest() %>%
  set_mode("regression") %>%
  set_engine("randomForest")
fit_rf <- rf_mod %>%
  fit(log(mpg) ~ ., 
      data = car_train) 
results <- car_train %>%
  mutate(mpg = log(mpg)) %>%
  bind_cols(predict(fit_rf, car_train) %>%
              rename(.pred_rf = .pred))
What I noticed is that all levels in str(fit_rf[["fit"]][["forest"]][["xlevels"]]) were numeric, unlike in the model stored in data/c1_fit_rf.rds. Could someone here explain why? randomForest is new to me.
The workaround for the error was to convert the character columns to factors: car_vars <- mutate(car_vars, across(where(is.character), as.factor))
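For anyone who wants to check the conversion step in isolation, here is a minimal base R sketch of the same idea. It assumes nothing about the underlying cause of the error. The `toy` data frame, its column names, and its values are made up for illustration and are not taken from cars2018.csv.

```r
# Toy stand-in for car_vars; the columns and values are hypothetical.
toy <- data.frame(
  mpg = c(21, 30, 25, 18),
  drive = c("4wd", "fwd", "fwd", "rwd"),
  transmission = c("auto", "manual", "auto", "auto"),
  stringsAsFactors = FALSE
)

# Find the character columns (readr::read_csv() reads text columns as character).
chr_cols <- names(toy)[vapply(toy, is.character, logical(1))]

# Convert each character column to a factor so its levels are stored explicitly.
toy[chr_cols] <- lapply(toy[chr_cols], factor)

# Inspect the result: text columns are now factors, numeric columns are unchanged.
str(toy)
levels(toy$drive)
```

The dplyr one-liner above, across(where(is.character), as.factor), should have the same effect on car_vars. Running str() on the data before fitting is a quick way to confirm that no character columns remain.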