Description
It would be great if tidymodels (in {parsnip} or, more likely, in a separate package) supported more generalizable approaches to building prediction intervals that work across model specifications (and that, preferably, rely on fewer assumptions than parametric approaches).
Reprex for how this could look:
library(tidymodels)
### Set-up workflow ###
set.seed(123)
iris <- as_tibble(iris)
split <- initial_split(iris)
train <- training(split)
new_data <- testing(split)
rec <- recipe(Sepal.Length ~ ., data = train)
mod <- parsnip::decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")
workflow <- workflows::workflow() %>%
  add_recipe(rec) %>%
  add_model(mod)
### Set-up simulation for predictive inference ###
devtools::source_gist("https://gist.github.com/brshallo/3db2cd25172899f91b196a90d5980690")
# output for a 95% prediction interval
workflow %>%
  prep_interval(train) %>%
  predict_interval(new_data, probs = c(0.025, 0.975))
My post on Simulating Prediction Intervals walks through the steps above more explicitly. This is just a rough set-up; prep_interval() could be made to accept tailored resampling structures or other specifications relevant to how the prediction intervals should be generated.
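To make the idea of simulation-based intervals concrete, here is a minimal sketch in base R. It is not the code from the gist: `simulate_interval()`, its arguments, and the `.pred_lower` / `.pred_upper` column names are hypothetical, and the recipe used here (bootstrap refits for model uncertainty plus resampled out-of-bag residuals for sample uncertainty) is one assumed way such intervals might be generated.

```r
# Hypothetical helper (not the gist's API): simulation-based prediction
# intervals for any model with a formula interface and a predict() method.
simulate_interval <- function(train, new_data, formula, fit_fn = stats::lm,
                              n_boot = 200, probs = c(0.025, 0.975)) {
  outcome <- all.vars(formula)[1]  # left-hand side of the formula
  n <- nrow(train)
  m <- nrow(new_data)

  sims <- replicate(n_boot, {
    # Model uncertainty: refit on a bootstrap resample of the training data.
    idx <- sample.int(n, n, replace = TRUE)
    fit <- fit_fn(formula, data = train[idx, , drop = FALSE])

    # Sample uncertainty: residuals on rows left out of this resample.
    oob <- train[-unique(idx), , drop = FALSE]
    resid_oob <- oob[[outcome]] - predict(fit, newdata = oob)

    # One simulated outcome per new row: prediction plus a resampled residual.
    noise <- resid_oob[sample.int(length(resid_oob), m, replace = TRUE)]
    predict(fit, newdata = new_data) + noise
  })
  sims <- matrix(sims, nrow = m)  # rows = new observations, cols = replicates

  # Empirical quantiles of the simulated outcomes give the interval bounds.
  q <- apply(sims, 1, stats::quantile, probs = range(probs), names = FALSE)
  data.frame(.pred_lower = q[1, ], .pred_upper = q[2, ])
}

set.seed(123)
train_idx <- sample.int(nrow(iris), 112)
train <- iris[train_idx, ]
new_data <- iris[-train_idx, ]
intervals <- simulate_interval(train, new_data, Sepal.Length ~ .)
head(intervals)
```

Because the model enters only through `fit_fn` and `predict()`, passing something like `fit_fn = rpart::rpart` should work the same way, which is the kind of model-agnostic behaviour this issue is asking for.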
The approaches described in the field of conformal inference are relevant (e.g. ryantibs/conformal). #41 is also tangentially related.
(This issue stems from an RStudio Community thread and Mara's encouragement to open an issue to move the discussion to GitHub.)
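For reference, split conformal prediction, one of the simplest methods from that literature, can be sketched in a few lines of base R. This sketch does not use the ryantibs/conformal package; `split_conformal()` and its arguments are hypothetical names, and the half/half split and absolute-residual score are assumptions chosen for illustration.

```r
# Hypothetical helper: split conformal prediction intervals using absolute
# residuals as the conformity score. Works with any model that has a formula
# interface and a predict() method.
split_conformal <- function(train, new_data, formula, fit_fn = stats::lm,
                            alpha = 0.05) {
  outcome <- all.vars(formula)[1]  # left-hand side of the formula
  n <- nrow(train)

  # Split the training data: one half to fit, the other half to calibrate.
  fit_idx <- sample.int(n, floor(n / 2))
  fit_set <- train[fit_idx, , drop = FALSE]
  calib_set <- train[-fit_idx, , drop = FALSE]

  fit <- fit_fn(formula, data = fit_set)

  # Conformity scores: absolute residuals on the held-out calibration half.
  scores <- abs(calib_set[[outcome]] - predict(fit, newdata = calib_set))
  n_cal <- length(scores)

  # Finite-sample corrected quantile: the k-th smallest score.
  k <- ceiling((n_cal + 1) * (1 - alpha))
  if (k > n_cal) stop("calibration set too small for the requested alpha")
  q_hat <- sort(scores)[k]

  pred <- unname(predict(fit, newdata = new_data))
  data.frame(.pred = pred,
             .pred_lower = pred - q_hat,
             .pred_upper = pred + q_hat)
}

set.seed(123)
train_idx <- sample.int(nrow(iris), 112)
conformal_intervals <- split_conformal(iris[train_idx, ], iris[-train_idx, ],
                                       Sepal.Length ~ ., alpha = 0.05)
head(conformal_intervals)
```

The resulting intervals have constant width, and their coverage guarantee is marginal and assumes exchangeable data, not any particular model or error distribution.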