
Interface for generalizable approach to building prediction intervals #464

@brshallo

I would love it if tidymodels (in {parsnip} or, more likely, in a separate package) supported more generalizable approaches to building prediction intervals that work across model specifications (and that, preferably, rely less on assumptions than parametric approaches do).

A reprex of how this could look:

library(tidymodels)

### Set-up workflow ###

set.seed(123)
iris <- as_tibble(iris)
split <- initial_split(iris)
train <- training(split)
new_data <- testing(split)

rec <- recipe(Sepal.Length ~ ., data = train)

mod <- parsnip::decision_tree() %>% 
  set_engine("rpart") %>% 
  set_mode("regression")

workflow <- workflows::workflow() %>% 
  add_recipe(rec) %>% 
  add_model(mod) 

### Set-up simulation for predictive inference ###

devtools::source_gist("https://gist.github.com/brshallo/3db2cd25172899f91b196a90d5980690")

# output for a 95% prediction interval
workflow %>% 
  prep_interval(train) %>% 
  predict_interval(new_data, probs = c(0.025, 0.975))

My post on Simulating Prediction Intervals walks through the steps above more explicitly. This is just a rough set-up: prep_interval() could be made capable of taking in tailored resampling structures or other specifications relevant to how the prediction intervals should be generated.

The approaches described in the field of conformal inference are relevant (e.g. ryantibs/conformal). #41 is also tangentially related.
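To make the conformal idea concrete, here is a minimal sketch of split conformal prediction in base R. It is not the code from the gist above and not the API of ryantibs/conformal; the function name `split_conformal()`, its arguments, and the output column names are all hypothetical and chosen only for illustration. It assumes a regression model whose fitting function accepts `formula` and `data` and whose `predict()` method accepts `newdata`; `lm()` is used only as a stand-in.

```r
# Split conformal prediction intervals: a minimal, model-agnostic sketch.
# NOTE: split_conformal(), fit_fun, and the .pred_* column names are
# hypothetical; they are not part of tidymodels or ryantibs/conformal.
split_conformal <- function(formula, data, new_data, alpha = 0.05,
                            fit_fun = stats::lm) {
  outcome <- all.vars(formula)[1]
  n <- nrow(data)

  # Randomly split the data into a fitting half and a calibration half
  fit_idx <- sample.int(n, size = floor(n / 2))
  fit_set <- data[fit_idx, , drop = FALSE]
  cal_set <- data[-fit_idx, , drop = FALSE]

  # Fit the model on the fitting half only
  model <- fit_fun(formula, data = fit_set)

  # Conformity scores: absolute residuals on the calibration half
  cal_pred <- unname(predict(model, newdata = cal_set))
  scores <- abs(cal_set[[outcome]] - cal_pred)

  # Conformal quantile: the ceiling((n_cal + 1) * (1 - alpha))-th smallest
  # score; infinite if the calibration set is too small for this alpha
  n_cal <- length(scores)
  k <- ceiling((n_cal + 1) * (1 - alpha))
  q_hat <- if (k > n_cal) Inf else sort(scores)[k]

  # Symmetric interval of half-width q_hat around each point prediction
  pred <- unname(predict(model, newdata = new_data))
  data.frame(
    .pred = pred,
    .pred_lower = pred - q_hat,
    .pred_upper = pred + q_hat
  )
}

set.seed(123)
iris_df <- datasets::iris
test_idx <- sample.int(nrow(iris_df), size = 38)
train_df <- iris_df[-test_idx, ]
new_df <- iris_df[test_idx, ]

intervals <- split_conformal(Sepal.Length ~ ., train_df, new_df, alpha = 0.05)
head(intervals)

# Empirical coverage on the held-out rows
mean(new_df$Sepal.Length >= intervals$.pred_lower &
       new_df$Sepal.Length <= intervals$.pred_upper)
```

Because the interval width comes from held-out residuals rather than from a distributional assumption, swapping `fit_fun` for another fitting function with the same calling convention should leave the rest unchanged. The trade-off in this simple version is that every interval has the same width.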

(This issue stems from an RStudio Community thread and Mara's encouragement to open an issue to move the discussion to GitHub.)
