Caching #16

@pat-s

Having recently worked on caching in mlr, I think this is an important topic for mlr3.
It would be a core feature and should be integrated right from the start.

While in mlr I only implemented caching of filter values so far, here we should consider making it a package option that applies to all calls (resample, train, tuning, filtering, etc.).

Most calls involve a unique combination of dataset, learner, and hyperparameters, so caching will have less of an effect there than for filtering (where the call that generates the filter values is always the same and the subsetting happens afterwards).
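To illustrate the idea, here is a minimal sketch of call-based caching in R. All names (`cached_filter_values()`, `filter$calculate()`, `task$hash`) are hypothetical placeholders, not an existing mlr3 API; the point is only that the hash of the call's inputs can serve as the cache key, since the filter call is fully determined by task and filter method.

```r
# Sketch only: cache filter values on disk, keyed by a hash of the inputs.
# Assumes the `digest` package; helper and field names are hypothetical.
cached_filter_values = function(task, filter, cache_dir = tempdir()) {
  # The filter call is fully determined by task and filter method,
  # so a hash of the two identifies the result uniquely.
  key = digest::digest(list(task$hash, filter$id))
  path = file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: skip the expensive computation
  }
  values = filter$calculate(task)  # cache miss: compute filter values
  saveRDS(values, path)
  values
}
```

Subsetting to the top-k features would then happen on the cached values, so changing k never invalidates the cache.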

However, it can also have a positive effect on "normal" train/test calls:

  • If a run (resample, tuneParams, benchmark) errors and a seed is set, the user can simply rerun it and benefit from the cached calls
  • For tuning methods such as grid search, settings are often evaluated redundantly, so the user can benefit from caching
  • Most often it will apply to simple train/test calls without tuning

I've added delete_cache() and get_cache_dir() functions in my mlr PR to make cache handling more convenient. We could think about a dedicated Cache class for such things.
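Such a class could be a thin wrapper around directory handling. A hedged R6 sketch, mirroring the delete_cache()/get_cache_dir() helpers mentioned above (class and method names are illustrative, not an agreed design):

```r
# Hypothetical Cache class sketch (R6), wrapping cache-directory handling.
Cache = R6::R6Class("Cache",
  public = list(
    dir = NULL,
    initialize = function(dir = file.path(tempdir(), "mlr3-cache")) {
      self$dir = dir
      dir.create(self$dir, recursive = TRUE, showWarnings = FALSE)
    },
    # return the directory where cached objects are stored
    get_cache_dir = function() self$dir,
    # wipe all cached objects but keep the directory usable
    delete_cache = function() {
      unlink(self$dir, recursive = TRUE)
      dir.create(self$dir, recursive = TRUE, showWarnings = FALSE)
      invisible(self)
    }
  )
)
```

A class would also give a natural place for later extensions such as a size limit or per-call cache invalidation.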

Please share your opinions.
