Caching doesn't respect temperature sampling #2580

@Jannoshh

Problem

The current caching implementation breaks propensity evaluations that rely on temperature sampling. For example, after running 30 samples with the same input prompt, each of which gets a different response because of temperature sampling, only one of the responses is cached (the last one, I think). When you rerun the evaluation, only that one cached response is loaded instead of the intended diverse set of sampled outputs, which completely breaks the scoring.
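To make the failure mode concrete, here is a minimal, self-contained sketch of what I believe is happening. It does not use the real caching code: `cache_key`, `fake_model`, `first_run`, and `rerun` are hypothetical names, and the idea that the cache key is derived only from the request (prompt and sampling config) is my assumption. The sketch also shows one possible remedy, which is to include a per-sample index in the key.

```python
import hashlib
import json
import random


def cache_key(prompt, temperature, sample_index=None):
    """Hypothetical cache key. Without sample_index, every sample of the
    same prompt maps to the same key."""
    payload = {"prompt": prompt, "temperature": temperature}
    if sample_index is not None:
        payload["sample_index"] = sample_index
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def fake_model(rng):
    """Stand-in for a model call with temperature > 0."""
    return rng.choice(["comply", "refuse", "hedge"])


def first_run(prompt, n, cache, rng, per_sample):
    """Sample n responses and write each one to the cache."""
    responses = []
    for i in range(n):
        response = fake_model(rng)
        key = cache_key(prompt, 1.0, i if per_sample else None)
        cache[key] = response  # a shared key is overwritten on every sample
        responses.append(response)
    return responses


def rerun(prompt, n, cache, per_sample):
    """Replay the evaluation purely from the cache."""
    return [
        cache[cache_key(prompt, 1.0, i if per_sample else None)]
        for i in range(n)
    ]


# Shared key: only the last response survives, so the rerun has no diversity.
shared_cache = {}
shared_first = first_run("Would you do X?", 30, shared_cache, random.Random(0), per_sample=False)
shared_replay = rerun("Would you do X?", 30, shared_cache, per_sample=False)

# Per-sample key: the rerun reproduces the original set of sampled outputs.
indexed_cache = {}
indexed_first = first_run("Would you do X?", 30, indexed_cache, random.Random(0), per_sample=True)
indexed_replay = rerun("Would you do X?", 30, indexed_cache, per_sample=True)
```

With the shared key, the cache ends up with a single entry and the rerun returns that one response 30 times. With the per-sample key, the rerun matches the first run exactly, which is the behavior a propensity evaluation needs.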

This is a very annoying barrier for propensity evaluations since they typically sample many outputs for the same input to measure propensity.
