Caching doesn't respect temperature sampling

## Problem

The current caching implementation breaks propensity evaluations that rely on temperature sampling. For example, after running 30 samples of the same input prompt, each with a different response because of temperature sampling, only one of the responses is cached (the last one, I think). When you rerun the evaluation, only that one cached response is loaded instead of the intended diverse set of sampled outputs, which messes up the scoring completely.

This is a very annoying barrier for propensity evaluations since they typically sample many outputs for the same input to measure propensity.
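A minimal sketch of the failure mode described above, not the library's actual code. It assumes the cache key is derived only from the prompt and generation settings, so every sample of the same prompt writes to the same entry. All names here (`cache_key`, `fake_generate`, `first_run`, `rerun`, `sample_index`) are hypothetical, and adding a per-sample index to the key is just one possible way to keep the samples distinct.

```python
import hashlib
import json
import random


def cache_key(prompt, temperature, sample_index=None):
    """Hypothetical cache key. Without sample_index, all samples of one prompt collide."""
    payload = {"prompt": prompt, "temperature": temperature}
    if sample_index is not None:
        payload["sample_index"] = sample_index
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def fake_generate(prompt, rng):
    """Stand-in for a model call whose output varies under temperature sampling."""
    return f"{prompt} -> response {rng.random():.12f}"


def first_run(prompt, n, cache, per_sample_keys, seed=0):
    """Generate n samples and write each one to the cache."""
    rng = random.Random(seed)
    outputs = []
    for i in range(n):
        out = fake_generate(prompt, rng)
        index = i if per_sample_keys else None
        # With a shared key, each write overwrites the previous sample.
        cache[cache_key(prompt, 1.0, index)] = out
        outputs.append(out)
    return outputs


def rerun(prompt, n, cache, per_sample_keys):
    """Load n samples back from the cache, as a rerun of the evaluation would."""
    return [
        cache[cache_key(prompt, 1.0, i if per_sample_keys else None)]
        for i in range(n)
    ]


# Shared key: the rerun sees one response repeated 30 times.
shared_cache = {}
shared_first = first_run("prompt", 30, shared_cache, per_sample_keys=False)
shared_again = rerun("prompt", 30, shared_cache, per_sample_keys=False)

# Per-sample key: the rerun reproduces the original set of samples.
indexed_cache = {}
indexed_first = first_run("prompt", 30, indexed_cache, per_sample_keys=True)
indexed_again = rerun("prompt", 30, indexed_cache, per_sample_keys=True)
```

In this sketch the shared-key cache ends up with a single entry holding the last response, while the per-sample cache holds 30 entries and the rerun returns the same outputs as the first run.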

Caching doesn't respect temperature sampling #2580
