Better names for scorer variations #2531

@kaifronsdal

Currently, when you create several instances of the same scorer, Inspect gives each one a unique metric name by appending an index (in metrics_unique_key).

In the example below, the scores show up as something like model_graded_qa and model_graded_qa2, which makes it hard to tell which model each score refers to.

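from inspect_ai import Task
from inspect_ai.scorer import model_graded_qa
from inspect_ai.solver import generate, system_message

# dataset and SYSTEM_MESSAGE are assumed to be defined elsewhere
Task(
    dataset=dataset,
    solver=[
        system_message(SYSTEM_MESSAGE),
        generate(),
    ],
    scorer=[
        # two instances of the same scorer, differing only in the grader model
        model_graded_qa(model="openai/gpt-4"),
        model_graded_qa(model="google/gemini-2.5-pro"),
    ],
)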
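To make the problem concrete, here is a minimal sketch of index-based de-duplication. It is not Inspect's actual metrics_unique_key implementation: the helper name unique_key and the exact suffix format are assumptions chosen to match the names described above. The point is only that the grader model never makes it into the key.

```python
def unique_key(name: str, existing: set[str]) -> str:
    """Hypothetical stand-in for index-based de-duplication of metric names."""
    if name not in existing:
        return name
    # Assumed suffix scheme: the first duplicate becomes name2, then name3, ...
    index = 2
    while f"{name}{index}" in existing:
        index += 1
    return f"{name}{index}"


# (scorer name, grader model) pairs mirroring the Task example above
scorers = [
    ("model_graded_qa", "openai/gpt-4"),
    ("model_graded_qa", "google/gemini-2.5-pro"),
]

seen: set[str] = set()
keys: list[str] = []
for name, _model in scorers:
    # The model is ignored here, which is why the resulting keys are opaque
    key = unique_key(name, seen)
    seen.add(key)
    keys.append(key)

print(keys)
```

Under these assumptions the keys carry only the scorer name plus a position-dependent index, so reordering the scorer list would silently change which key maps to which model.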

Ideally there would be an easy way to dynamically set the scorer's name.

Two proposals for how to do this:

  1. Some way to override the name of a scorer (and maybe other attributes like metrics), e.g. scorer_with(model_graded_qa(model="openai/gpt-4"), "model_graded_qa:gpt-4")
  2. Some way to set the name dynamically within the scorer:

@scorer()
def model_graded_qa(model=None):
    async def score...

    return override(score, name=f"model_graded_qa:{model}" if model else "model_graded_qa")
