Conversation

@jjmachan (Member) commented Sep 27, 2025

Summary

This PR adds persistence capabilities and better string representations for LLM-based metrics, making them easier to save, share, and debug.

Changes

1. Save/Load Functionality

  • Added save() and load() methods to SimpleLLMMetric and its subclasses (DiscreteMetric, NumericMetric, RankingMetric)
  • Supports JSON format with optional gzip compression
  • Handles all prompt types including Prompt and DynamicFewShotPrompt
  • Smart defaults: metric.save() saves to ./<metric_name>.json (see the sketch after this list)
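
A minimal sketch of how the path handling in save() could work (hypothetical code; _to_dict() and the exact attribute names are assumptions, not the PR's actual implementation):

import gzip
import json
import typing as t
from pathlib import Path

def save(self, path: t.Optional[str] = None) -> None:
    # Default: ./<metric_name>.json, per the summary above
    target = Path(path) if path else Path(f"./{self.name}.json")
    if target.is_dir():
        # A directory was given: append the default file name
        target = target / f"{self.name}.json"
    payload = json.dumps(self._to_dict(), indent=2)  # _to_dict() is a hypothetical serializer
    if target.suffix == ".gz":
        # Optional gzip compression, triggered by the .gz extension
        with gzip.open(target, "wt", encoding="utf-8") as f:
            f.write(payload)
    else:
        target.write_text(payload, encoding="utf-8")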

2. Improved __repr__ Methods

  • Clean, informative string representations for both LLM-based and decorator-based metrics
  • Removed implementation details (memory addresses, <locals>, internal attributes)
  • Smart prompt truncation (80 chars max)
  • Function signature display for decorator-based metrics

Before:

create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...)

After:

# LLM-based metrics
DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...')

# Decorator-based metrics  
summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']]
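
A rough sketch of the truncation behavior behind the new repr (assumed shape, not the exact implementation):

def __repr__(self) -> str:
    # Keep the repr readable: cap prompts at 80 characters (limit per the summary)
    prompt_text = str(self.prompt)
    if len(prompt_text) > 80:
        prompt_text = prompt_text[:77] + "..."
    return (
        f"{type(self).__name__}(name={self.name!r}, "
        f"allowed_values={self.allowed_values!r}, prompt={prompt_text!r})"
    )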

3. Response Model Handling

  • Added create_auto_response_model() factory to mark auto-generated models
  • Only warns about custom response models during save, not standard ones (a possible tagging approach is sketched below)
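
One way such a factory could tag auto-generated models so save() knows when to stay quiet (field syntax follows pydantic's create_model; the marker attribute name is an assumption):

from pydantic import create_model

def create_auto_response_model(name: str, **fields):
    # Build the model as usual, then mark it as auto-generated so save()
    # warns only when a custom response model would be lost
    model = create_model(name, **fields)
    model.__ragas_auto_generated__ = True  # hypothetical marker attribute
    return model

# e.g. create_auto_response_model("Response", value=(str, ...))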

Usage Examples

# Save metric with default path
metric.save()  # → ./response_quality.json

# Save with custom path
metric.save("custom.json")
metric.save("/path/to/metrics/")  # → /path/to/metrics/response_quality.json
metric.save("compressed.json.gz")  # Compressed

# Load metric
loaded_metric = DiscreteMetric.load("response_quality.json")

# For DynamicFewShotPrompt metrics
loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings)

Testing

  • Comprehensive test suite with 8 tests covering all save/load scenarios
  • Tests for default paths, directory handling, compression
  • Tests for all prompt types and metric subclasses (an illustrative example follows)
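
For illustration, a round-trip test in such a suite might look like this (import path and constructor arguments assumed):

from ragas.metrics import DiscreteMetric  # import path assumed

def test_save_load_roundtrip(tmp_path):
    metric = DiscreteMetric(
        name="response_quality",
        allowed_values=["correct", "incorrect"],
        prompt="Evaluate if the response is correct: {response}",
    )
    path = tmp_path / "response_quality.json"
    metric.save(str(path))

    loaded = DiscreteMetric.load(str(path))
    assert loaded.name == metric.name
    assert loaded.allowed_values == metric.allowed_values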

Dependencies

Note: This PR builds on #2316 (Fix metric inheritance patterns) and requires it to be merged first. The changes here depend on the cleaned-up metric inheritance structure from that PR.

Checklist

  • Tests added
  • Documentation in docstrings
  • Backwards compatible (new functionality only)
  • Follows TDD practices

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 27, 2025
@jjmachan jjmachan changed the title Feat/save llm based metric feat: Add save/load functionality and improved repr for LLM-based metrics Sep 27, 2025
@anistark (Contributor) left a comment

Looks good overall. Two minor suggestions added.

Also, the merge conflict needs to be fixed.

prompt: t.Optional[t.Union[str, "Prompt"]] = None
_response_model: t.Type["BaseModel"] = field(init=False)

def __post_init__(self):
Contributor

Some metrics can be created without prompts but will fail during save.

Either we can make prompts required here, or allow None in serialization.

Member Author

Good catch - SimpleLLMMetric will always have a prompt, so I'll make it required.

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Sep 30, 2025

openhands-ai bot commented Sep 30, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • CI

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #2320 at branch `feat/save-llm-based-metric`

Feel free to include any additional details that might help me get this PR into a better state.


@jjmachan (Member Author) left a comment

I've addressed everything - could you take a look and merge it if everything looks good?

@anistark anistark merged commit 7458687 into main Oct 1, 2025
12 checks passed
@anistark anistark deleted the feat/save-llm-based-metric branch October 1, 2025 06:53
jjmachan pushed a commit that referenced this pull request Oct 1, 2025