
Conversation

@jjmachan (Member) commented Sep 23, 2025

Summary

This PR enhances the metric decorator system (discrete_metric, numeric_metric, ranking_metric) with improved input validation, better error messages, and support for both plain value returns and MetricResult objects.

Key Changes

1. Automatic MetricResult Wrapping

  • Metric functions can now return plain values (strings, floats, lists), which are automatically wrapped in MetricResult objects
  • Functions can still return MetricResult objects directly for cases where custom reasons are needed
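
For illustration, here is a minimal sketch of both return styles. The import path and the decorator arguments (name, allowed_values) are assumptions made for the example, not copied from this diff:

```python
from ragas.metrics import discrete_metric, MetricResult  # assumed import path

@discrete_metric(name="exact_match", allowed_values=["pass", "fail"])
def exact_match(response: str, expected: str):
    # Plain return value: the decorator wraps it in a MetricResult automatically.
    return "pass" if response == expected else "fail"

@discrete_metric(name="verbose_match", allowed_values=["pass", "fail"])
def verbose_match(response: str, expected: str):
    # Returning a MetricResult directly still works when a custom reason is needed.
    value = "pass" if response == expected else "fail"
    return MetricResult(value=value, reason=f"exact comparison gave {value!r}")
```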

2. Enhanced Input Validation with Pydantic

  • Added Pydantic-based validation for metric function parameters
  • Provides clear, user-friendly error messages for:
    • Type mismatches
    • Missing required parameters
    • Unknown parameters (warnings)
    • Positional arguments (with helpful correction suggestions)
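
To illustrate the kinds of mistakes this catches, a few calls against the exact_match sketch above (the exact wording of the errors and warnings comes from the new validation layer and may differ):

```python
# Lines marked "error" would raise; they are listed for illustration,
# not meant to be executed in sequence.
exact_match.score(response="Paris", expected="Paris")            # valid call
exact_match.score(response="Paris")                              # missing `expected`: clear error
exact_match.score(response=42, expected="Paris")                 # type mismatch: clear error
exact_match.score(response="Paris", expected="Paris", extra=1)   # unknown parameter: warning
exact_match.score("Paris", "Paris")                              # positional args: error with a keyword-usage hint
```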

3. Improved Async/Sync Handling

  • Simplified async execution by reusing the existing ragas.async_utils.run helper
  • Proper detection and handling of event loops
  • Both sync and async functions work seamlessly
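
A sketch of how an async metric might look; the numeric_metric arguments and import path are assumptions, the point being that .score() is called the same way for sync and async functions:

```python
import asyncio

from ragas.metrics import numeric_metric  # assumed import path

@numeric_metric(name="length_ratio", allowed_values=(0.0, 1.0))  # illustrative arguments
async def length_ratio(response: str, expected: str) -> float:
    await asyncio.sleep(0)  # stand-in for a real awaited call, e.g. an LLM request
    return min(len(response), len(expected)) / max(len(response), len(expected), 1)

# .score() detects whether an event loop is already running and delegates to
# ragas.async_utils.run, so this plain synchronous call works from a script:
result = length_ratio.score(response="short answer", expected="a longer reference answer")
```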

4. Direct Callable Support

  • Decorated metrics can be called directly like the original function
  • metric() returns the raw function result
  • metric.score() returns a MetricResult object with validation
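
A small usage sketch, reusing the exact_match metric from the first example:

```python
raw = exact_match(response="Paris", expected="Paris")            # -> "pass" (plain string)
scored = exact_match.score(response="Paris", expected="Paris")   # -> MetricResult with validation
assert raw == "pass" and scored.value == "pass"
```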

5. Better MetricResult Representation

  • Improved __repr__ to show both value and reason when present
  • Cleaner string representation for debugging

Testing

  • Comprehensive test suite added (test_metric_decorators.py) covering:
    • Plain value returns vs MetricResult returns
    • Sync and async functions
    • Input validation edge cases
    • Direct callable functionality
    • All three metric types (discrete, numeric, ranking)
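
A sketch in the style of those tests, reusing the exact_match metric from the first example (the actual test names and assertions in test_metric_decorators.py may differ):

```python
import pytest

def test_plain_value_return_is_wrapped():
    result = exact_match.score(response="Paris", expected="Paris")
    assert result.value == "pass"

def test_missing_parameter_raises():
    # The real suite may assert a more specific exception type and message.
    with pytest.raises(Exception):
        exact_match.score(response="Paris")
```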

Breaking Changes

None; the changes are backward compatible. Existing code that returns MetricResult objects directly will continue to work.

Benefits

  • Better Developer Experience: Clear error messages guide users to correct usage
  • More Flexible: Functions can return simple values without wrapping in MetricResult
  • Type Safety: Pydantic validation ensures type correctness at runtime
  • Cleaner Code: Users can write simpler metric functions when custom reasons aren't needed

🤖 Generated with Claude Code

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Sep 23, 2025
@jjmachan changed the title from "cleanup discrete metric creation" to "fix: improve metric decorators with better validation and error handling" on Sep 23, 2025
@jjmachan jjmachan requested a review from anistark September 23, 2025 02:42
@anistark (Contributor) left a comment

LGTM

However, this might break the custom LLM-based metrics. The validation might break.

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Sep 23, 2025

openhands-ai bot commented Sep 23, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • CI

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #2302 at branch `fix/new-metrics`

Feel free to include any additional details that might help me get this PR into a better state.


@jjmachan jjmachan merged commit 4678acf into main Sep 23, 2025
8 checks passed
@jjmachan jjmachan deleted the fix/new-metrics branch September 23, 2025 18:59