Skip to content

Conversation

jjmachan
Copy link
Member

Summary

Primary Motivation: This PR fixes a fundamental inheritance pattern issue in the metrics system where factory-created metrics (via @discrete_metric, @numeric_metric, etc.) and class-instantiated metrics (via DiscreteMetric(), NumericMetric(), etc.) should have different base classes but were incorrectly sharing the same inheritance hierarchy.

The Problem:

  • Factory-created metrics should inherit from SimpleBaseMetric (lightweight, decorator-based)
  • Class-instantiated metrics should inherit from SimpleLLMMetric (LLM-enabled, full-featured)
  • Previously, both paths incorrectly inherited from the same base classes, creating confusion and incorrect behavior

The Solution:
Separated base classes: Created SimpleBaseMetric (for factory) and SimpleLLMMetric (for class instantiation) as distinct, unrelated base classes
Removed llm_based.py: Consolidated BaseLLMMetric and LLMMetric into base.py as SimpleBaseMetric and SimpleLLMMetric
Fixed decorator inheritance: Factory methods now create metrics that inherit from SimpleBaseMetric + ValidatorMixin only
Fixed class inheritance: Class-based metrics like DiscreteMetric now inherit from SimpleLLMMetric + ValidatorMixin
Added validator system: Introduced modular validation mixins that work with both inheritance patterns
Maintained backward compatibility: Added aliases BaseMetric = SimpleBaseMetric and LLMMetric = SimpleLLMMetric

Exact Steps Taken:

  1. 7d6de2a - Updated gitignore for experimental directories
  2. c6101f8 - Renamed classes and established proper naming convention
  3. 46450d8 - Refactored decorator and class-based inheritance patterns
  4. a464c37 - Simplified validator system with proper mixins
  5. fe996f6 - Removed llm_based.py after consolidation

Test plan

  • Verify factory-created metrics (@discrete_metric) inherit from SimpleBaseMetric only
  • Verify class-instantiated metrics (DiscreteMetric()) inherit from SimpleLLMMetric
  • Test that both patterns work correctly with their respective validation mixins
  • Ensure backward compatibility with existing metric imports
  • Validate all metric functionality (scoring, async operations, alignment)
  • Run full test suite to ensure no regressions

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Sep 26, 2025
@jjmachan jjmachan requested a review from anistark September 26, 2025 16:31
@jjmachan jjmachan changed the title Fix metric inheritance patterns: separate factory-created metrics from class-instantiated metrics fix: metric inheritance patterns: separate factory-created metrics from class-instantiated metrics Sep 27, 2025
Copy link
Contributor

@anistark anistark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

@jjmachan jjmachan merged commit 80e77b6 into main Sep 30, 2025
8 checks passed
@jjmachan jjmachan deleted the fix/refactor branch September 30, 2025 01:20
anistark pushed a commit that referenced this pull request Oct 1, 2025
…rics (#2320)

## Summary

This PR adds persistence capabilities and better string representations
for LLM-based metrics, making them easier to save, share, and debug.

## Changes

### 1. Save/Load Functionality
- Added `save()` and `load()` methods to `SimpleLLMMetric` and its
subclasses (`DiscreteMetric`, `NumericMetric`, `RankingMetric`)
- Supports JSON format with optional gzip compression
- Handles all prompt types including `Prompt` and `DynamicFewShotPrompt`
- Smart defaults: `metric.save()` saves to `./metric_name.json`

### 2. Improved `__repr__` Methods
- Clean, informative string representations for both LLM-based and
decorator-based metrics
- Removed implementation details (memory addresses, `<locals>`, internal
attributes)
- Smart prompt truncation (80 chars max)
- Function signature display for decorator-based metrics

**Before:**
```python
create_metric_decorator.<locals>.decorator_factory.<locals>.decorator.<locals>.CustomMetric(name='summary_accuracy', _func=<function summary_accuracy at 0x151ffdf80>, ...)
```

**After:**
```python
# LLM-based metrics
DiscreteMetric(name='response_quality', allowed_values=['correct', 'incorrect'], prompt='Evaluate if the response...')

# Decorator-based metrics  
summary_accuracy(user_input, response) -> DiscreteMetric[['pass', 'fail']]
```

### 3. Response Model Handling
- Added `create_auto_response_model()` factory to mark auto-generated
models
- Only warns about custom response models during save, not standard ones

## Usage Examples

```python
# Save metric with default path
metric.save()  # → ./response_quality.json

# Save with custom path
metric.save("custom.json")
metric.save("/path/to/metrics/")  # → /path/to/metrics/response_quality.json
metric.save("compressed.json.gz")  # Compressed

# Load metric
loaded_metric = DiscreteMetric.load("response_quality.json")

# For DynamicFewShotPrompt metrics
loaded_metric = DiscreteMetric.load("metric.json", embedding_model=embeddings)
```

## Testing
- Comprehensive test suite with 8 tests covering all save/load scenarios
- Tests for default paths, directory handling, compression
- Tests for all prompt types and metric subclasses

## Dependencies
**Note:** This PR builds on #2316 (Fix metric inheritance patterns) and
requires it to be merged first. The changes here depend on the
cleaned-up metric inheritance structure from that PR.

## Checklist
- [x] Tests added
- [x] Documentation in docstrings
- [x] Backwards compatible (new functionality only)
- [x] Follows TDD practices
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants