feat: OpenAI batch api #2245

anistark · 2025-09-02T13:09:09Z

Issue Link / Problem Description

Closes #2136 (Support for Batch API)

Changes Made

Implement an additional batch API
Add the faithfulness metric for starters
Add docs on batch API usage and cutting costs

Testing

How to Test

Automated tests added/updated
Manual testing steps:
1. uv run pytest tests/ -k "batch" -v to test batch functionality.
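2. uv run pytest tests/ -k "<metric> and batch" -v to test a single metric, for example: uv run pytest tests/ -k "faithfulness and batch" -v (pytest honors only the last -k option, so the two keywords are combined with "and").
3. OPENAI_API_KEY=your_key uv run pytest tests/integration -k "batch" -v to run the integration tests with an OpenAI key.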

anistark · 2025-09-24T11:35:06Z

/claude-review

claude · 2025-09-24T11:35:18Z

Claude finished @anistark's task

🔍 PR Code Review: OpenAI Batch API Implementation

Review Progress:

Analyze PR scope and change type classification
Review core batch API implementation (src/ragas/batch_evaluation.py)
Review LLM base changes and batch API integration (src/ragas/llms/)
Review faithfulness metric batch implementation (src/ragas/metrics/_faithfulness.py)
Review metrics base changes (src/ragas/metrics/base.py)
Review test coverage and quality (tests/unit/llms/)
Review documentation (docs/howtos/customizations/metrics/)
Check configuration changes (pyproject.toml, mkdocs.yml)
Provide overall assessment and recommendations

📋 Change Type Classification

✨ New Feature: This PR adds comprehensive OpenAI Batch API support to Ragas, enabling cost-effective evaluation with up to 50% savings for large-scale evaluations.

🚀 Strengths & Excellent Implementation

✅ Outstanding Architecture Design

Clean separation of concerns with dedicated modules (batch_api.py, batch_evaluation.py)
Comprehensive type hints and proper error handling throughout
Async/sync dual support with proper runtime checks using _check_async_client()
Factory pattern usage for clean instantiation via create_batch_api() and create_batch_evaluator()

✅ Robust Error Handling & Validation

Thorough request validation including duplicate custom_id detection (src/ragas/llms/batch_api.py:256-259)
File size limits and batch size enforcement with clear error messages (src/ragas/llms/batch_api.py:264-267)
Timeout handling with configurable polling intervals and meaningful timeout errors
Comprehensive status tracking with well-defined BatchStatus enum
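The validation checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in src/ragas/llms/batch_api.py: the function name `validate_batch_requests` and both constants are hypothetical, the 50K request cap comes from the figure quoted later in this review, and the 200 MB file cap is an assumption based on OpenAI's published Batch API limits.

```python
import json
from typing import Any

# Assumed limits: 50K requests per batch is quoted in this review;
# the 200 MB upload cap is an assumption, not taken from the PR.
MAX_REQUESTS_PER_BATCH = 50_000
MAX_FILE_SIZE_BYTES = 200 * 1024 * 1024


def validate_batch_requests(requests: list[dict[str, Any]]) -> str:
    """Validate batch requests and return the JSONL payload to upload.

    Hypothetical helper: raises ValueError with a clear message on the
    first problem found.
    """
    if not requests:
        raise ValueError("Batch must contain at least one request")
    if len(requests) > MAX_REQUESTS_PER_BATCH:
        raise ValueError(
            f"Batch has {len(requests)} requests; "
            f"the limit is {MAX_REQUESTS_PER_BATCH}"
        )

    # Each custom_id must be present and unique so results can be matched
    # back to the request that produced them.
    seen: set[str] = set()
    for request in requests:
        custom_id = request.get("custom_id")
        if not custom_id:
            raise ValueError("Every request needs a non-empty custom_id")
        if custom_id in seen:
            raise ValueError(f"Duplicate custom_id: {custom_id}")
        seen.add(custom_id)

    # One JSON object per line, then check the encoded size.
    payload = "\n".join(json.dumps(request) for request in requests)
    size = len(payload.encode("utf-8"))
    if size > MAX_FILE_SIZE_BYTES:
        raise ValueError(
            f"Batch file is {size} bytes; the limit is {MAX_FILE_SIZE_BYTES}"
        )
    return payload
```

Validating before upload keeps failures local and cheap: a duplicate `custom_id` would otherwise only surface after the batch job has been submitted.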

✅ Intelligent Response Parsing

The score extraction logic in BatchEvaluationResult is particularly well-designed:

Multiple parsing strategies for JSON, structured output, and raw text formats
Graceful degradation when parsing fails with detailed logging
Metric-specific patterns for common score fields (src/ragas/batch_evaluation.py:332-341)
Range validation ensuring scores are in reasonable bounds

✅ Smart Faithfulness Integration

Two-phase batch processing with statement generation followed by NLI verification
Complete orchestration via CompleteFaithfulnessBatchJob class
Proper dependency handling between batch job phases
Error recovery with appropriate fallbacks to NaN values
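The final step of the two-phase flow, combining phase-one statements with phase-two NLI verdicts, might look like the sketch below. It assumes the usual faithfulness definition (supported statements divided by total statements) and is not taken from CompleteFaithfulnessBatchJob; `score_samples` and its arguments are hypothetical names.

```python
import math


def score_samples(
    sample_ids: list[str],
    statements_by_id: dict[str, list[str]],
    verdicts_by_id: dict[str, list[int]],
) -> dict[str, float]:
    """Combine both batch phases into one faithfulness score per sample.

    statements_by_id: phase-one output (statements extracted per sample).
    verdicts_by_id: phase-two output (1 = supported, 0 = not supported),
    one verdict per statement.
    """
    scores: dict[str, float] = {}
    for sample_id in sample_ids:
        statements = statements_by_id.get(sample_id) or []
        verdicts = verdicts_by_id.get(sample_id) or []
        # Fall back to NaN when either phase failed for this sample or the
        # two phases disagree on the number of statements.
        if not statements or len(verdicts) != len(statements):
            scores[sample_id] = math.nan
            continue
        supported = sum(1 for verdict in verdicts if verdict == 1)
        scores[sample_id] = supported / len(statements)
    return scores
```

Keying both phases by the same sample identifier is what lets a failure in one request degrade to a single NaN rather than misaligning every later score.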

✅ Comprehensive Documentation

Clear usage examples with practical cost estimation utilities
Step-by-step guides covering sync/async usage patterns
Cost optimization guidance with concrete savings calculations
Well-integrated into existing documentation structure (mkdocs.yml:106)
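As a rough illustration of the kind of savings calculation the docs describe, the sketch below compares regular and batch pricing. It is not the utility shipped in the PR: `estimate_cost` and `CostEstimate` are hypothetical names, the prices are caller-supplied placeholders, and the flat 50% discount is an assumption based on the "up to 50%" figure quoted in this review.

```python
from dataclasses import dataclass

# Assumed discount: the review quotes "up to 50%" savings for batch jobs.
BATCH_DISCOUNT = 0.5


@dataclass(frozen=True)
class CostEstimate:
    regular_cost: float
    batch_cost: float

    @property
    def savings(self) -> float:
        return self.regular_cost - self.batch_cost


def estimate_cost(
    num_requests: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
    discount: float = BATCH_DISCOUNT,
) -> CostEstimate:
    """Estimate the cost of an evaluation run with and without batching.

    Prices are in currency units per one million tokens and must be
    supplied by the caller; none are hard-coded here.
    """
    input_cost = num_requests * avg_input_tokens / 1_000_000 * input_price_per_1m
    output_cost = num_requests * avg_output_tokens / 1_000_000 * output_price_per_1m
    regular = input_cost + output_cost
    return CostEstimate(regular_cost=regular, batch_cost=regular * (1 - discount))
```

For example, 1,000 requests averaging 1,000 input and 200 output tokens, at placeholder prices of 1.0 and 2.0 per million tokens, cost 1.4 at the regular rate and 0.7 with the assumed discount.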

✅ Strong Test Coverage

Unit tests covering all major components with mock-based testing
Validation scenarios testing error conditions and edge cases
Batch support detection for different LLM types

🔄 Areas for Enhancement

⚠️ Minor Implementation Improvements

Prompt Conversion Simplification (src/ragas/metrics/_faithfulness.py:234-242)
- Current implementation uses simplified ChatPromptTemplate fallback
- Consider integrating with actual PydanticPrompt conversion methods for consistency
Context Utilization
- Current faithfulness batch implementation doesn't fully leverage retrieved_contexts
- Could enhance NLI verification accuracy by using complete context information
Integration Test Gap
- Consider adding optional end-to-end tests with actual OpenAI API
- Large dataset batch splitting scenarios could benefit from integration testing

💡 Suggestions for Future Iterations

Batch Size Optimization: Consider implementing adaptive batch sizing based on estimated token counts
Progress Tracking: Could add progress callbacks for long-running batch jobs
Retry Mechanisms: Enhanced retry logic for failed individual requests within batches
Cost Tracking: Runtime cost calculation based on actual token usage from responses
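The adaptive batch sizing suggestion could be prototyped along these lines. This is only a sketch of one possible approach, not something present in the PR: `split_into_batches` and `estimate_tokens` are hypothetical, the four-characters-per-token heuristic is a crude assumption, and the token budget is an arbitrary default the caller would tune.

```python
from typing import Iterable, Iterator


def estimate_tokens(text: str) -> int:
    """Crude token estimate, assuming roughly four characters per token."""
    return max(1, len(text) // 4)


def split_into_batches(
    prompts: Iterable[str],
    max_requests: int = 50_000,
    max_tokens: int = 2_000_000,
) -> Iterator[list[str]]:
    """Greedily group prompts so each batch stays under both limits.

    A single prompt larger than max_tokens still gets a batch of its own
    rather than being dropped.
    """
    batch: list[str] = []
    tokens = 0
    for prompt in prompts:
        cost = estimate_tokens(prompt)
        # Close the current batch if adding this prompt would exceed a limit.
        if batch and (len(batch) >= max_requests or tokens + cost > max_tokens):
            yield batch
            batch, tokens = [], 0
        batch.append(prompt)
        tokens += cost
    if batch:
        yield batch
```

A real implementation would likely replace the character heuristic with the model's tokenizer, since the estimate only needs to be conservative enough to stay under the provider's limits.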

🎯 Overall Assessment

Verdict: EXCELLENT Implementation ⭐⭐⭐⭐⭐

This PR demonstrates exceptional software engineering practices:

Production-ready code quality with comprehensive error handling and validation
Thoughtful API design balancing ease of use with flexibility
Excellent documentation making the feature immediately usable
Strong test coverage providing confidence in reliability
Smart integration with existing Ragas architecture without breaking changes

Impact & Value

Significant cost savings (up to 50%) for large-scale evaluations
Scalability improvement supporting up to 50K requests per batch
Non-blocking evaluation capabilities for better resource utilization
Maintains Ragas quality standards while adding powerful new functionality

Recommendation: APPROVE ✅

This PR successfully addresses issue #2136 with a high-quality implementation that:

Follows established patterns and conventions
Provides comprehensive batch API support
Includes excellent documentation and examples
Maintains backward compatibility
Offers substantial value to users

The minor suggestions above are enhancements for future iterations rather than blocking issues. The implementation is solid and ready for production use.

dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Sep 2, 2025

anistark changed the title ~~Feat/OpenAI batch api~~ Feat: OpenAI batch api Sep 2, 2025

anistark changed the title ~~Feat: OpenAI batch api~~ feat: OpenAI batch api Sep 2, 2025

anistark requested a review from jjmachan September 2, 2025 15:19

anistark added 3 commits September 24, 2025 16:55

fix: type checking

a2d9245

finish extract funtion

010b5fa

fix: pytest attempting to collect TestsetGenerator class as a test cl…

bd05971

…ass rather dataclass

anistark force-pushed the feat/openai-batch-api branch from 1a7eb06 to bd05971 Compare September 24, 2025 11:32
