Support structured output validation and transformation by integrating DocETL’s Python API with pydantic BaseModel output schemas.
Motivation
Validating and post-processing LLM outputs is more robust with explicit schemas, type checking, and enforced business logic. Pydantic models (and pydantic-ai’s output models) make it easy to define constraints, enforce allowed values, and encapsulate custom logic, all in Python code. This improves reliability, debuggability, and ease of maintenance for downstream users.
Proposal
Allow users to pass a pydantic BaseModel as an output schema when defining DocETL tasks via the Python API.
DocETL should parse the LLM output and validate/transform it using the provided model.
On validation failure, users should be able to access detailed error messages or trigger fallback logic.
Support advanced pydantic features like custom validators, default values, and business logic methods.
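To make the proposal concrete, here is a minimal sketch of the validation step using pydantic v2 only. It does not show DocETL's API: how the model would be passed to a DocETL task is exactly what this issue leaves open. The `TicketSummary` schema, its fields, and the `parse_llm_output` helper are hypothetical names chosen for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError, field_validator


class TicketSummary(BaseModel):
    """Hypothetical output schema for an LLM extraction step."""

    title: str
    # Allowed values are enforced by the type itself.
    priority: Literal["low", "medium", "high"]
    # Defaults apply when the LLM omits a field.
    tags: list[str] = Field(default_factory=list)
    confidence: float = Field(default=0.5, ge=0.0, le=1.0)

    @field_validator("title")
    @classmethod
    def title_must_not_be_blank(cls, value: str) -> str:
        # Custom validator: normalize whitespace and reject empty titles.
        value = value.strip()
        if not value:
            raise ValueError("title must not be blank")
        return value

    def needs_escalation(self) -> bool:
        # Business logic lives on the model, next to the data it uses.
        return self.priority == "high" and self.confidence >= 0.8


def parse_llm_output(raw_json: str):
    """Hypothetical helper: validate raw LLM JSON against the schema.

    Returns (model, []) on success or (None, error_details) on failure,
    so the caller can inspect the errors or trigger fallback logic.
    """
    try:
        return TicketSummary.model_validate_json(raw_json), []
    except ValidationError as exc:
        return None, exc.errors()


if __name__ == "__main__":
    good, _ = parse_llm_output(
        '{"title": "  Login fails  ", "priority": "high", "confidence": 0.9}'
    )
    print(good)

    _, errors = parse_llm_output('{"title": "x", "priority": "urgent"}')
    for err in errors:
        print(err["loc"], err["msg"])
```

Under this sketch, a DocETL integration would only need to accept the model class in place of a schema and surface the `ValidationError` details to the caller; the exact parameter name and failure-handling hook remain open design questions.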