
Feature Request: Add Support for Pydantic/Pydantic-AI Output Models in Python API #367

@shreyashankar


Summary

Support structured output validation and transformation by integrating DocETL’s Python API with Pydantic BaseModel output schemas.
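As a rough illustration of what "BaseModel as an output schema" could involve, the sketch below uses Pydantic v2's real `model_json_schema()` to read field names and types off a model. The model `PaperSummary` and the helper `field_types` are hypothetical and not part of DocETL. A real integration would still need to map this onto whatever schema format DocETL uses internally, and this issue does not specify that mapping.

```python
from pydantic import BaseModel


class PaperSummary(BaseModel):
    """Hypothetical output model for a summarization task."""

    title: str
    year: int
    authors: list[str] = []  # Pydantic copies mutable defaults per instance


def field_types(model: type[BaseModel]) -> dict[str, str]:
    """Hypothetical helper: flatten a model's JSON Schema into {field: type}."""
    schema = model.model_json_schema()
    flat: dict[str, str] = {}
    for name, prop in schema["properties"].items():
        if prop.get("type") == "array":
            # Arrays become e.g. "list[string]" based on their item type.
            flat[name] = f"list[{prop['items'].get('type', 'any')}]"
        else:
            # Unions/optionals have no single "type" key, so fall back to "any".
            flat[name] = prop.get("type", "any")
    return flat
```

With the model above, `field_types(PaperSummary)` gives `{"title": "string", "year": "integer", "authors": "list[string]"}`, and `PaperSummary.model_json_schema()["required"]` lists only `title` and `year`, because `authors` has a default.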

Motivation

Validating and post-processing LLM outputs is far more robust with explicit schemas, type checking, and enforced business rules. Pydantic models (and pydantic-ai’s output models) make it easy to declare constraints, restrict fields to allowed values, and encapsulate custom logic, all in plain Python. This would make pipelines more reliable, easier to debug, and easier to maintain for downstream users.
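For concreteness, here is a minimal sketch (Pydantic v2) of the kinds of constraints mentioned above: allowed values via `Literal`, a numeric range via `Field(ge=..., le=...)`, a default value, a custom validator, and a business-logic method. The `SupportTicket` model and its fields are hypothetical and chosen only for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, field_validator


class SupportTicket(BaseModel):
    """Hypothetical output model for a ticket-triage task."""

    # Allowed values: any other category fails validation.
    category: Literal["billing", "bug", "feature_request", "other"]
    # Numeric constraint: priority must be between 1 and 5 inclusive.
    priority: int = Field(ge=1, le=5)
    # Default value used when the LLM omits the field.
    tags: list[str] = Field(default_factory=list)
    summary: str

    @field_validator("summary")
    @classmethod
    def summary_not_blank(cls, value: str) -> str:
        # Custom validator: strip surrounding whitespace, reject empty text.
        value = value.strip()
        if not value:
            raise ValueError("summary must not be empty")
        return value

    def needs_escalation(self) -> bool:
        # Business logic kept next to the schema it depends on.
        return self.category == "bug" and self.priority >= 4
```

For example, `SupportTicket(category="bug", priority=5, summary="  Crash on login  ")` yields a ticket whose `summary` is `"Crash on login"`, whose `tags` default to `[]`, and for which `needs_escalation()` returns `True`. Passing `category="spam"` raises a `ValidationError`.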

Proposal

  • Allow users to pass a Pydantic BaseModel as the output schema when defining DocETL tasks via the Python API.
  • DocETL should parse the LLM output and validate/transform it with the provided model.
  • On validation failure, users should be able to access detailed error messages or trigger fallback logic.
  • Support advanced Pydantic features such as custom validators, default values, and business-logic methods.
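One possible shape for the parse/validate/fallback behavior proposed above is sketched below. The helper `validate_output` and the `Sentiment` model are hypothetical, not an existing DocETL API. The sketch relies only on real Pydantic v2 calls: `model_validate_json` (which also raises `ValidationError` on malformed JSON) and `ValidationError.errors()`, which returns structured error details.

```python
from typing import Any, Literal, Optional, TypeVar

from pydantic import BaseModel, Field, ValidationError

M = TypeVar("M", bound=BaseModel)


class Sentiment(BaseModel):
    """Hypothetical output model for a sentiment task."""

    label: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)


def validate_output(
    raw: str, model: type[M], fallback: Optional[M] = None
) -> tuple[Optional[M], list[dict[str, Any]]]:
    """Hypothetical hook: parse raw LLM text with `model`.

    Returns (instance, []) on success, or (fallback, error_details) on
    failure, so callers can inspect errors or use the fallback value.
    """
    try:
        return model.model_validate_json(raw), []
    except ValidationError as exc:
        # Each entry carries "type", "loc", "msg", and "input" keys.
        return fallback, [dict(err) for err in exc.errors()]
```

For instance, `validate_output('{"label": "positive", "confidence": 1.5}', Sentiment)` returns `None` plus one error whose `loc` is `("confidence",)`, while passing `fallback=Sentiment(label="neutral", confidence=0.0)` would return that fallback instead of `None`. Returning errors rather than raising is just one design choice; an alternative would be a retry loop that feeds the error messages back to the LLM.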
