feat: Initial Pydantic processor implementation from original PR #1

Merged: 8 commits merged on Jul 15, 2025

@mbeacom (Owner) commented Jul 15, 2025

This pull request introduces a new package, genai-processors-pydantic, which provides a Pydantic-based JSON validation processor for Google's genai-processors framework. The changes include the implementation of the validator, examples to demonstrate its usage, and comprehensive tests to ensure its reliability. Additionally, the project configuration files have been updated to support the development environment.

New Feature: PydanticValidator Implementation

  • genai_processors_pydantic/__init__.py: Introduced the PydanticValidator processor and ValidationConfig for validating JSON data against Pydantic models. Added versioning (0.1.0) and metadata for the package.
  • README.md: Added detailed documentation about the PydanticValidator, including its motivation, installation, configuration, behavior, and practical usage examples.

Example and Demonstration

  • examples/generic_example.py: Created an example script demonstrating how to use the PydanticValidator to validate JSON data, handle validation results, and route data based on validation status.
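
The validate-then-route pattern described above can be sketched with plain Pydantic v2 calls. This is a minimal illustration, not the contents of examples/generic_example.py: the `User` model and the `validate_and_route` helper are hypothetical names, and the real example runs inside a genai-processors stream rather than over a list.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    # Hypothetical model, used only for illustration.
    name: str
    age: int


def validate_and_route(raw_items: list[str]) -> tuple[list[dict], list[dict]]:
    """Split raw JSON strings into validated payloads and error records."""
    valid: list[dict] = []
    invalid: list[dict] = []
    for raw in raw_items:
        try:
            # Parses the JSON text and validates it in one step.
            user = User.model_validate_json(raw)
        except ValidationError as exc:
            # Malformed JSON and schema violations both land here.
            invalid.append({"raw": raw, "error_count": exc.error_count()})
        else:
            valid.append(user.model_dump())
    return valid, invalid


valid, invalid = validate_and_route(
    [
        '{"name": "Ada", "age": 36}',
        '{"name": "Bob", "age": "not a number"}',
        "not json",
    ]
)
```

Routing on the two returned lists stands in for routing on a per-part validation status.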

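Testing and Reliability

  • genai_processors_pydantic/tests/test_validator.py: Added tests covering successful validation, failed validation in permissive and strict modes, passthrough of non-JSON parts, and error handling for invalid JSON.
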
Development Environment Setup

  • .pre-commit-config.yaml: Configured pre-commit hooks for code quality checks, including tools like black, ruff, and bandit. This ensures consistent formatting, linting, and security checks.
  • .python-version: Specified Python version 3.13 for the project environment.

@mbeacom mbeacom self-assigned this Jul 15, 2025
@Copilot Copilot AI review requested due to automatic review settings July 15, 2025 18:36

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @mbeacom, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers the initial implementation of a Pydantic-based validation processor for the genai-processors ecosystem. It enables robust data schema enforcement for JSON content flowing through AI pipelines, complete with configurable validation behaviors. The changes also establish a solid foundation for development, including comprehensive testing, detailed documentation, and standardized code quality tools.

Highlights

  • New PydanticValidator Processor: I've introduced a new PydanticValidator processor within the genai-processors framework. This processor is designed to validate JSON data against Pydantic models, ensuring data integrity in AI pipelines. It includes a ValidationConfig for flexible behavior, allowing for strict mode and fail-fast options.
  • Comprehensive Documentation and Examples: Detailed documentation has been added to README.md explaining the motivation, installation, configuration, and usage of the PydanticValidator. An example (examples/generic_example.py) demonstrates how to use the validator, handle validation results, and route data based on outcomes.
  • Robust Test Suite: A comprehensive test suite (genai_processors_pydantic/tests/test_validator.py) has been implemented to cover various scenarios, including successful validation, failed validation in permissive and strict modes, passthrough behavior for non-JSON parts, and error handling for invalid JSON.
  • Development Workflow Enhancements: I've configured pre-commit hooks (.pre-commit-config.yaml) for code formatting (black), linting (ruff), and security checks (bandit) to maintain code quality. The project's Python version has also been updated to 3.13 (.python-version).
  • Project Metadata and Dependencies: The pyproject.toml file has been added to define project metadata, specify dependencies (genai-processors>=1.0.4, pydantic>=2.0), and configure build, testing, and linting tools.
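
As a rough sketch of what strict and fail-fast options could mean, the snippet below uses Pydantic's own `strict=True` validation flag and a re-raise on failure. The `check` helper, its `strict` and `fail_fast` parameters, the `Reading` model, and the result dictionary shape are assumptions made for illustration; the actual ValidationConfig fields are not shown in this conversation.

```python
from pydantic import BaseModel, ValidationError


class Reading(BaseModel):
    # Hypothetical model, used only for illustration.
    sensor: str
    value: float


def check(data: dict, *, strict: bool = False, fail_fast: bool = False) -> dict:
    """Validate `data`; report failures, or re-raise them when fail_fast is set."""
    try:
        # strict=True disables Pydantic's type coercion (e.g. "21.5" -> 21.5).
        model = Reading.model_validate(data, strict=strict)
    except ValidationError as exc:
        if fail_fast:
            raise  # Assumed fail-fast behaviour: stop on the first failure.
        return {"status": "failure", "error_count": exc.error_count()}
    return {"status": "success", "data": model.model_dump()}


payload = {"sensor": "t1", "value": "21.5"}
lax_result = check(payload)                  # the string is coerced to a float
strict_result = check(payload, strict=True)  # the string is rejected
```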

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a new PydanticValidator processor for validating JSON data against Pydantic models within the genai-processors framework, along with configuration, documentation, examples, and tests. Key changes include:

  • Implementation of PydanticValidator and ValidationConfig with strict and fail-fast options.
  • Comprehensive documentation in README.md and a runnable example in examples/generic_example.py.
  • A robust test suite covering validation modes, error handling, and passthrough behavior.
  • Project metadata updates (pyproject.toml, .python-version, pre-commit hooks) for packaging and workflow.

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Defined package metadata, dependencies, and tooling configurations |
| genai_processors_pydantic/validator.py | Added PydanticValidator implementation with success/failure handling |
| genai_processors_pydantic/tests/test_validator.py | Added tests for various validation scenarios |
| genai_processors_pydantic/tests/__init__.py | Initialized the test module |
| genai_processors_pydantic/__init__.py | Exported PydanticValidator, ValidationConfig, and version |
| examples/generic_example.py | Provided a basic usage example for the validator in a stream |
| README.md | Documented installation, configuration, API, and pipeline usage |
| .python-version | Updated the development Python version to 3.13 |
| .pre-commit-config.yaml | Added pre-commit hooks for formatting and linting |

Comments suppressed due to low confidence (2)

README.md:27

  • [nitpick] The uv add instruction is unclear or unsupported by most users; consider removing or explaining what uv refers to (e.g., pip install).
Or with uv:

genai_processors_pydantic/validator.py:104

  • The _get_data_to_validate method returns None on parse failure but its return type is annotated as JsonData; update it to JsonData | None for accuracy.
    ) -> JsonData:

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a new PydanticValidator with comprehensive documentation, examples, and tests. The code is well-structured, and the pre-commit hooks are a solid choice for maintaining quality. The most critical fix is to stop storing non-serializable Pydantic model instances in metadata, so that the processor stays robust in distributed pipelines. Also consider aligning the pinned Python version with the declared supported versions for better contributor accessibility, fixing the issues in the README example, and taking the performance improvement opportunity in the validator logic.
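
The serialization concern can be shown in a few lines of plain Pydantic v2. This is only a sketch: the `User` model and the metadata keys (`validation_status`, `validated_data`) are hypothetical and may not match the names used in validator.py.

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    # Hypothetical model, used only for illustration.
    name: str
    age: int


user = User(name="Ada", age=36)

# A model instance inside metadata cannot be encoded by the standard json module.
try:
    json.dumps({"validated_data": user})
    instance_is_serializable = True
except TypeError:
    instance_is_serializable = False

# Storing the dumped form keeps the metadata plain and JSON-safe.
metadata = {
    "validation_status": "success",
    "validated_data": user.model_dump(mode="json"),
}
encoded = json.dumps(metadata)

# The receiving side can rebuild the model when it needs typed access.
restored = User.model_validate(json.loads(encoded)["validated_data"])
```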

mbeacom added 3 commits July 15, 2025 15:08
…ME with limitations and roadmap

refactor: Improve validation metadata to store serialized data instead of Pydantic instances
fix: Update example scripts for multi-model validation patterns and improve test cases
chore: Clean up .gitignore and pre-commit configuration
@mbeacom (Owner, Author) commented Jul 15, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a Pydantic-based JSON validator for the genai-processors framework. The review focuses on improving the examples to better demonstrate the streaming capabilities of the framework by avoiding buffering results into lists.
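
The difference between buffering and streaming can be sketched with a standard-library async generator. The `validated_parts` producer and the dictionary-shaped parts are hypothetical stand-ins for a processor's output stream, not the framework's API.

```python
import asyncio
from collections.abc import AsyncIterator


async def validated_parts(n: int, log: list[str]) -> AsyncIterator[dict]:
    """Hypothetical stand-in for a stream of validated parts."""
    for i in range(n):
        await asyncio.sleep(0)  # Yield control, as a real pipeline stage would.
        log.append(f"produced {i}")
        yield {"index": i}


async def consume_streaming(stream: AsyncIterator[dict], log: list[str]) -> None:
    # Handle each part as soon as it arrives instead of collecting a list first.
    async for part in stream:
        log.append(f"handled {part['index']}")


log: list[str] = []
asyncio.run(consume_streaming(validated_parts(2, log), log))
```

Each part is handled before the next one is produced, which is the behaviour that collecting the whole stream into a list would hide.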

@mbeacom mbeacom merged commit 614002c into main Jul 15, 2025
3 checks passed
@mbeacom mbeacom deleted the initial-seed branch July 15, 2025 21:36