Bugfix/1446: Ensure Pydantic Models Can Be Created with typing.pyspark.DataFrame or typing.pyspark_sql.DataFrame Generic
#1447
base: main
Conversation
* Disable irrelevant pylint warnings
  Signed-off-by: Brayan Jaramillo <[email protected]>
Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run tests and set up pre-commit in your dev env to make sure those are passing.
Still running into issues with tests unrelated to new code locally. Will try to resolve before pushing again. Thanks!
I'm getting the same failed tests locally for the …
Hi @brayan07, sorry for the delayed review on this! I believe the test errors are coming from …
In this PR we resolve the issue reported in #1446, where any Pydantic model with a `pandera.typing.pyspark.DataFrame` or `pandera.typing.pyspark_sql.DataFrame` field always throws a confusing `ValidationError`. For clarity, we want to make sure the following leads to the expected behavior:
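(A minimal sketch of that usage; the schema, model, column names, and data below are hypothetical illustrations, not code taken from this PR.)

```python
import pyspark.sql.types as T
from pyspark.sql import SparkSession

import pandera.pyspark as pa
from pandera.typing.pyspark_sql import DataFrame
from pydantic import BaseModel


class PersonSchema(pa.DataFrameModel):
    # Hypothetical schema used only to illustrate the expected behavior.
    name: T.StringType()
    age: T.IntegerType()


class PersonContainer(BaseModel):
    # Before this fix, declaring a DataFrame[PersonSchema] field caused a
    # confusing ValidationError even when the dataframe was valid.
    df: DataFrame[PersonSchema]


spark = SparkSession.builder.getOrCreate()
valid_df = spark.createDataFrame([("alice", 30)], schema="name string, age int")

# Expected behavior: the dataframe is validated against PersonSchema and the
# model is constructed without raising.
container = PersonContainer(df=valid_df)
```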
We do this by creating a `_PydanticIntegrationMixIn` that can be used by both `pandera.typing.pyspark_sql.DataFrame` and `pandera.typing.pyspark.DataFrame`. The content of the mixin is a variation of the methods used in `pandera.typing.pandas.DataFrame`.
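As a rough sketch (this is not the PR's actual implementation; the method names and bodies below are assumptions modeled on the pydantic v1-style validator hooks used by `pandera.typing.pandas.DataFrame`), such a mixin might look like:

```python
from typing import Any


class _PydanticIntegrationMixIn:
    """Illustrative shared pydantic hooks for the pyspark DataFrame generics."""

    @classmethod
    def _get_schema_model(cls, field):
        # Pull the DataFrameModel subclass out of the DataFrame[Schema] annotation.
        if not field.sub_fields:
            raise TypeError("Expected a typed DataFrame, e.g. DataFrame[Schema]")
        return field.sub_fields[0].type_

    @classmethod
    def __get_validators__(cls):
        # pydantic (v1-style) calls this to collect the field's validators.
        yield cls.pydantic_validate

    @classmethod
    def pydantic_validate(cls, obj: Any, field) -> Any:
        # Validate eagerly so schema errors surface as a pydantic ValidationError.
        schema_model = cls._get_schema_model(field)
        return schema_model.to_schema().validate(obj)
```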
Note: We assume that any pyspark dataframe used in a Pydantic model will be validated eagerly, for both `pyspark.pandas` and `pyspark_sql`. The default behavior for `pyspark_sql` dataframes is lazy validation, but that makes less sense to me when using a Pydantic model.
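Continuing the illustrative sketch above (same hypothetical `PersonContainer` and `spark` session), eager validation means a schema violation fails at model construction time rather than being collected lazily on the dataframe:

```python
import pydantic

# Column types do not match PersonSchema (age is a string here).
invalid_df = spark.createDataFrame([("alice", "thirty")], schema="name string, age string")

try:
    PersonContainer(df=invalid_df)
except pydantic.ValidationError as exc:
    # With eager validation, the schema error is raised immediately at
    # model construction instead of being deferred.
    print(exc)
```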