Bugfix/1446: Ensure Pydantic Models Can Be Created with typing.pyspark.DataFrame or typing.pyspark_sql.DataFrame Generic
#1447
base: main
Conversation
* Disable irrelevant pylint warnings
  Signed-off-by: Brayan Jaramillo <[email protected]>
Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run tests and set up pre-commit in your dev env to make sure those are passing.
Still running into issues with tests unrelated to new code locally. Will try to resolve before pushing again. Thanks!
I'm getting the same failed tests locally for the …
Hi @brayan07, sorry for the delayed review on this! I believe the test errors are coming from …
In this PR we resolve the issue reported in #1446, where any Pydantic model with a `pandera.typing.pyspark.DataFrame` or `pandera.typing.pyspark_sql.DataFrame` field always throws a confusing `ValidationError`. For clarity, we want to make sure the following leads to the expected behavior:
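(A minimal sketch of that usage; the schema, model, column names, and data below are hypothetical illustrations, not code taken from this PR.)

```python
import pyspark.sql.types as T
from pyspark.sql import SparkSession

import pandera.pyspark as pa
from pandera.typing.pyspark_sql import DataFrame
from pydantic import BaseModel


class PersonSchema(pa.DataFrameModel):
    # Hypothetical schema used only to illustrate the expected behavior.
    name: T.StringType()
    age: T.IntegerType()


class PersonContainer(BaseModel):
    # Before this fix, declaring a DataFrame[PersonSchema] field caused a
    # confusing ValidationError even when the dataframe was valid.
    df: DataFrame[PersonSchema]


spark = SparkSession.builder.getOrCreate()
valid_df = spark.createDataFrame([("alice", 30)], schema="name string, age int")

# Expected behavior: the dataframe is validated against PersonSchema and the
# model is constructed without raising.
container = PersonContainer(df=valid_df)
```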
We do this by creating a `_PydanticIntegrationMixIn` that can be used by both `pandera.typing.pyspark_sql.DataFrame` and `pandera.typing.pyspark.DataFrame`. The content of the mixin is a variation of the methods used in `pandera.typing.pandas.DataFrame`.
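As a rough sketch (this is not the PR's actual implementation; the method names and bodies below are assumptions modeled on the pydantic v1-style validator hooks used by `pandera.typing.pandas.DataFrame`), such a mixin might look like:

```python
from typing import Any


class _PydanticIntegrationMixIn:
    """Illustrative shared pydantic hooks for the pyspark DataFrame generics."""

    @classmethod
    def _get_schema_model(cls, field):
        # Pull the DataFrameModel subclass out of the DataFrame[Schema] annotation.
        if not field.sub_fields:
            raise TypeError("Expected a typed DataFrame, e.g. DataFrame[Schema]")
        return field.sub_fields[0].type_

    @classmethod
    def __get_validators__(cls):
        # pydantic (v1-style) calls this to collect the field's validators.
        yield cls.pydantic_validate

    @classmethod
    def pydantic_validate(cls, obj: Any, field) -> Any:
        # Validate eagerly so schema errors surface as a pydantic ValidationError.
        schema_model = cls._get_schema_model(field)
        return schema_model.to_schema().validate(obj)
```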
Note: We assume that any pyspark dataframe used in a Pydantic model will be validated eagerly, for both `pyspark.pandas` and `pyspark_sql`. The default behavior for `pyspark_sql` dataframes is lazy validation, but that makes less sense to me when using a Pydantic model.
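Continuing the illustrative sketch above (same hypothetical `PersonContainer` and `spark` session), eager validation means a schema violation fails at model construction time rather than being collected lazily on the dataframe:

```python
import pydantic

# Column types do not match PersonSchema (age is a string here).
invalid_df = spark.createDataFrame([("alice", "thirty")], schema="name string, age string")

try:
    PersonContainer(df=invalid_df)
except pydantic.ValidationError as exc:
    # With eager validation, the schema error is raised immediately at
    # model construction instead of being deferred.
    print(exc)
```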