
brayan07 commented Dec 15, 2023

In this PR we resolve the issue reported in #1446, where any Pydantic model with a pandera.typing.pyspark.DataFrame or pandera.typing.pyspark_sql.DataFrame field always raises a confusing ValidationError.

Concretely, the following example should now behave as expected:

import pyspark.sql.types as T

from pandera.pyspark import DataFrameModel, Field
from pandera.typing.pyspark_sql import DataFrame
from pydantic import BaseModel
from pyspark.sql import SparkSession


class SampleSchema(DataFrameModel):
    """
    Sample schema model with data checks.
    """

    product: T.StringType() = Field()
    price: T.IntegerType() = Field()


class PydanticContainer(BaseModel):
    """
    Pydantic container with a DataFrameModel as a field.
    """

    data: DataFrame[SampleSchema]

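    class Config:
        arbitrary_types_allowed = True


spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Bread", 9), ("Butter", 15)], schema="product string, price int"
)

# Constructing the container should validate df against SampleSchema.
container = PydanticContainer(data=df)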

We do this by creating a _PydanticIntegrationMixIn that is shared by both pandera.typing.pyspark_sql.DataFrame and pandera.typing.pyspark.DataFrame.

The mixin's methods are adapted from the pydantic integration methods in pandera.typing.pandas.DataFrame.
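The shared-mixin pattern can be sketched without pyspark or pydantic. This is a hypothetical simplification, not the PR's code: PySparkSQLDataFrame, PySparkPandasDataFrame, SampleSchema and make_validator are stand-ins, and plain dicts play the role of dataframes. The idea it illustrates is that one mixin pulls the schema model out of the generic argument (DataFrame[SampleSchema] gives SampleSchema) and validates against it; in pandera.typing.pandas.DataFrame the equivalent logic is wired into pydantic's custom-type hooks (for example __get_pydantic_core_schema__ in pydantic v2), which this sketch leaves out so it runs standalone.

```python
import functools
from typing import Any, Callable, Generic, TypeVar, get_args

TSchema = TypeVar("TSchema")


class _PydanticIntegrationMixIn:
    """Hypothetical sketch of validation hooks shared by both DataFrame generics."""

    @classmethod
    def pydantic_validate(cls, obj: Any, schema_model: Any) -> Any:
        # Eager validation: the schema raises immediately on failure,
        # which pydantic would surface as a ValidationError.
        return schema_model.validate(obj)

    @classmethod
    def make_validator(cls, source_type: Any) -> Callable[[Any], Any]:
        # Extract the schema model from the subscripted type,
        # e.g. PySparkSQLDataFrame[SampleSchema] -> SampleSchema.
        (schema_model,) = get_args(source_type)
        return functools.partial(cls.pydantic_validate, schema_model=schema_model)


class PySparkSQLDataFrame(_PydanticIntegrationMixIn, Generic[TSchema]):
    """Stand-in for pandera.typing.pyspark_sql.DataFrame."""


class PySparkPandasDataFrame(_PydanticIntegrationMixIn, Generic[TSchema]):
    """Stand-in for pandera.typing.pyspark.DataFrame."""


class SampleSchema:
    """Stand-in schema model that requires 'product' and 'price' columns."""

    @classmethod
    def validate(cls, obj: dict) -> dict:
        missing = {"product", "price"} - set(obj)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return obj
```

Because both stand-in classes inherit the same hooks, a fix to the mixin applies to the pyspark.pandas and pyspark_sql types at once, which is the motivation for sharing it rather than duplicating the methods.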

Note:
We assume that any pyspark dataframe used in a pydantic model should be validated eagerly, for both pyspark.pandas and pyspark_sql. Validation of pyspark_sql dataframes is normally lazy by default, but that makes less sense to me for a Pydantic model, which should reject invalid data when it is constructed.
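The difference between the two modes can be illustrated in plain Python. This is not pandera's API: a dict row stands in for a dataframe, and validate_eager, validate_lazy and SchemaError are hypothetical names. In this sketch, eager validation raises on the first failing check, while lazy validation runs every check and returns the failures for the caller to inspect.

```python
from typing import Any, Callable, Dict, List

# Hypothetical per-column checks: column name -> predicate on the value.
Check = Callable[[Any], bool]


class SchemaError(ValueError):
    """Raised by eager validation on the first failing check."""


def validate_eager(row: Dict[str, Any], checks: Dict[str, Check]) -> Dict[str, Any]:
    # Eager: stop at the first failure and raise, so a model that calls
    # this during construction fails immediately.
    for column, check in checks.items():
        if not check(row.get(column)):
            raise SchemaError(f"check failed for column {column!r}")
    return row


def validate_lazy(row: Dict[str, Any], checks: Dict[str, Check]) -> List[str]:
    # Lazy: run every check and collect the failing columns instead of
    # raising; nothing fails unless the caller inspects the result.
    return [column for column, check in checks.items() if not check(row.get(column))]
```

With lazy semantics a model could be constructed around invalid data without any error, which is why eager semantics fit a pydantic field better.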

* Disable irrelevant pylint warnings

Signed-off-by: Brayan Jaramillo <[email protected]>
cosmicBboy (Collaborator)

Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run the tests and set up pre-commit in your dev env to make sure those pass.

brayan07 (Author) commented Dec 19, 2023

Still running into local test failures unrelated to the new code. Will try to resolve them before pushing again. Thanks!

brayan07 (Author)

With make nox-conda, I get the same test failures locally on both the main branch and this branch, so I suspect the cause is something in my dev setup rather than the new code. Would it be alright to run the CI workflow once more to help me debug?

cosmicBboy closed this Jan 25, 2024
cosmicBboy reopened this Jan 25, 2024
cosmicBboy (Collaborator)

Hi @brayan07, sorry for the delayed review on this!

I believe the test errors are coming from the statement "from pydantic import GetCoreSchemaHandler", since GetCoreSchemaHandler only exists in pydantic v2. That import will need to move into the PYDANTIC_V2 conditional.
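A minimal sketch of the suggested fix, guarding the v2-only import behind a version flag. How PYDANTIC_V2 is derived here is an assumption for illustration (pandera computes its own flag); the sketch also tolerates pydantic not being installed at all.

```python
try:
    import pydantic
except ImportError:  # pydantic not installed at all
    PYDANTIC_V2 = False
else:
    # pydantic.VERSION is a version string such as "1.10.13" or "2.5.3".
    PYDANTIC_V2 = int(pydantic.VERSION.split(".")[0]) >= 2

if PYDANTIC_V2:
    # GetCoreSchemaHandler only exists in pydantic v2; importing it
    # unconditionally fails as soon as the module is imported under v1.
    from pydantic import GetCoreSchemaHandler
```

Any method whose signature annotates a parameter with GetCoreSchemaHandler, such as a __get_pydantic_core_schema__ hook, would also need to be defined inside the same conditional, because without "from __future__ import annotations" those annotations are evaluated when the method is defined.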
