
Support for multi type (Unions) in schemas and validation #1152

@vianmixtkz


Is your feature request related to a problem? Please describe.

I would like pandera to support Union types, i.e. validation of a Series/Column should be able to accept more than one type.
Pydantic already allows this.

Here is an example of my issue:

from typing import Union
import pandas as pd
import pandera as pa
from pandera.typing import Series

class InputSchema(pa.DataFrameModel):
    year: Series[int] = pa.Field(gt=2000, coerce=True)
    month: Series[int] = pa.Field(ge=1, le=12, coerce=True)
    day: Series[int] = pa.Field(ge=0, le=365, coerce=True)
    comment: Series[Union[str, float]] = pa.Field()

class OutputSchema(InputSchema):
    revenue: Series[float]

df = pd.DataFrame({
    "year": ["2001", "2002", "2003"],
    "month": ["3", "6", "12"],
    "day": ["200", "156", "365"],
    "comment": ["test", float("nan"), "test"],
})

InputSchema(df)  # raises TypeError: Cannot interpret 'typing.Union[str, float]' as a data type

Describe the solution you'd like

I understand that rejecting Unions is the intended behavior for now, but could you consider adding an option to allow them in the future?
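To make the requested semantics concrete, here is a minimal sketch in plain Python of what element-wise validation against a `Union` annotation could look like. This is not pandera's API: `allowed_types` and `invalid_positions` are hypothetical helper names, and the sketch only handles a `typing.Union` of plain classes.

```python
from typing import Any, Iterable, Union, get_args, get_origin


def allowed_types(annotation: Any) -> tuple:
    """Return the concrete types named by an annotation such as Union[str, float]."""
    if get_origin(annotation) is Union:
        return get_args(annotation)
    # Not a Union: treat the annotation itself as the single allowed type.
    return (annotation,)


def invalid_positions(values: Iterable[Any], annotation: Any) -> list:
    """Return the positions whose value matches none of the allowed types."""
    types = allowed_types(annotation)
    return [i for i, v in enumerate(values) if not isinstance(v, types)]


# Same values as the "comment" column in the example above.
comment = ["test", float("nan"), "test"]
print(invalid_positions(comment, Union[str, float]))        # []
print(invalid_positions(comment + [3], Union[str, float]))  # [3]
```

Under these assumptions, `float("nan")` passes because it is a `float`, which is the behavior I would expect for a `Union[str, float]` column.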

Describe alternatives you've considered

Splitting each Union column into multiple columns, one per type. However, this is not something I can really control (see the next section).
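For illustration, the split-by-type alternative could be sketched as follows in plain Python. `split_by_type` is a hypothetical helper, not part of pandas or pandera, and it uses exact type matches so that each value lands in exactly one column.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_by_type(values: Iterable[Any], types: Tuple[type, ...]) -> Dict[str, List[Any]]:
    """Split one mixed-type column into one column per type.

    Each output column keeps the value where its exact type matches
    and holds None everywhere else.
    """
    columns: Dict[str, List[Any]] = {t.__name__: [] for t in types}
    for value in values:
        for t in types:
            columns[t.__name__].append(value if type(value) is t else None)
    return columns


# Same values as the "comment" column in the example above.
parts = split_by_type(["test", float("nan"), "test"], (str, float))
print(parts["str"])  # ['test', None, 'test']
```

Each resulting column then has a single type and could be validated separately, at the cost of changing the shape of the data.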

Additional context

I have a valid use case for this: I use pandas to handle CSVs in which some columns contain hybrid data types.
I currently use pandas for the preprocessing and pydantic for the validation, and I would like to use pandera to make this process (processing + validation) more robust.


Labels: enhancement (New feature or request)
