Description
Is your feature request related to a problem? Please describe.
I would like pandera to support Union types, that is, validation of a Series/column should accept more than one type.
Pydantic allows this.
Here is an example of my issue:
    from typing import Union

    import pandas as pd
    import pandera as pa
    from pandera.typing import Series


    class InputSchema(pa.DataFrameModel):
        year: Series[int] = pa.Field(gt=2000, coerce=True)
        month: Series[int] = pa.Field(ge=1, le=12, coerce=True)
        day: Series[int] = pa.Field(ge=0, le=365, coerce=True)
        comment: Series[Union[str, float]] = pa.Field()


    class OutputSchema(InputSchema):
        revenue: Series[float]


    df = pd.DataFrame({
        "year": ["2001", "2002", "2003"],
        "month": ["3", "6", "12"],
        "day": ["200", "156", "365"],
        "comment": ["test", float("nan"), "test"],
    })

    InputSchema(df)  # raises TypeError: Cannot interpret 'typing.Union[str, float]' as a data type
Describe the solution you'd like
I understand that not allowing Unions is the intended behavior for now, but could you consider adding an option to allow them in the future?
Describe alternatives you've considered
Split each Union column into multiple columns, one per type. However, this is not really something I can control; see the next section.
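For illustration, the split alternative could look roughly like the sketch below. It uses plain pandas only, not any pandera API; `split_by_type` is a hypothetical helper name, and the sample values (including the `3.5`) are made up to show both types.

```python
import pandas as pd


def split_by_type(series: pd.Series, types: dict) -> pd.DataFrame:
    """Hypothetical helper: split a mixed-type column into one column per type.

    `types` maps a column suffix to a Python type, e.g. {"str": str}.
    Values that are not instances of the given type become NaN in that column.
    """
    out = {}
    for suffix, typ in types.items():
        # Element-wise isinstance check, aligned on the original index.
        mask = pd.Series([isinstance(v, typ) for v in series], index=series.index)
        out[f"{series.name}_{suffix}"] = series.where(mask)
    return pd.DataFrame(out)


# Made-up sample data; dtype=object keeps the mixed values as-is.
comment = pd.Series(["test", float("nan"), 3.5], name="comment", dtype=object)
parts = split_by_type(comment, {"str": str, "float": float})
```

Each resulting column (`comment_str`, `comment_float`) then holds a single type plus missing values, so it can be validated with an ordinary single-type schema field. The drawback is the one mentioned above: it changes the shape of the data.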
Additional context
I have a valid use case for this: I use pandas to handle CSV files in which some columns contain mixed data types.
Currently I use pandas for the preprocessing and pydantic for the validation, and I would like to use pandera to make this process (preprocessing + validation) more robust.