Open
Description
Describe the task
Refactor the statistical validation code in data/src/validation/ to use Pandera's DataFrameSchema consistently and efficiently for statistical range validation. The current implementation is inconsistent: some validators use Pandera's schema validation, while others implement custom statistical checks, which duplicates code and reduces maintainability. This task standardizes all output schema tests on Pandera's built-in functionality and best practices, and eliminates redundant statistical validation code by either using existing Pandera features or consolidating repetitive logic into the base validator class.
Acceptance Criteria
- Audit all validation files in data/src/validation/ to identify inconsistent statistical validation patterns
- Refactor statistical range validations to consistently use Pandera's DataFrameSchema with appropriate statistical checks
- Replace custom statistical validation logic with Pandera's built-in statistical validation methods where possible
- Consolidate repetitive statistical validation code into the base validator class or reusable utility functions
- Update validation schemas to use consistent patterns and naming conventions
- Ensure validation performance is maintained or improved after refactoring