
Task: Refactor Statistical Validation to Use Pandera DataFrameSchema Best Practices #1259

@nlebovits


Describe the task

Refactor the statistical validation code in data/src/validation/ so that every validator consistently and efficiently uses Pandera's DataFrameSchema for statistical range validation. The current implementation is inconsistent: some validators use Pandera's schema validation, while others implement custom statistical checks, which duplicates code and reduces maintainability. The task is to standardize all output schema tests on Pandera's built-in functionality and best practices, and to eliminate redundant statistical validation code by either using existing Pandera features or consolidating repetitive logic into the base validator class.

Acceptance Criteria

  • Audit all validation files in data/src/validation/ to identify inconsistent statistical validation patterns
  • Refactor statistical range validations to consistently use Pandera's DataFrameSchema with appropriate statistical checks
  • Replace custom statistical validation logic with Pandera's built-in statistical validation methods where possible
  • Consolidate repetitive statistical validation code into the base validator class or reusable utility functions
  • Update validation schemas to use consistent patterns and naming conventions
  • Ensure validation performance is maintained or improved after refactoring
