Task: Create Golden Dataset for Comprehensive Unit Testing #1262

@nlebovits

Describe the task

Replace the current simulated test dataset in conftest.py with a "golden dataset" derived from actual pipeline output to improve test reliability and coverage. The existing test data is artificially small and only approximates real data distributions, so unit tests that depend on realistic data patterns or adequate sample sizes are less effective than they could be. This task involves four steps: generating a representative ~1,000 row dataset from the full pipeline output, adopting it as the new testing standard, updating all existing tests to use it, and documenting the process for future maintenance and updates.
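One way to draw the ~1,000 row sample is proportional stratified sampling, sketched below using only the standard library. This is an illustration, not the project's actual method: the function name `stratified_sample`, the `strata_key` parameter, and the row-as-dict representation are all assumptions, and the real pipeline output may call for a different stratification column or a DataFrame-based version.

```python
import random
from collections import defaultdict


def stratified_sample(rows, strata_key, target_size=1000, seed=42):
    """Return roughly target_size rows, keeping each stratum's share intact.

    rows is a list of dicts (hypothetical representation of pipeline output);
    strata_key names the column to stratify on (also hypothetical).
    """
    rows = list(rows)
    total = len(rows)
    if total <= target_size:
        # Nothing to trim: the full output is already small enough.
        return rows

    # Group rows by their stratum value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[strata_key]].append(row)

    rng = random.Random(seed)  # fixed seed so regeneration is repeatable
    sample = []
    for key in sorted(groups, key=str):  # sorted for deterministic order
        members = groups[key]
        # Proportional allocation, with at least one row per stratum
        # so that rare categories (edge cases) are not dropped.
        k = max(1, round(target_size * len(members) / total))
        k = min(k, len(members))
        sample.extend(rng.sample(members, k))
    return sample
```

Because every stratum gets at least one row and allocations are rounded, the result can differ slightly from `target_size`. The fixed seed keeps the sample reproducible when the golden dataset is regenerated from the same pipeline output.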

Acceptance Criteria

  • Run the complete pipeline to generate the full output dataset from which the golden dataset will be created
  • Extract a representative sample of approximately 1,000 rows that maintains realistic data distributions
  • Ensure the golden dataset covers edge cases and various data patterns found in production data
  • Create a systematic process for golden dataset generation that can be repeated and documented
  • Replace the current simulated dataset in conftest.py with the new golden dataset
  • Update all existing unit tests to work with the new golden dataset structure and size
  • Refactor test fixtures and helper functions to accommodate the larger, more realistic dataset
  • Ensure all tests continue to pass with the new golden dataset
  • Verify that tests now provide better coverage of realistic data scenarios
  • Document the golden dataset creation process, including data selection criteria and update procedures
  • Create guidelines for when and how to regenerate the golden dataset as the pipeline evolves
  • Add data quality checks to ensure the golden dataset remains representative over time
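The data quality checks in the last criterion could start from something like the sketch below, which compares category shares between the full pipeline output and the golden dataset. The helper names (`category_shares`, `max_share_drift`, `missing_categories`) and the choice of a single categorical column are assumptions made for illustration. A real check would likely cover several columns and numeric distributions as well.

```python
from collections import Counter


def category_shares(rows, key):
    """Map each category in column `key` to its fraction of all rows."""
    if not rows:
        raise ValueError("rows must not be empty")
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def max_share_drift(full_rows, golden_rows, key):
    """Largest absolute difference in category share between the two datasets."""
    full = category_shares(full_rows, key)
    golden = category_shares(golden_rows, key)
    cats = set(full) | set(golden)
    return max(abs(full.get(c, 0.0) - golden.get(c, 0.0)) for c in cats)


def missing_categories(full_rows, golden_rows, key):
    """Categories present in the full output but absent from the golden dataset."""
    full_cats = {row[key] for row in full_rows}
    golden_cats = {row[key] for row in golden_rows}
    return sorted(full_cats - golden_cats, key=str)
```

A test could then assert that `max_share_drift(...)` stays below a chosen tolerance and that `missing_categories(...)` is empty. A failure would signal that the golden dataset should be regenerated.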
