Task: Fix Spatial Join Deduplication Issue and Refactor Pipeline Services #1261

@nlebovits

Describe the task

Investigate and fix the deduplication step in the spatial_join utility, which is not removing duplicate opa_id records after spatial joins. During pipeline validation development, it was discovered that the utility's expected deduplication by opa_id fails, leaving duplicate records in the output. As a temporary workaround, deduplication logic was added to several individual pipeline services, but this treats the symptom rather than the root cause. This ticket covers identifying why the spatial join deduplication does not work as designed, fixing the underlying issue, and then refactoring the affected services to remove the workaround code.
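
For illustration, here is a minimal sketch of one way duplicate opa_id rows can appear after a spatial join, and what deduplication by opa_id would look like. It is not the project's spatial_join code and does not claim to show the actual root cause. It assumes a parcel can match more than one polygon, uses axis-aligned boxes in place of real geometries, and all names (`Box`, `naive_spatial_join`, `dedupe_by_opa_id`, the sample data) are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a real polygon (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def contains(box: Box, point: tuple[float, float]) -> bool:
    """True if the point lies inside the box or on its boundary."""
    x, y = point
    return box.xmin <= x <= box.xmax and box.ymin <= y <= box.ymax


def naive_spatial_join(parcels: list[dict], zones: list[dict]) -> list[dict]:
    """Emit one output row per (parcel, zone) match, like a one-to-many join."""
    rows = []
    for parcel in parcels:
        for zone in zones:
            if contains(zone["box"], parcel["point"]):
                rows.append({"opa_id": parcel["opa_id"], "zone": zone["name"]})
    return rows


def dedupe_by_opa_id(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each opa_id, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row["opa_id"] not in seen:
            seen.add(row["opa_id"])
            unique.append(row)
    return unique


# Hypothetical sample data: parcel B2 sits where the two zones overlap.
parcels = [
    {"opa_id": "A1", "point": (1.0, 1.0)},
    {"opa_id": "B2", "point": (5.0, 5.0)},
]
zones = [
    {"name": "north", "box": Box(0, 0, 6, 6)},
    {"name": "east", "box": Box(4, 4, 10, 10)},
]

joined = naive_spatial_join(parcels, zones)   # B2 appears once per matching zone
deduped = dedupe_by_opa_id(joined)            # one row per opa_id
```

Keeping the first match is only one possible policy. Whether the real utility should keep the first match, the largest overlap, or aggregate the matches is a decision for the fix itself.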

Acceptance Criteria

  • Investigate the spatial_join utility to identify why opa_id deduplication is not functioning correctly
  • Debug and analyze the spatial join process to understand where duplicate records are being introduced or preserved
  • Fix the root cause of the deduplication failure in the spatial join utility
  • Test the fixed spatial join utility with sample data to verify proper opa_id deduplication
  • Identify all pipeline services that have temporary individual deduplication workarounds
  • Refactor identified services to remove temporary deduplication code once spatial join is fixed
  • Ensure all unit tests continue to pass after fixing the spatial join utility
  • Verify that all validation checks pass with the refactored services
  • Run the complete pipeline to ensure it executes successfully from start to finish
  • Add or enhance tests for the spatial join utility to prevent regression of this issue
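
A uniqueness check along the following lines could support both the investigation and the regression tests listed above. This is a sketch under assumptions: it treats the join output as a list of dict rows, and `find_duplicate_opa_ids`, `assert_unique_opa_ids`, and the sample rows are hypothetical names and data, not part of the existing codebase.

```python
from collections import Counter


def find_duplicate_opa_ids(rows):
    """Return {opa_id: count} for every opa_id that appears more than once."""
    counts = Counter(row["opa_id"] for row in rows)
    return {opa_id: n for opa_id, n in counts.items() if n > 1}


def assert_unique_opa_ids(rows):
    """Raise AssertionError naming the offending ids if any opa_id repeats."""
    duplicates = find_duplicate_opa_ids(rows)
    assert not duplicates, f"duplicate opa_id values after join: {duplicates}"


# Hypothetical join output; real rows would come from the spatial_join utility.
sample_output = [
    {"opa_id": "A1", "zone": "north"},
    {"opa_id": "B2", "zone": "north"},
    {"opa_id": "B2", "zone": "east"},
]

duplicates = find_duplicate_opa_ids(sample_output)
```

Reporting which opa_id values repeat, rather than only a row count mismatch, makes it easier to trace a failure back to the specific geometries involved.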
