
[FEATURE] data collection and download feature #90

@ututono

Description

Is your feature request related to a problem? Please describe

CARE already provides a one-click full data download, but the raw export requires significant manual post-processing before it can be used for research. Administrators must write and run custom Python scripts to apply consent filtering and anonymization, because there is no built-in mechanism to enforce privacy consent or anonymize user identities before data is shared with researchers.
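For illustration, the consent filtering and anonymization that such scripts perform might look like the minimal Python sketch below. It assumes a hypothetical record schema (dicts with a `user_id` plus identifying fields such as `user_name` and `email`), a set of consenting user IDs, and keyed pseudonymization with HMAC-SHA256. None of these names come from CARE itself.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Map a user ID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


# Hypothetical identifying fields to strip; a real instance policy would define these.
IDENTIFYING_FIELDS = ("user_name", "email")


def filter_and_anonymize(records, consenting_users, secret):
    """Drop records from users without consent, then replace identities with pseudonyms.

    `records` is assumed to be an iterable of dicts with a "user_id" key
    (hypothetical schema, not CARE's actual data model).
    """
    out = []
    for rec in records:
        if rec["user_id"] not in consenting_users:
            continue  # consent filtering: skip users who have not opted in
        # Copy the record without the identifying fields.
        clean = {k: v for k, v in rec.items() if k not in IDENTIFYING_FIELDS}
        clean["user_id"] = pseudonymize(rec["user_id"], secret)
        out.append(clean)
    return out
```

Using a keyed HMAC rather than a plain hash keeps pseudonyms stable across exports (the same user always maps to the same ID) while preventing anyone without the secret from recovering identities by hashing candidate user IDs.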

Describe the solution you'd like

A job-based data export pipeline in the CARE admin UI that allows administrators to configure and trigger asynchronous export jobs scoped by instance profile and time range. The output is a downloadable ZIP archive containing a JSONL file (one package per paper or student), optional PDF documents, and a metadata file. Consent filtering and anonymization are applied automatically according to each instance's policy.
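A minimal sketch of the packaging step of such a job, assuming a hypothetical `ExportJob` configuration, a package schema with `id` and ISO-8601 `created_at` fields, and input that has already been consent-filtered and anonymized. The archive layout (`data.jsonl`, `pdfs/`, `metadata.json`) is an assumption for illustration; job queueing and asynchronous execution are omitted.

```python
import io
import json
import zipfile
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExportJob:
    """Hypothetical export job configuration scoped by instance profile and time range."""
    profile: str
    start: datetime  # inclusive
    end: datetime    # exclusive
    include_pdfs: bool = False


def run_export_job(job, packages, pdfs):
    """Build the export ZIP in memory and return its bytes.

    `packages`: list of dicts, one per paper or student, each with an "id" and an
    ISO-8601 "created_at" string (hypothetical schema), already anonymized.
    `pdfs`: mapping from package id to PDF bytes, used only if job.include_pdfs.
    """
    # Keep only packages created within [start, end).
    in_range = [
        p for p in packages
        if job.start <= datetime.fromisoformat(p["created_at"]) < job.end
    ]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # JSONL: one JSON object per line, one package per paper or student.
        jsonl = "".join(json.dumps(p, ensure_ascii=False) + "\n" for p in in_range)
        zf.writestr("data.jsonl", jsonl)
        if job.include_pdfs:
            for p in in_range:
                if p["id"] in pdfs:
                    zf.writestr(f"pdfs/{p['id']}.pdf", pdfs[p["id"]])
        # Metadata file describing how the export was scoped.
        metadata = {
            "profile": job.profile,
            "start": job.start.isoformat(),
            "end": job.end.isoformat(),
            "package_count": len(in_range),
            "include_pdfs": job.include_pdfs,
        }
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buf.getvalue()
```

In a real deployment this function would presumably run in a background worker, with the admin UI polling the job status and offering the returned bytes as a download once the job completes.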

Describe alternatives you've considered

Using the existing one-click full data download combined with manually running Python scripts for anonymization and consent filtering was considered, but this approach requires per-instance scripting effort, is not accessible to non-technical administrators, and provides no audit log. Standardizing this into a built-in pipeline removes the dependency on individual scripts and makes the process consistent across all instances.

Additional context

Details are listed in the comments below.
