Is your feature request related to a problem? Please describe
CARE already provides a one-click full data download, but the raw export requires significant manual post-processing before it can be used for research. Administrators must write and run custom Python scripts to apply consent filtering and anonymization, because there is no built-in mechanism to enforce privacy consent or anonymize user identities before data is shared with researchers.
Describe the solution you'd like
A job-based data export pipeline in the CARE admin UI that lets administrators configure and trigger asynchronous export jobs scoped by instance profile and time range. The output is a downloadable ZIP containing a JSONL file (one package per paper or student), optional PDF documents, and a metadata file. Consent filtering and anonymization are applied automatically according to each instance's policy.
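To make the requested output concrete, here is a minimal sketch of the core export step in Python. It is not CARE's implementation: the record shape (`user_id`, `consent`, `created_at`, `payload`), the file names `data.jsonl` and `metadata.json`, and the function names are all hypothetical. It writes one JSONL line per record rather than one package per paper or student, and it uses a salted hash, which is pseudonymization and may not satisfy every instance's anonymization policy.

```python
import hashlib
import io
import json
import zipfile
from datetime import datetime, timezone


def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a user id with a truncated salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()[:16]


def build_export(records, salt, start, end):
    """Return the bytes of a ZIP holding data.jsonl and metadata.json.

    Each record is assumed to be a dict with the keys 'user_id', 'consent',
    'created_at' (timezone-aware datetime) and 'payload' (JSON-serializable).
    """
    lines = []
    for rec in records:
        # Consent filter: drop records from users who did not consent.
        if not rec["consent"]:
            continue
        # Time-range scope: keep records with start <= created_at < end.
        if not (start <= rec["created_at"] < end):
            continue
        lines.append(json.dumps({
            "user": pseudonymize(rec["user_id"], salt),
            "created_at": rec["created_at"].isoformat(),
            "payload": rec["payload"],
        }))

    metadata = {
        "record_count": len(lines),
        "start": start.isoformat(),
        "end": end.isoformat(),
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }

    # Build the archive in memory so a job runner can store or stream it.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.jsonl", "".join(line + "\n" for line in lines))
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buf.getvalue()
```

In the proposed feature, a function like this would run inside the asynchronous job, with the salt and consent rules taken from the instance's policy rather than passed in by hand.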
Describe alternatives you've considered
Using the existing one-click full data download combined with manually running Python scripts for anonymization and consent filtering was considered, but this approach requires per-instance scripting effort, is not accessible to non-technical administrators, and provides no audit log. Standardizing this into a built-in pipeline removes the dependency on individual scripts and makes the process consistent across all instances.
Additional context
Details are listed in the comments below.