As part of the contract to develop transport-data/tdc-data-portal, the contractor wrote some “data ingestion scripts”. These are two directories in that repo:
Unfortunately:

- The scripts for the JRC IDEES source and the Eurostat provider duplicate the contents of `transport_data.jrc` and `transport_data.estat`; the script for GFEI data appears to be a fixed mirror (i.e. not reusable) of the GFEI Zenodo record.
- The code is extremely verbose (the JRC file is 4400 lines without formatting; `data-integration/process_tdc.py` is 20000 lines) and involves a lot of duplication/copy-and-paste.
- SDMX metadata are not generated; metadata are fed directly into CKAN via API calls.
More info:

- The scripts do serve as a complete, working example of how to interact with CKAN through its APIs—though directly using `requests`, and not through a CKAN API client (Add a CKAN client #3).
- According to the contractor, the scripts either create records or skip those that already exist; they do not update metadata on existing records if it has changed.
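The create-or-skip behaviour described above could be extended into create-or-update logic. A minimal sketch, assuming a hypothetical CKAN instance, API key, and dataset fields (none taken from the repo), using the CKAN action API endpoints `package_show`, `package_create`, and `package_update` via `requests`:

```python
# Sketch of create-or-update ("upsert") logic for CKAN records.
# CKAN_URL, API_KEY, and the metadata fields below are illustrative.
from typing import Optional

import requests

CKAN_URL = "https://ckan.example.org"  # hypothetical CKAN instance
API_KEY = "xxxx"  # a CKAN API token


def choose_action(existing: Optional[dict], metadata: dict) -> str:
    """Return the CKAN action to call for `metadata`.

    "package_create" if no record exists, "package_update" if the record
    exists but any managed field differs, and "skip" otherwise.
    """
    if existing is None:
        return "package_create"
    # Compare only the fields we manage; CKAN adds many server-side fields.
    if any(existing.get(k) != v for k, v in metadata.items()):
        return "package_update"
    return "skip"


def refresh_record(metadata: dict) -> str:
    """Create or update a single CKAN dataset via the action API."""
    headers = {"Authorization": API_KEY}
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_show",
        params={"id": metadata["name"]},
        headers=headers,
    )
    existing = resp.json()["result"] if resp.ok else None
    action = choose_action(existing, metadata)
    if action != "skip":
        requests.post(
            f"{CKAN_URL}/api/3/action/{action}", json=metadata, headers=headers
        )
    return action
```

Keeping the decision in a pure function (`choose_action`) separates it from the HTTP calls, so it can be unit-tested without a live CKAN instance.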
To resolve, likely in multiple issues/PRs:

- Integrate the functions of the scripts into existing modules in the current package.
- Replace the workflow that calls the scripts with a workflow calling, e.g., `tdc jrc refresh-ckan`.
- Add functionality to identify existing records and update them as needed.
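The proposed `tdc jrc refresh-ckan` invocation could be wired as a nested subcommand. A minimal sketch using only the standard library's `argparse`; the actual `tdc` entry point may use a different CLI framework, and the `refresh_ckan` handler is a hypothetical placeholder:

```python
# Hypothetical wiring for a `tdc jrc refresh-ckan` subcommand.
import argparse


def refresh_ckan(args: argparse.Namespace) -> str:
    # Placeholder for the actual logic: fetch JRC metadata, then create or
    # update the corresponding CKAN records.
    return f"refreshing CKAN records for source {args.source!r}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tdc")
    sub = parser.add_subparsers(dest="source", required=True)

    jrc = sub.add_parser("jrc", help="Commands for the JRC IDEES source")
    jrc_sub = jrc.add_subparsers(dest="command", required=True)
    refresh = jrc_sub.add_parser(
        "refresh-ckan", help="Create or update CKAN records for JRC data"
    )
    refresh.set_defaults(func=refresh_ckan)
    return parser


def main(argv: list) -> str:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

With this layout, other sources (e.g. `estat`) could register their own `refresh-ckan` subcommands against the same top-level parser, avoiding the per-source copy-and-paste noted above.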