As part of the contract to develop transport-data/tdc-data-portal, the contractor wrote some “data ingestion scripts”. These are two directories in that repo:
Unfortunately:

- The scripts for the JRC IDEES source and the Eurostat provider duplicate the contents of `transport_data.jrc` and `transport_data.estat`; the script for GFEI data appears to be a fixed mirror (i.e. not reusable) of the GFEI Zenodo record.
- The code is extremely verbose (the JRC file is 4400 lines without formatting; `data-integration/process_tdc.py` is 20000 lines) and involves a lot of duplication/copy-and-paste.
- SDMX metadata are not generated; metadata are fed directly into CKAN via API calls.
More info:

- The scripts do serve as a complete, working example of how to interact with CKAN through its APIs—though directly using `requests`, and not through a CKAN API client (Add a CKAN client #3).
- According to the contractor, the scripts either create records or skip those that already exist; they do not update metadata on existing records if it has changed.
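The create-or-skip behaviour described above could be extended into create-or-update logic. A minimal sketch, assuming a hypothetical CKAN instance, API key, and dataset fields (none taken from the repo), using the CKAN action API endpoints `package_show`, `package_create`, and `package_update` via `requests`:

```python
# Sketch of create-or-update ("upsert") logic for CKAN records.
# CKAN_URL, API_KEY, and the metadata fields below are illustrative.
from typing import Optional

import requests

CKAN_URL = "https://ckan.example.org"  # hypothetical CKAN instance
API_KEY = "xxxx"  # a CKAN API token


def choose_action(existing: Optional[dict], metadata: dict) -> str:
    """Return the CKAN action to call for `metadata`.

    "package_create" if no record exists, "package_update" if the record
    exists but any managed field differs, and "skip" otherwise.
    """
    if existing is None:
        return "package_create"
    # Compare only the fields we manage; CKAN adds many server-side fields.
    if any(existing.get(k) != v for k, v in metadata.items()):
        return "package_update"
    return "skip"


def refresh_record(metadata: dict) -> str:
    """Create or update a single CKAN dataset via the action API."""
    headers = {"Authorization": API_KEY}
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_show",
        params={"id": metadata["name"]},
        headers=headers,
    )
    existing = resp.json()["result"] if resp.ok else None
    action = choose_action(existing, metadata)
    if action != "skip":
        requests.post(
            f"{CKAN_URL}/api/3/action/{action}", json=metadata, headers=headers
        )
    return action
```

Keeping the decision in a pure function (`choose_action`) separates it from the HTTP calls, so it can be unit-tested without a live CKAN instance.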
To resolve, likely in multiple issues/PRs:

- Integrate the functions of the scripts into existing modules in the current package.
- Replace the workflow that calls the scripts with a workflow calling, e.g., `tdc jrc refresh-ckan`.
- Add functionality to identify existing records and update them as needed.
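The proposed `tdc jrc refresh-ckan` invocation could be wired as a nested subcommand. A minimal sketch using only the standard library's `argparse`; the actual `tdc` entry point may use a different CLI framework, and the `refresh_ckan` handler is a hypothetical placeholder:

```python
# Hypothetical wiring for a `tdc jrc refresh-ckan` subcommand.
import argparse


def refresh_ckan(args: argparse.Namespace) -> str:
    # Placeholder for the actual logic: fetch JRC metadata, then create or
    # update the corresponding CKAN records.
    return f"refreshing CKAN records for source {args.source!r}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tdc")
    sub = parser.add_subparsers(dest="source", required=True)

    jrc = sub.add_parser("jrc", help="Commands for the JRC IDEES source")
    jrc_sub = jrc.add_subparsers(dest="command", required=True)
    refresh = jrc_sub.add_parser(
        "refresh-ckan", help="Create or update CKAN records for JRC data"
    )
    refresh.set_defaults(func=refresh_ckan)
    return parser


def main(argv: list) -> str:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

With this layout, other sources (e.g. `estat`) could register their own `refresh-ckan` subcommands against the same top-level parser, avoiding the per-source copy-and-paste noted above.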