Description
The BacDive JSON dump is a list of JSON objects. Each object has a set of keys, and we pull information from the useful ones to create the nodes and edges in the KGX files. For example, the Morphology key has another key, cell morphology, nested within it. The value at any such key may be either a dict or a list.
When we pull information from the Morphology.cell morphology key, for instance, the value can look like either of the two cases below:
Case 1 (list):
"Morphology": {
  "cell morphology": [
    {
      "@ref": 68367,
      "cell shape": "rod-shaped"
    },
    {
      "@ref": 68367,
      "gram stain": "positive"
    }
  ],
  "colony morphology": {
    "@ref": 18803,
    "type of hemolysis": "gamma",
    "incubation period": "1-2 days"
  }
}
Case 2 (dict):
"Morphology": {
  "cell morphology": {
    "@ref": 30759,
    "gram stain": "positive",
    "cell length": "0.55 \u00b5m",
    "cell width": "0.55 \u00b5m",
    "cell shape": "coccus-shaped",
    "motility": "no"
  }
Currently, the structural assumptions about these keys are hard-coded in the BacDive transformation pipeline (Python code) here: https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/kg_microbe/transform_utils/bacdive/bacdive.py. Separately, the 'bacdive keyword synonym' column in the 'relabeled classes' tab of the METPO sheet encodes its own assumptions in the form of JSON path specifications.
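One possible way to treat the two cases shown above uniformly is to normalize the value to a list before reading fields from it. The sketch below is only an illustration: `as_list` and `collect_field` are hypothetical helper names, not functions from the existing pipeline, and the sample data is trimmed from the two cases in this issue.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Wrap a dict (or scalar) in a list; pass lists through; map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def collect_field(value: Any, field: str) -> list:
    """Collect every occurrence of `field` from a dict-or-list value."""
    return [
        record[field]
        for record in as_list(value)
        if isinstance(record, dict) and field in record
    ]


# Sample data trimmed from the two cases in this issue.
case_1 = {
    "Morphology": {
        "cell morphology": [
            {"@ref": 68367, "cell shape": "rod-shaped"},
            {"@ref": 68367, "gram stain": "positive"},
        ]
    }
}
case_2 = {
    "Morphology": {
        "cell morphology": {
            "@ref": 30759,
            "gram stain": "positive",
            "cell shape": "coccus-shaped",
        }
    }
}

for strain in (case_1, case_2):
    cell_morphology = strain.get("Morphology", {}).get("cell morphology")
    print(collect_field(cell_morphology, "cell shape"))
```

With a helper like this, downstream code iterates over a list in both cases and does not need a separate branch per structure.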
We need to make sure that, for every key we use from the BacDive data source, we account for all of the structures its value can take.
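As a starting point for that check, a small audit could walk the dump and record which Python types appear at each key path. This is a rough sketch under stated assumptions: `record_shapes`, `audit_shapes`, and `mixed_paths` are hypothetical names, the dot-separated path format is chosen only for illustration, and `strains` stands in for the list loaded from the real dump.

```python
from collections import defaultdict
from typing import Any


def record_shapes(node: Any, path: str, shapes: dict) -> None:
    """Record the type name seen at every key path below `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            shapes[child].add(type(value).__name__)
            record_shapes(value, child, shapes)
    elif isinstance(node, list):
        # List items share the path of the list itself.
        for item in node:
            record_shapes(item, path, shapes)


def audit_shapes(strains: list) -> dict:
    """Map each key path to the set of type names observed across all strains."""
    shapes = defaultdict(set)
    for strain in strains:
        record_shapes(strain, "", shapes)
    return dict(shapes)


def mixed_paths(shapes: dict) -> list:
    """Return the key paths whose value took more than one type."""
    return sorted(path for path, types in shapes.items() if len(types) > 1)


# Stand-in for the real dump: two strains trimmed from the cases in this issue.
strains = [
    {"Morphology": {"cell morphology": [
        {"@ref": 68367, "cell shape": "rod-shaped"},
        {"@ref": 68367, "gram stain": "positive"},
    ]}},
    {"Morphology": {"cell morphology": {
        "@ref": 30759, "gram stain": "positive",
    }}},
]

print(mixed_paths(audit_shapes(strains)))
```

Running such an audit over the full dump would list the key paths that need dict-or-list handling, which could then be compared against the paths used in the transform code and in the METPO sheet.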