Account for possible structural differences of keys in BacDive data source transformation pipeline

The BacDive JSON dump is a list of JSON objects. Each object has a set of keys, and we pull information from the useful ones to build the nodes and edges in the KGX files. One example is the `Morphology` key, which has another key, `cell morphology`, nested within it. The value at such a key can be either a dict or a list.

For example, the value at `Morphology.cell morphology` can look like either of the two cases below:

Case 1 (list):

```json
    "Morphology": {
      "cell morphology": [
        {
          "@ref": 68367,
          "cell shape": "rod-shaped"
        },
        {
          "@ref": 68367,
          "gram stain": "positive"
        }
      ],
      "colony morphology": {
        "@ref": 18803,
        "type of hemolysis": "gamma",
        "incubation period": "1-2 days"
      }
    }
```

Case 2 (dict):

```json
    "Morphology": {
      "cell morphology": {
        "@ref": 30759,
        "gram stain": "positive",
        "cell length": "0.55 \u00b5m",
        "cell width": "0.55 \u00b5m",
        "cell shape": "coccus-shaped",
        "motility": "no"
      }
    }
```
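
One way to handle both cases is to normalize the value to a list of dicts before reading from it. The sketch below is only an illustration: `as_list` and `get_cell_shapes` are hypothetical helper names, not functions that exist in `bacdive.py`, and it assumes that `Morphology` itself is always a dict.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Dict[str, Any]]:
    """Return `value` as a list of dicts, whether it arrived as a dict, a list, or None."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list):
        # Keep only dict entries; anything else is ignored in this sketch.
        return [item for item in value if isinstance(item, dict)]
    raise TypeError(f"Unexpected type for BacDive value: {type(value).__name__}")


def get_cell_shapes(record: Dict[str, Any]) -> List[str]:
    """Collect every `cell shape` value under `Morphology.cell morphology`."""
    morphology = record.get("Morphology") or {}  # assumption: a dict when present
    entries = as_list(morphology.get("cell morphology"))
    return [entry["cell shape"] for entry in entries if "cell shape" in entry]


# Trimmed copies of the two cases above.
case_1 = {"Morphology": {"cell morphology": [
    {"@ref": 68367, "cell shape": "rod-shaped"},
    {"@ref": 68367, "gram stain": "positive"},
]}}
case_2 = {"Morphology": {"cell morphology": {
    "@ref": 30759, "gram stain": "positive", "cell shape": "coccus-shaped",
}}}

print(get_cell_shapes(case_1))  # ['rod-shaped']
print(get_cell_shapes(case_2))  # ['coccus-shaped']
```

With a helper like this, the downstream code only ever iterates over a list, so the dict and list cases go through the same path.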

Currently, the structural assumptions about the keys live in the BacDive transformation pipeline (Python code) here: https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/kg_microbe/transform_utils/bacdive/bacdive.py. There are also assumptions based on the JSON path specifications in the 'bacdive keyword synonym' column of the 'relabeled classes' tab in the METPO sheet.

We need to make sure that the pipeline accounts for every structural possibility (dict or list) of every key in the BacDive data source.
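
To find out which keys actually vary, one option is to scan the dump and record the value types seen at each key path. This is a rough sketch, not part of the existing pipeline: `collect_shapes` is a hypothetical name, paths are joined with `.` as in `Morphology.cell morphology`, and the two sample records stand in for the real dump.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def collect_shapes(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each dotted key path to the set of Python type names seen at that path."""
    shapes: Dict[str, Set[str]] = defaultdict(set)

    def descend(value: Any, path: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                child_path = f"{path}.{key}" if path else key
                shapes[child_path].add(type(child).__name__)
                descend(child, child_path)
        elif isinstance(value, list):
            # List items share the parent's path, so keys inside them line up
            # with the same keys in the plain-dict form.
            for item in value:
                descend(item, path)

    for record in records:
        descend(record, "")
    return dict(shapes)


sample_records = [
    {"Morphology": {"cell morphology": [
        {"@ref": 68367, "cell shape": "rod-shaped"},
        {"@ref": 68367, "gram stain": "positive"},
    ]}},
    {"Morphology": {"cell morphology": {
        "@ref": 30759, "cell shape": "coccus-shaped",
    }}},
]

shapes = collect_shapes(sample_records)
# Paths where both a dict and a list were observed need dual handling.
mixed = sorted(path for path, types in shapes.items() if {"dict", "list"} <= types)
print(mixed)  # ['Morphology.cell morphology']
```

Running something like this over the full dump would give a list of paths to check against both `bacdive.py` and the JSON paths in the METPO sheet.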
