Account for possible structural differences of keys in BacDive data source transformation pipeline #481

@sujaypatil96

The BacDive JSON dump is a list of JSON objects. Each object has a set of keys, and we pull information from the useful ones to make nodes and edges in the KGX files. One example is the Morphology key, which has another key, cell morphology, nested within it. The value at such a key might be either a dict or a list.

For example, when we are trying to pull out information from the Morphology.cell morphology key, it can look like either of the two cases below:

Case 1 (list):

    "Morphology": {
      "cell morphology": [
        {
          "@ref": 68367,
          "cell shape": "rod-shaped"
        },
        {
          "@ref": 68367,
          "gram stain": "positive"
        }
      ],
      "colony morphology": {
        "@ref": 18803,
        "type of hemolysis": "gamma",
        "incubation period": "1-2 days"
      }
    }

Case 2 (dict):

    "Morphology": {
      "cell morphology": {
        "@ref": 30759,
        "gram stain": "positive",
        "cell length": "0.55 \u00b5m",
        "cell width": "0.55 \u00b5m",
        "cell shape": "coccus-shaped",
        "motility": "no"
      }
    }
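
One way to handle both cases uniformly is to normalize the value to a list of records before reading any fields. The following is only a minimal sketch of that idea, not the current pipeline code: `as_list` and `get_records` are hypothetical helper names, and the sketch assumes the section value itself (e.g. Morphology) is always a dict.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalize a value to a list: None -> [], list -> unchanged, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def get_records(entry: dict, section: str, key: str) -> list:
    """Return the dict records stored at entry[section][key], whether stored as a dict or a list.

    Hypothetical helper for illustration; assumes entry[section] is a dict when present.
    """
    section_value = entry.get(section) or {}
    if not isinstance(section_value, dict):
        return []
    # Keep only dict records so callers can safely use .get() on each one.
    return [rec for rec in as_list(section_value.get(key)) if isinstance(rec, dict)]


# Case 2 (dict) from above, trimmed to two fields.
entry = {"Morphology": {"cell morphology": {"@ref": 30759, "cell shape": "coccus-shaped"}}}
shapes = [
    rec["cell shape"]
    for rec in get_records(entry, "Morphology", "cell morphology")
    if "cell shape" in rec
]
```

With this kind of helper, the calling code iterates over records in the same way for Case 1 and Case 2, and a missing key simply yields an empty list.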

Currently, all the structural assumptions about the keys are hard-coded in the BacDive transformation pipeline (Python code) here: https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/kg_microbe/transform_utils/bacdive/bacdive.py. There are also assumptions based on the JSON path specifications in the 'bacdive keyword synonym' column of the 'relabeled classes' tab in the METPO sheet.

We need to make sure that, for every key in the BacDive data source, the pipeline accounts for all of the structural possibilities of that key.
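
To find out which keys actually vary, one option is to scan the whole dump and record the value types seen at each key path. This is a rough sketch under assumptions: `find_mixed_shapes` and `_walk` are hypothetical names, the dump is assumed to be already loaded as a list of dicts (as described above), and dicts inside lists are reported under the same path as the list so that both cases line up.

```python
from collections import defaultdict


def _walk(node, path, shapes):
    """Record the type name of every value under its dotted key path."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            shapes[child].add(type(value).__name__)
            _walk(value, child, shapes)
    elif isinstance(node, list):
        # List items share the path of the list itself.
        for item in node:
            _walk(item, path, shapes)


def find_mixed_shapes(entries):
    """Return {key path: sorted type names} for paths seen as both a dict and a list."""
    shapes = defaultdict(set)
    for entry in entries:
        _walk(entry, "", shapes)
    return {
        path: sorted(types)
        for path, types in shapes.items()
        if {"dict", "list"} <= types
    }


# Two trimmed entries mirroring Case 1 and Case 2.
entries = [
    {"Morphology": {"cell morphology": [{"@ref": 68367, "cell shape": "rod-shaped"}]}},
    {"Morphology": {"cell morphology": {"@ref": 30759, "gram stain": "positive"}}},
]
mixed = find_mixed_shapes(entries)
# mixed == {"Morphology.cell morphology": ["dict", "list"]}
```

Running something like this over the full dump (loaded with `json.load`) would give a concrete list of paths to check against both the pipeline code and the JSON paths in the METPO sheet.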

Labels: pipeline upgrades (Upgrades to data source transformation pipelines)