Bug Report: Tool.from_component() incorrectly converts OrganizationalEntity to string #917

@vpetersson

Summary

When converting a Component to a Tool using Tool.from_component(), the method does not carry the component's OrganizationalEntity (from component.manufacturer or component.supplier) over to tool.vendor. Instead it sets tool.vendor to a plain string taken from component.group. This causes a TypeError during serialization when tools with different vendor types are sorted.

Environment

  • cyclonedx-python-lib version: 8.4.0
  • Python version: 3.13
  • Operating System: macOS

Bug Description

Issue 1: Inconsistent Vendor Type Handling

The library allows Tool objects to have either str or OrganizationalEntity vendors in memory, but:

  1. The legacy tool schema (CycloneDX 1.6 line 724) specifies vendor as type: "string"
  2. The library allows creating Tools with OrganizationalEntity vendors
  3. Tool.from_component() converts OrganizationalEntity manufacturer/supplier to string vendor (uses component.group field)

This creates inconsistent vendor types when mixing programmatically created tools with converted components:

# Tool created directly with OrganizationalEntity (allowed by library)
tool1 = Tool(vendor=OrganizationalEntity(name="sbomify"), ...)
print(type(tool1.vendor))  # <class 'cyclonedx.model.contact.OrganizationalEntity'>

# Tool converted from component (uses group field as string)
component.manufacturer = OrganizationalEntity(name="Aqua Security")
component.group = "aquasecurity"
tool2 = Tool.from_component(component)
print(type(tool2.vendor))  # <class 'str'> - uses group field!
print(tool2.vendor)  # "aquasecurity"

Issue 2: Serialization TypeError

When tools with mixed vendor types (string vs OrganizationalEntity) are serialized, sorting fails:

TypeError: '<' not supported between instances of 'str' and 'OrganizationalEntity'
    at cyclonedx/model/tool.py:158 in __lt__
    at cyclonedx/_internal/compare.py:45 in __lt__
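
The failure mode can be reproduced without the library. The sketch below uses a hypothetical `Org` dataclass as a stand-in for OrganizationalEntity and plain tuples as stand-ins for per-tool comparison keys. It illustrates the underlying Python behaviour only and is not the library's actual comparison code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Org:
    """Hypothetical stand-in for OrganizationalEntity (not the library class)."""
    name: str


# Assumed shape of a per-tool comparison key: (vendor, name, version).
key_from_component = ("aquasecurity", "trivy", "0.67.2")   # vendor is a str
key_direct = (Org("sbomify"), "sbomify-action", "1.0.0")   # vendor is an object

try:
    sorted([key_from_component, key_direct])
    sort_error = None
except TypeError as exc:
    # Tuple comparison falls through to the first differing elements,
    # which here are a str and an Org with no ordering defined between them.
    sort_error = str(exc)

print(sort_error)
```

Any ordering that reaches the vendor field will fail the same way as soon as the two types meet in one collection.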

Issue 3: Deserialization Hash Error

When serialized tools (legacy array format) are re-parsed, vendor dicts aren't deserialized to OrganizationalEntity objects, causing hash errors:

TypeError: unhashable type: 'dict'
    at cyclonedx/model/tool.py:166 in __hash__

Steps to Reproduce

Minimal Reproduction (Issue 1 & 2):

from cyclonedx.model.bom import Bom
from cyclonedx.model.component import Component
from cyclonedx.model.contact import OrganizationalEntity
from cyclonedx.model.tool import Tool
from cyclonedx.output.json import JsonV1Dot6

# Simulate real Trivy SBOM structure (with manufacturer and group)
trivy_component_json = {
    "type": "application",
    "manufacturer": {
        "name": "Aqua Security Software Ltd."
    },
    "group": "aquasecurity",
    "name": "trivy",
    "version": "0.67.2"
}

# Parse it
component = Component.from_json(trivy_component_json)
print(f"Component manufacturer type: {type(component.manufacturer)}")
# Output: <class 'cyclonedx.model.contact.OrganizationalEntity'>

# Convert to Tool - THIS IS THE BUG
tool = Tool.from_component(component)
print(f"Tool vendor type: {type(tool.vendor)}")
print(f"Tool vendor value: {tool.vendor}")
# Output: <class 'str'>
# Output: aquasecurity
# ❌ BUG: Vendor was converted from OrganizationalEntity to string!

# Now create a BOM with mixed vendor types
bom = Bom()
bom.metadata.tools.components.add(component)

# Add another tool directly with OrganizationalEntity vendor
sbomify_tool = Tool(
    vendor=OrganizationalEntity(name="sbomify"),
    name="sbomify-action",
    version="1.0.0"
)
bom.metadata.tools.tools.add(sbomify_tool)

# Try to serialize - this will fail
try:
    outputter = JsonV1Dot6(bom)
    json_output = outputter.output_as_string()
    print("SUCCESS")
except TypeError as e:
    print(f"FAILED: {e}")
    # Output: FAILED: '<' not supported between instances of 'str' and 'OrganizationalEntity'

Reproduction Details:

What happens:

  1. Component has manufacturer as OrganizationalEntity
  2. During serialization, Tool.from_component() is called (in tool.py:271)
  3. The conversion creates a Tool with vendor as a string instead of OrganizationalEntity
  4. When sorting tools (in SortedSet), comparison fails because one tool has string vendor, another has OrganizationalEntity vendor

Minimal Reproduction (Issue 3):

from cyclonedx.model.bom import Bom
from cyclonedx.model.contact import OrganizationalEntity
from cyclonedx.model.tool import Tool
from cyclonedx.output.json import JsonV1Dot6
import json

# Create BOM with tool
bom = Bom()
tool = Tool(
    vendor=OrganizationalEntity(name="Test"),
    name="test-tool",
    version="1.0.0"
)
bom.metadata.tools.tools.add(tool)

# Serialize
outputter = JsonV1Dot6(bom)
json_str = outputter.output_as_string()

# Parse back
data = json.loads(json_str)
print(f"Serialized tools format: {type(data['metadata']['tools'])}")  # list (legacy format)
print(f"First tool vendor: {data['metadata']['tools'][0]['vendor']}")  # {'name': 'Test'} - a dict

# Try to deserialize - this will fail
try:
    bom2 = Bom.from_json(data)
    print("SUCCESS")
except TypeError as e:
    print(f"FAILED: {e}")
    # Output: FAILED: unhashable type: 'dict'

Expected Behavior

  1. Tool.from_component() should preserve the OrganizationalEntity type when converting manufacturer/supplier to vendor
  2. All tools should have consistent vendor types to enable safe comparison
  3. Serialized tools should be deserializable without type errors

Actual Behavior

  1. Tool.from_component() converts OrganizationalEntity to str for vendor
  2. Mixed vendor types cause TypeError during sorting
  3. Legacy format serialization produces vendor as dict, which can't be deserialized

Impact

  • High: Prevents augmentation of SBOMs from common generators (Trivy, Syft, etc.)
  • Common scenario: Adding tools to existing SBOMs with tool metadata
  • Breaks round-tripping: SBOMs cannot be serialized and then deserialized again

Workaround

We've implemented a workaround in our codebase:

  1. Pre-convert components/services to tools before adding new tools
  2. Fix vendor types after conversion to ensure OrganizationalEntity
  3. Preprocess JSON before deserialization to convert legacy array format to components format

See our implementation: https://github.com/sbomify/github-action/blob/master/sbomify_action/augmentation.py#L171-L220
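
For illustration, step 3 of the workaround could look roughly like the sketch below. This is not the linked implementation. `legacy_tools_to_components` is a hypothetical helper that works on the parsed JSON dict only, and the field mapping (legacy `vendor` to component `manufacturer`, every tool typed as `application`, a placeholder name when none is given) is an assumption that may need adjusting for the target spec version.

```python
import copy
from typing import Any


def legacy_tools_to_components(bom_json: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of bom_json whose legacy metadata.tools array is
    rewritten as {"components": [...]}. Hypothetical helper."""
    result = copy.deepcopy(bom_json)
    tools = result.get("metadata", {}).get("tools")
    if not isinstance(tools, list):
        return result  # no tools, or already in the object format

    components = []
    for tool in tools:
        # Assumption: treat every legacy tool as an application component.
        component = {"type": "application", "name": tool.get("name", "unknown")}
        if "version" in tool:
            component["version"] = tool["version"]
        vendor = tool.get("vendor")
        if isinstance(vendor, dict):
            component["manufacturer"] = vendor
        elif isinstance(vendor, str):
            component["manufacturer"] = {"name": vendor}
        for key in ("hashes", "externalReferences"):
            if key in tool:
                component[key] = tool[key]
        components.append(component)

    result["metadata"]["tools"] = {"components": components}
    return result


legacy = {
    "metadata": {
        "tools": [
            {"vendor": {"name": "Test"}, "name": "test-tool", "version": "1.0.0"}
        ]
    }
}
converted = legacy_tools_to_components(legacy)
print(converted["metadata"]["tools"])
```

The helper copies its input, so the original parsed document stays available for comparison.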

Root Cause Analysis

The issue stems from mixing two formats:

  1. Legacy tool format (deprecated): vendor as string (per schema line 724)
  2. Modern format (1.5+): Components use manufacturer/supplier as OrganizationalEntity

The library allows creating Tool objects with either string or OrganizationalEntity vendors, but this causes type comparison errors during sorting.

Suggested Fix

Option 1: Normalize to String (Backwards Compatible)

Enforce that Tool.vendor is always a string when using legacy serialization:

# In Tool class or serialization logic
def _normalize_vendor_for_legacy_format(self):
    """Normalize vendor to string for legacy tool format."""
    if isinstance(self.vendor, OrganizationalEntity):
        return self.vendor.name
    return self.vendor

# During serialization, normalize all vendors to strings before sorting
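
To show the effect of this option in isolation, here is a minimal sketch using a hypothetical `Org` dataclass in place of OrganizationalEntity and a hypothetical `vendor_as_str` helper. It only demonstrates that normalizing to strings gives mixed vendors a well-defined order. Where such a step would live in the library is left open.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Org:
    """Hypothetical stand-in for OrganizationalEntity (not the library class)."""
    name: Optional[str] = None


def vendor_as_str(vendor: Union[str, Org, None]) -> str:
    """Map any supported vendor value to a plain string usable as a sort key."""
    if isinstance(vendor, Org):
        return vendor.name or ""
    return vendor or ""


vendors = [Org("sbomify"), "aquasecurity", None]
# Sorting by the normalized key never compares a str with an Org directly.
ordered = sorted(vendors, key=vendor_as_str)
print(ordered)
```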

Option 2: Use Modern Format When OrganizationalEntity Present (Recommended)

When any tool has an OrganizationalEntity vendor, serialize using modern components format instead of legacy array:

# In serialization logic
def should_use_modern_tools_format(tools):
    """Check if any tool requires modern format."""
    return any(
        isinstance(tool.vendor, OrganizationalEntity)
        for tool in tools if tool.vendor
    )

Option 3: Fix Tool.from_component()

Make Tool.from_component() preserve OrganizationalEntity consistently, but ensure serialization normalizes:

@staticmethod
def from_component(component: 'Component') -> 'Tool':
    vendor = component.manufacturer or component.supplier
    # Don't convert to string - preserve type
    # But this requires serialization to handle both types
    return Tool(vendor=vendor, name=component.name, ...)

Related Issues

This may be related to the deprecation of the legacy tools array format in favor of the components/services format in CycloneDX 1.5+.

Additional Context

The issue manifests in real-world scenarios where:

  • SBOMs generated by Trivy use manufacturer as OrganizationalEntity
  • Users augment these SBOMs by adding additional tools
  • The mixed vendor types cause serialization to fail during tool sorting
  • Re-parsing augmented SBOMs fails due to vendor deserialization issues

Test Case

A complete test case demonstrating the issue is available in our test suite:
https://github.com/sbomify/github-action/blob/master/tests/test_augmentation_module.py#L989-L1047


Thank you for maintaining this excellent library! We'd be happy to contribute a PR if you'd like assistance implementing the fix.
