Skip to content

Conversation

@kohankhaki
Copy link
Collaborator

@kohankhaki kohankhaki commented Nov 25, 2025

PR Type

Feature

Short Description

This pull request introduces a comprehensive, standardized schema system for all stages of the ACE pipeline. It adds Python dataclasses for each pipeline stage, implements robust save/load I/O utilities, and centralizes all schema imports and exports for easy use. The changes ensure consistent data formats across the pipeline, making it easier to maintain, extend, and integrate different components.
Current pipeline does not use this exact schema. The goal is to standardize future implementations.
src/schemas/README.md and src/schemas/PIPELINE_SCHEMAS.md explain details of the pipeline schema.

Tests Added

None


This change is Reviewable

@kohankhaki kohankhaki requested a review from afkanpour November 25, 2025 19:10
Copy link
Collaborator

@afkanpour afkanpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afkanpour reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @kohankhaki)


src/schemas/io_utils.py line 16 at r1 (raw file):

def save_experiment_output(

Why not naming these functions save_experiment(), save_domain(), load_experiment(), etc.?

Code quote:

save_experiment_output

src/schemas/area_schemas.py line 12 at r1 (raw file):

    name: str
    area_id: str

Curious, what are attributes like area_id and domain_id used for?

Code quote:

area_id

src/schemas/area_schemas.py line 14 at r1 (raw file):

    area_id: str
    description: Optional[str] = None
    domain: str = ""

Use a domain object instead.

Code quote:

domain

src/schemas/area_schemas.py line 15 at r1 (raw file):

    description: Optional[str] = None
    domain: str = ""
    domain_id: str = ""

I guess this can be removed if we use a domain object.

Code quote:

domain_id

src/schemas/metadata_schemas.py line 1 at r1 (raw file):

"""Metadata schemas for pipeline stages."""

Please expand the docstring explaining the intention for introducing metadata, how/where they are used.

In general, comments should be expressive so the reader has a good understanding of what the code is supposed to do before diving into the actual implementation.


src/schemas/capability_schemas.py line 14 at r1 (raw file):

    capability_id: str
    description: Optional[str] = None
    area: str = ""

If we have objects for area, domain, etc., let's use them instead of strings. Please apply this change everywhere.

Code quote:

area

src/schemas/experiment_schemas.py line 12 at r1 (raw file):

    experiment_id: str
    domain: str

Why not pass a domain object? That way, we can remove domain_id here too.

Code quote:

domain

… I/O functions, use dataclass objects for hierarchical relationships, and improve documentation
Copy link
Collaborator Author

@kohankhaki kohankhaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @afkanpour)


src/schemas/area_schemas.py line 12 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Curious, what are attributes like area_id and domain_id used for?

Done.


src/schemas/area_schemas.py line 14 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Use a domain object instead.

Done.


src/schemas/area_schemas.py line 15 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

I guess this can be removed if we use a domain object.

Done.


src/schemas/capability_schemas.py line 14 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

If we have objects for area, domain, etc., let's use them instead of strings. Please apply this change everywhere.

Done.


src/schemas/experiment_schemas.py line 12 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Why not pass a domain object? That way, we can remove domain_id here too.

Done.


src/schemas/io_utils.py line 16 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Why not naming these functions save_experiment(), save_domain(), load_experiment(), etc.?

Done.


src/schemas/metadata_schemas.py line 1 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Please expand the docstring explaining the intention for introducing metadata, how/where they are used.

In general, comments should be expressive so the reader has a good understanding of what the code is supposed to do before diving into the actual implementation.

Done.

Copy link
Collaborator

@afkanpour afkanpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afkanpour reviewed 12 of 12 files at r2.
Reviewable status: 4 of 12 files reviewed, 8 unresolved discussions (waiting on @kohankhaki)


src/schemas/area_schemas.py line 20 at r2 (raw file):

    area_id: str
    description: Optional[str] = None
    domain: Optional[Domain] = None

Is there a scenario where an area instance should be created without a domain? If not, we shouldn't make domain optional.

Same applies to other classes/stages.

Code quote:

Optional[Domain]

…nal.

Add file references to all dataclass sections in PIPELINE_SCHEMAS.md documentation.
Copy link
Collaborator Author

@kohankhaki kohankhaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 12 files reviewed, 8 unresolved discussions (waiting on @afkanpour)


src/schemas/area_schemas.py line 20 at r2 (raw file):

Previously, afkanpour (Arash) wrote…

Is there a scenario where an area instance should be created without a domain? If not, we shouldn't make domain optional.

Same applies to other classes/stages.

Makes sense, this make input and output more standard.

Copy link
Collaborator

@afkanpour afkanpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afkanpour reviewed 2 of 8 files at r3, 6 of 6 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @kohankhaki)

@kohankhaki kohankhaki merged commit 029217e into main Dec 5, 2025
1 of 2 checks passed
@kohankhaki kohankhaki deleted the pipeline_schema branch December 5, 2025 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants