-
Notifications
You must be signed in to change notification settings - Fork 1
Pipeline schema design #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
afkanpour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afkanpour reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @kohankhaki)
src/schemas/io_utils.py line 16 at r1 (raw file):
def save_experiment_output(
Why not naming these functions save_experiment(), save_domain(), load_experiment(), etc.?
Code quote:
save_experiment_outputsrc/schemas/area_schemas.py line 12 at r1 (raw file):
name: str area_id: str
Curious, what are attributes like area_id and domain_id used for?
Code quote:
area_idsrc/schemas/area_schemas.py line 14 at r1 (raw file):
area_id: str description: Optional[str] = None domain: str = ""
Use a domain object instead.
Code quote:
domainsrc/schemas/area_schemas.py line 15 at r1 (raw file):
description: Optional[str] = None domain: str = "" domain_id: str = ""
I guess this can be removed if we use a domain object.
Code quote:
domain_idsrc/schemas/metadata_schemas.py line 1 at r1 (raw file):
"""Metadata schemas for pipeline stages."""
Please expand the docstring explaining the intention for introducing metadata, how/where they are used.
In general, comments should be expressive so the reader has a good understanding of what the code is supposed to do before diving into the actual implementation.
src/schemas/capability_schemas.py line 14 at r1 (raw file):
capability_id: str description: Optional[str] = None area: str = ""
If we have objects for area, domain, etc., let's use them instead of strings. Please apply this change everywhere.
Code quote:
areasrc/schemas/experiment_schemas.py line 12 at r1 (raw file):
experiment_id: str domain: str
Why not pass a domain object? That way, we can remove domain_id here too.
Code quote:
domain… I/O functions, use dataclass objects for hierarchical relationships, and improve documentation
kohankhaki
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @afkanpour)
src/schemas/area_schemas.py line 12 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
Curious, what are attributes like area_id and domain_id used for?
Done.
src/schemas/area_schemas.py line 14 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
Use a domain object instead.
Done.
src/schemas/area_schemas.py line 15 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
I guess this can be removed if we use a domain object.
Done.
src/schemas/capability_schemas.py line 14 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
If we have objects for area, domain, etc., let's use them instead of strings. Please apply this change everywhere.
Done.
src/schemas/experiment_schemas.py line 12 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
Why not pass a domain object? That way, we can remove domain_id here too.
Done.
src/schemas/io_utils.py line 16 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
Why not naming these functions save_experiment(), save_domain(), load_experiment(), etc.?
Done.
src/schemas/metadata_schemas.py line 1 at r1 (raw file):
Previously, afkanpour (Arash) wrote…
Please expand the docstring explaining the intention for introducing metadata, how/where they are used.
In general, comments should be expressive so the reader has a good understanding of what the code is supposed to do before diving into the actual implementation.
Done.
afkanpour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afkanpour reviewed 12 of 12 files at r2.
Reviewable status: 4 of 12 files reviewed, 8 unresolved discussions (waiting on @kohankhaki)
src/schemas/area_schemas.py line 20 at r2 (raw file):
area_id: str description: Optional[str] = None domain: Optional[Domain] = None
Is there a scenario where an area instance should be created without a domain? If not, we shouldn't make domain optional.
Same applies to other classes/stages.
Code quote:
Optional[Domain]…nal. Add file references to all dataclass sections in PIPELINE_SCHEMAS.md documentation.
kohankhaki
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 12 files reviewed, 8 unresolved discussions (waiting on @afkanpour)
src/schemas/area_schemas.py line 20 at r2 (raw file):
Previously, afkanpour (Arash) wrote…
Is there a scenario where an area instance should be created without a domain? If not, we shouldn't make domain optional.
Same applies to other classes/stages.
Makes sense, this make input and output more standard.
afkanpour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afkanpour reviewed 2 of 8 files at r3, 6 of 6 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @kohankhaki)
PR Type
Feature
Short Description
This pull request introduces a comprehensive, standardized schema system for all stages of the ACE pipeline. It adds Python dataclasses for each pipeline stage, implements robust save/load I/O utilities, and centralizes all schema imports and exports for easy use. The changes ensure consistent data formats across the pipeline, making it easier to maintain, extend, and integrate different components.
Current pipeline does not use this exact schema. The goal is to standardize future implementations.
src/schemas/README.mdandsrc/schemas/PIPELINE_SCHEMAS.mdexplain details of the pipeline schema.Tests Added
None
This change is