Open
Labels: enhancement, priority: low
Description:
Currently, the dynamic splitting code in chebi.py is duplicated in the proteins repo's data class. The two implementations are effectively the same, so every change to the splitting logic has to be made twice.
Proposed changes:
- Move the common code to a base class, e.g. DynamicDataset, to encapsulate the shared dynamic splitting logic.
  - Both the ChEBI and protein dataset classes should inherit from this base class.
  - This will centralize changes and make maintenance easier.
- Refactor the dataset hierarchy to be more generic:
  - Certain hyperparameters that are specific to ChEBI, such as chebi_version: int = 200 in XYBaseDataModule, should be pushed down into a ChEBI-specific base class rather than living in a generic base.
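A rough sketch of what the proposed hierarchy could look like. This is only an illustration under assumptions, not the existing code: the XYBaseDataModule here is a bare stand-in for the real class, the names ChEBIDataset, ProteinDataset, load_ids, dynamic_split, train_fraction and seed are hypothetical, and the seeded shuffle is a placeholder for whatever the actual dynamic splitting logic does.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class XYBaseDataModule:
    """Stand-in for the generic base; it no longer carries chebi_version."""

    def __init__(self, batch_size: int = 1):
        self.batch_size = batch_size


class DynamicDataset(XYBaseDataModule, ABC):
    """Hypothetical shared base that holds the splitting logic in one place."""

    def __init__(self, train_fraction: float = 0.8, seed: int = 42, **kwargs):
        super().__init__(**kwargs)
        if not 0.0 < train_fraction < 1.0:
            raise ValueError("train_fraction must be strictly between 0 and 1")
        self.train_fraction = train_fraction
        self.seed = seed

    @abstractmethod
    def load_ids(self) -> Sequence[str]:
        """Return the identifiers of all samples (dataset-specific)."""

    def dynamic_split(self) -> Tuple[List[str], List[str]]:
        # Placeholder logic: a seeded shuffle followed by a single cut.
        # The real splitting code would be moved here unchanged.
        ids = sorted(self.load_ids())
        random.Random(self.seed).shuffle(ids)
        cut = int(len(ids) * self.train_fraction)
        return ids[:cut], ids[cut:]


class ChEBIDataset(DynamicDataset):
    """ChEBI-specific class; chebi_version lives here, not in the generic base."""

    def __init__(self, chebi_version: int = 200, **kwargs):
        super().__init__(**kwargs)
        self.chebi_version = chebi_version

    def load_ids(self) -> Sequence[str]:
        return [f"CHEBI:{i}" for i in range(10)]  # dummy data for illustration


class ProteinDataset(DynamicDataset):
    """Protein class reuses dynamic_split without any ChEBI parameters."""

    def load_ids(self) -> Sequence[str]:
        return [f"P{i:05d}" for i in range(10)]  # dummy data for illustration
```

With this layout a new dataset only has to implement load_ids (or its real equivalent) and inherits the split for free, e.g. `train, test = ProteinDataset(train_fraction=0.8).dynamic_split()`.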
Outcome:
- Eliminate duplicate code between chebi.py and the proteins repo.
- Improve maintainability by isolating dataset-specific configurations.
- Make it easier to introduce new datasets without rewriting the splitting logic.
sfluegel05