First principles datasets #181
Merged
trangdata merged 12 commits into EpistasisLab:master on Feb 25, 2025
Data comes from two symbolic regression repos:
- Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
- Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

Each dataset has a first-principles equation derived from data, and each was used in its respective paper to show how symbolic regression can potentially retrieve the original equation when only observational data is available.
While some of them have just a few samples and others are synthetically generated, they are challenging for symbolic regression methods and can be used to evaluate these algorithms.
The idea of pushing them into PMLB is to help other users quickly set up experiments with the data.
I still need to write proper metadata for them.
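As a rough illustration of the "quickly set up experiments" use case, here is a minimal sketch. It assumes the datasets can be fetched by directory name through `pmlb.fetch_data` once merged (for example `first_principles_rydberg`, the name mentioned in this thread). The helper names `load_xy` and `score_equation` are hypothetical and not part of PMLB.

```python
def load_xy(name, fetch=None):
    """Return (X, y) as plain Python lists for a PMLB dataset.

    `fetch` defaults to pmlb.fetch_data (needs `pip install pmlb` and
    network access); a stub can be injected to work offline.
    """
    if fetch is None:
        from pmlb import fetch_data  # imported lazily so the helper works without pmlb
        fetch = fetch_data
    X, y = fetch(name, return_X_y=True)
    return [[float(v) for v in row] for row in X], [float(v) for v in y]


def r2_score(y_true, y_pred):
    """Coefficient of determination; 1.0 means a perfect fit."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def score_equation(equation, X, y):
    """Score a candidate equation (a callable taking one argument per feature)."""
    return r2_score(y, [equation(*row) for row in X])
```

With pmlb installed, something like `X, y = load_xy("first_principles_rydberg")` followed by `score_equation(candidate, X, y)` would score an equation proposed by a symbolic regression method, where `candidate` is whatever callable the method returns.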
CI was failing to parse the contents of these specific ones.
Created by https://github.com/gAldeia/pmlb/actions/runs/11616806556 from f23672c on 2024-10-31
Collaborator
Thank you for this PR, @gAldeia! 💯 🌈 If you could update the metadata.yaml files (e.g. datasets/first_principles_rydberg/metadata.yaml), I will review!
…pmlb into symreg_first_principles
Contributor
Author
Done! Let me know if everything is OK or if it needs a major review.
gAldeia commented Feb 20, 2025
Contributor
Author
This is correct; I accidentally pushed changes from another branch into this one.
Collaborator
Ah, I think one of the datasets' names does not match (directory name vs. metadata dataset field).
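A name mismatch like the one described above can be caught locally before pushing. The following is a minimal sketch, assuming each dataset lives in `datasets/<name>/metadata.yaml` with a top-level `dataset:` field. The function name `find_name_mismatches` is hypothetical, and the regex is a simplification standing in for a real YAML parser.

```python
import re
from pathlib import Path

# Simplified match for a top-level `dataset: <name>` line (not a full YAML parser).
_DATASET_FIELD = re.compile(
    r"^dataset:[ \t]*['\"]?([^'\"#\s]+)['\"]?[ \t]*(?:#.*)?$", re.MULTILINE
)


def find_name_mismatches(datasets_root):
    """Return (directory_name, declared_name) pairs that disagree.

    `declared_name` is None when no `dataset:` field is found.
    """
    mismatches = []
    for meta in sorted(Path(datasets_root).glob("*/metadata.yaml")):
        match = _DATASET_FIELD.search(meta.read_text(encoding="utf-8"))
        declared = match.group(1) if match else None
        if declared != meta.parent.name:
            mismatches.append((meta.parent.name, declared))
    return mismatches
```

Calling `find_name_mismatches("datasets")` from the repository root returns an empty list when every directory name agrees with its metadata.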
…pmlb into symreg_first_principles
Created by https://github.com/gAldeia/pmlb/actions/runs/13465733857 from a226e6b on 2025-02-21
I still need to write proper metadata for these datasets. My understanding is that opening a PR will trigger a GitHub Action that pushes some new files to my fork, which I should complete before the new datasets go to review. Please let me know if there is anything I got wrong and need to update!