[WIP] Adds initial benchmarking framework #1197
Signed-off-by: rlratzel <[email protected]>
…ons, minor cleanup of results dict. Signed-off-by: rlratzel <[email protected]>
…script. Signed-off-by: rlratzel <[email protected]>
…mage name. Signed-off-by: rlratzel <[email protected]>
6c3a342 to aa9b2f6
    results_dir: "./results"
    entries:
      - name: quick_test
        script: test_benchmark.py
I think there is a mismatch between the arguments passed to test_benchmark.py in this README.md and the arguments the script requires:
parser.add_argument("--input-path", required=True, help="Path to input data")
parser.add_argument("--output-path", required=True, help="Path to output data")
test_benchmark.py requires both of the above arguments. --benchmark-results-path is also required by the script, but the framework appends it automatically in matrix.py (L44:L47), which I believe is why you have a FIXME there.
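To make the mismatch concrete, here is a minimal sketch of the situation described above. The `--input-path` and `--output-path` lines are the ones quoted from the script. The `--benchmark-results-path` declaration and the `append_results_path` helper are assumptions for illustration only: the helper merely stands in for whatever matrix.py does and is not the framework's actual code.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the required arguments discussed in this comment."""
    parser = argparse.ArgumentParser(description="Benchmark script (sketch)")
    # Quoted from the script in the review comment.
    parser.add_argument("--input-path", required=True, help="Path to input data")
    parser.add_argument("--output-path", required=True, help="Path to output data")
    # Assumed declaration: the comment says this flag is required but does not quote it.
    parser.add_argument("--benchmark-results-path", required=True, help="Path to benchmark results")
    return parser


def append_results_path(user_args: list[str], results_path: str) -> list[str]:
    """Hypothetical stand-in for the framework appending the results path."""
    return [*user_args, "--benchmark-results-path", results_path]


def entry_is_runnable(user_args: list[str], results_path: str = "./results") -> bool:
    """Return True if the script would accept the arguments the framework passes."""
    try:
        build_parser().parse_args(append_results_path(user_args, results_path))
    except SystemExit:
        # argparse exits when a required argument is missing.
        return False
    return True
```

Under these assumptions, an entry with no `args` fails even though the results path is appended, while an entry whose `args` supply `--input-path` and `--output-path` parses cleanly. That is the gap the README example would need to close.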
    entries:
      - name: benchmark_v1
        script: my_benchmark.py
my_benchmark.py is not included in this repo, so it might be odd to have test_benchmark.py in Example 1 and a different benchmark script that doesn't exist in Example 2. Maybe you meant test_benchmark.py?
      - name: benchmark_v2
        script: my_benchmark.py
        args: --input {dataset:sample_data,parquet} --algorithm v2
    ```
    python benchmarking/run.py --config config.yaml

You forgot the above command; adding it here would be consistent with Example 1.
      - name: cc_large
        formats:
          - type: parquet
            path: /data/common_crawl_large.parquet
I assume the user is expected to provide a path to a .parquet file?
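For context on how a dataset path like this could feed the `{dataset:sample_data,parquet}` placeholder seen in Example 2, here is a rough sketch of one possible resolution step. It is not taken from the PR: the function name `resolve_dataset_placeholders`, the regular expression, and the in-memory shape of the `datasets` list are all assumptions based only on the `name` / `formats` / `type` / `path` keys visible in the README snippet.

```python
from __future__ import annotations

import re

# Matches placeholders of the form {dataset:<name>,<format>} (assumed syntax).
PLACEHOLDER = re.compile(r"\{dataset:([^,}]+),([^}]+)\}")


def resolve_dataset_placeholders(args: str, datasets: list[dict]) -> str:
    """Replace each {dataset:name,format} placeholder with the configured path."""
    # Build a lookup keyed by (dataset name, format type).
    paths = {
        (ds["name"], fmt["type"]): fmt["path"]
        for ds in datasets
        for fmt in ds.get("formats", [])
    }

    def substitute(match: re.Match) -> str:
        key = (match.group(1).strip(), match.group(2).strip())
        if key not in paths:
            raise KeyError(f"No dataset entry for name={key[0]!r}, format={key[1]!r}")
        return paths[key]

    return PLACEHOLDER.sub(substitute, args)


# Hypothetical in-memory equivalent of the README's datasets section.
example_datasets = [
    {
        "name": "cc_large",
        "formats": [{"type": "parquet", "path": "/data/common_crawl_large.parquet"}],
    },
]
```

With this sketch, a user-supplied `path` is substituted verbatim into the script arguments, so whether it should point at a single .parquet file or a directory depends on what the benchmark script expects. The README could state that explicitly.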
    entries:
      - name: cc_extraction
        script: common_crawl_benchmark.py
Is the user expected to create their own common_crawl_benchmark.py, or should it be test_benchmark.py?
…rs present, adds convenience options to run.sh for debugging. Signed-off-by: rlratzel <[email protected]>
… deps, updates config.yaml to use the same env vars as used in tools/run.sh, updates tools/run.sh to expose env vars used for volume mounts so they can be used in config.YAML Signed-off-by: rlratzel <[email protected]>
… no args, makes default config specified in run.py instead of run.sh Signed-off-by: rlratzel <[email protected]>
…n for using local Curator for benchmark/debug without rebuilding image, allows for env to override each of the LOCAL_ paths. Signed-off-by: rlratzel <[email protected]>
…nal sinks may be explained differently. Signed-off-by: rlratzel <[email protected]>
…bility. Signed-off-by: rlratzel <[email protected]>
…rs args for sinks and MatrixConfig for clarity. Signed-off-by: rlratzel <[email protected]>
…nment. Signed-off-by: rlratzel <[email protected]>
…, uses sys.executable for Python exe, searches parent dirs when looking for repo commit hash. Signed-off-by: rlratzel <[email protected]>
Initial version of Curator benchmarking framework, based off of PR1011.