Add MileBench validation for VLMs #2358
Force-pushed from 65df084 to a740b42.
Force-pushed from 0cad800 to 2cb9233.
Force-pushed from 92f191d to eac1c80.
Pull Request Overview
This PR adds MileBench dataset support and validation for vision-language models (VLMs), integrates the new MileBench benchmarks into the existing cache eviction tests, and updates fixtures and CI workflows to handle MileBench data.
- Introduce `utils/milebench.py` with `MileBenchDataset` and `Eval` for data loading and evaluation.
- Extend `test_kv_cache_eviction.py` to run MileBench benchmarks via parametrized tests.
- Update fixtures (`conftest.py`) and CI workflows to download and extract the MileBench parts and adjust timeouts.
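The overview does not say what the parametrized MileBench cases actually assert. One plausible shape, sketched here purely as an assumption, is to score each subset once with the full KV cache and once with eviction enabled, then flag any subset whose accuracy drops by more than a fixed margin. `SubsetResult`, `compare_subsets`, and the `max_drop` tolerance are hypothetical names, not code from `test_kv_cache_eviction.py`.

```python
from dataclasses import dataclass


@dataclass
class SubsetResult:
    # Hypothetical container: accuracy of one MileBench subset, in [0, 1].
    subset: str
    baseline_acc: float  # run with the full KV cache
    eviction_acc: float  # run with KV cache eviction enabled


def compare_subsets(results: list[SubsetResult], max_drop: float = 0.05) -> list[str]:
    """Return a description of every subset whose accuracy drops by more than max_drop.

    max_drop is an assumed absolute tolerance, not a value taken from the PR.
    """
    failures = []
    for r in results:
        drop = r.baseline_acc - r.eviction_acc
        if drop > max_drop:
            failures.append(f"{r.subset}: {r.baseline_acc:.3f} -> {r.eviction_acc:.3f}")
    return failures
```

In a pytest suite, each subset would typically be one `pytest.mark.parametrize` case, and the test body would assert that `compare_subsets(...)` returns an empty list, so the failure message names the offending subset directly.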
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/python_tests/utils/milebench.py | Added MileBenchDataset and Eval classes for dataset handling and scoring. |
| tests/python_tests/test_kv_cache_eviction.py | Integrated new MileBench benchmarks into cache eviction tests. |
| tests/python_tests/conftest.py | Extended TEST_FILES and download_test_content fixture for MileBench archives. |
| tests/python_tests/samples/conftest.py | Removed obsolete download_test_content imports in sample tests. |
| .github/workflows/*.{yml} | Increased timeout for Cacheopt E2E CI jobs to accommodate longer tests. |
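The `conftest.py` row says the download fixture now handles MileBench archives, and the overview mentions downloading and extracting several parts. Below is a hedged sketch of the extraction half only, assuming the parts are ordinary zip or tar archives that can each be unpacked independently into one directory. `extract_parts` and the `.extracted` marker file are hypothetical names, not the fixture's actual code.

```python
import shutil
from pathlib import Path


def extract_parts(archives: list[Path], dest: Path) -> Path:
    """Unpack every archive part into dest once, skipping the work on later calls.

    A marker file records a completed extraction so repeated test sessions
    (or several fixtures sharing one cache directory) do not unpack again.
    The marker name is an illustrative choice.
    """
    marker = dest / ".extracted"
    if marker.exists():
        return dest
    dest.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        # unpack_archive infers the format (zip, tar, gztar, ...) from the extension.
        shutil.unpack_archive(str(archive), str(dest))
    # Written last, so an interrupted extraction is retried on the next run.
    marker.touch()
    return dest
```

Writing the marker only after every part is unpacked keeps a half-extracted cache from being mistaken for a complete one.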
Comments suppressed due to low confidence (2)
tests/python_tests/utils/milebench.py:103
- [nitpick] The class name `Eval` is very generic and could collide with other utilities; consider renaming it to something more descriptive, like `MileBenchEvaluator`.

`class Eval:`
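To make the rename suggestion concrete, here is a minimal sketch of what a `MileBenchEvaluator` could look like, limited to the choice-matching part. The option-letter scheme (A, B, C, ...), the matching order, and both method names are assumptions; the PR's actual `Eval` logic is not shown on this page.

```python
from typing import Optional


class MileBenchEvaluator:
    """Hypothetical multiple-choice scorer, named as the review comment suggests.

    Assumes each sample offers options labelled A, B, C, ... and that the
    reference answer is one of those letters.
    """

    def match_choice(self, response: str, choices: list[str]) -> Optional[str]:
        """Return the option letter the response selects, or None if unclear."""
        letters = [chr(ord("A") + i) for i in range(len(choices))]
        text = response.strip()
        # 1) Bare letter, optionally wrapped or punctuated: "B", "(B)", "B.", "B)".
        bare = text.strip("()[]. :")
        if bare in letters:
            return bare
        # 2) Parenthesised letter anywhere; first match in option order wins.
        for letter in letters:
            if f"({letter})" in text:
                return letter
        # 3) Fall back to the option text itself, case-insensitively,
        #    and only accept an unambiguous single hit.
        lowered = text.lower()
        hits = [l for l, c in zip(letters, choices) if c.lower() in lowered]
        return hits[0] if len(hits) == 1 else None

    def accuracy(
        self, responses: list[str], choice_lists: list[list[str]], answers: list[str]
    ) -> float:
        """Fraction of samples whose matched letter equals the reference letter."""
        if not answers:
            return 0.0
        correct = sum(
            self.match_choice(r, c) == a
            for r, c, a in zip(responses, choice_lists, answers)
        )
        return correct / len(answers)
```

Matching a bare letter before option text avoids false hits on words such as "a" inside free-form answers, and returning `None` on ambiguity counts unclear responses as wrong instead of guessing.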
tests/python_tests/utils/milebench.py:19
- [nitpick] The new `MileBenchDataset` and `Eval` utilities lack dedicated unit tests; consider adding tests that verify the string transformation, image preprocessing, and choice-matching logic.

`class MileBenchDataset:`
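As an example of the kind of dedicated test the second comment asks for, the sketch below covers only the string-transformation case, using the standard `unittest` module so pytest can collect it as well. `build_prompt`, the `<ImageHere>` placeholder, and the choice to reject contexts without placeholders are assumptions for illustration, not the behaviour of `MileBenchDataset`.

```python
import unittest

# Hypothetical placeholder marking image positions in raw MileBench context strings.
IMAGE_PLACEHOLDER = "<ImageHere>"


def build_prompt(context: str, image_tag: str) -> str:
    """Replace every placeholder with the model-specific image tag.

    Raises ValueError when no placeholder is present, since a multi-image
    sample without placeholders would silently drop its images.
    """
    if IMAGE_PLACEHOLDER not in context:
        raise ValueError("context has no image placeholders")
    return context.replace(IMAGE_PLACEHOLDER, image_tag)


class TestBuildPrompt(unittest.TestCase):
    def test_every_placeholder_is_replaced(self):
        out = build_prompt(f"{IMAGE_PLACEHOLDER} then {IMAGE_PLACEHOLDER}?", "<image>")
        self.assertEqual(out, "<image> then <image>?")

    def test_text_without_placeholders_is_rejected(self):
        with self.assertRaises(ValueError):
            build_prompt("no images here", "<image>")

    def test_tag_count_matches_image_count(self):
        images = ["a.jpg", "b.jpg", "c.jpg"]
        context = " ".join([IMAGE_PLACEHOLDER] * len(images))
        self.assertEqual(build_prompt(context, "<img>").count("<img>"), len(images))
```

Run it with `python -m unittest` or plain `pytest`; tests for image preprocessing and choice matching would follow the same pattern.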
Force-pushed from f838dc3 to 0a5a268.
Force-pushed from a840a62 to 75d4633.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Force-pushed from 631718c to ce1be2f.
Force-pushed from ce1be2f to 037dd74.
Force-pushed from 037dd74 to cf2a650.
Force-pushed from cf2a650 to 7283a08.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
…genai into lt/eval_milebench
Force-pushed from 5ed0a30 to 597bfe2.
Ticket: 161649