Add MileBench validation for VLMs #2358

l-bat · 2025-06-18T08:47:08Z

Ticket: 161649

tests/python_tests/test_kv_cache_eviction.py

Copilot

Pull Request Overview

This PR adds MileBench dataset support and validation for visual-language models (VLMs), integrates new MileBench benchmarks into existing cache eviction tests, and updates fixtures/workflows to handle MileBench data.

- Introduce `utils/milebench.py` with `MileBenchDataset` and `Eval` for data loading and evaluation.
- Extend `test_kv_cache_eviction.py` to run MileBench benchmarks via parametrized tests.
- Update fixtures (`conftest.py`) and CI workflows to download/extract MileBench parts and adjust timeouts.
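
As a rough illustration of the download/extract step mentioned above, the sketch below unpacks a list of archive parts into a shared data directory and skips parts that were already extracted. It is not the code from this PR: the helper name `extract_parts`, the marker-file convention, and the assumption that each part is an independent zip archive are all hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def extract_parts(archives: Iterable[PathLike], dest: PathLike) -> Path:
    """Extract each archive part into dest, skipping parts already unpacked.

    Hypothetical helper: assumes every part is a standalone zip file.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        archive = Path(archive)
        # A hidden marker file records that this part was already extracted,
        # so repeated fixture setup does not unpack the same data again.
        marker = dest / f".{archive.name}.extracted"
        if marker.exists():
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        marker.touch()
    return dest
```

A session-scoped fixture could call such a helper once and hand the returned directory to every parametrized benchmark test.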

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/python_tests/utils/milebench.py | Added MileBenchDataset and Eval classes for dataset handling and scoring. |
| tests/python_tests/test_kv_cache_eviction.py | Integrated new MileBench benchmarks into cache eviction tests. |
| tests/python_tests/conftest.py | Extended `TEST_FILES` and `download_test_content` fixture for MileBench archives. |
| tests/python_tests/samples/conftest.py | Removed obsolete `download_test_content` imports in sample tests. |
| .github/workflows/*.{yml} | Increased timeout for Cacheopt E2E CI jobs to accommodate longer tests. |

Comments suppressed due to low confidence (2)

tests/python_tests/utils/milebench.py:103

[nitpick] The class name Eval is very generic and could collide with other utilities; consider renaming it to something more descriptive like MileBenchEvaluator.

class Eval:

tests/python_tests/utils/milebench.py:19

[nitpick] New MileBenchDataset and Eval utilities lack dedicated unit tests; consider adding tests that verify string transformation, image preprocessing, and choice matching logic.

class MileBenchDataset:
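
To make the suggestion about testing choice matching concrete, here is a minimal sketch of what such logic and its checks might look like. The functions `match_choice` and `accuracy` are hypothetical stand-ins written for illustration; they are not taken from `milebench.py`, whose actual matching rules are not shown in this thread.

```python
import re
from typing import List, Optional, Sequence


def match_choice(prediction: str, choices: Sequence[str]) -> Optional[str]:
    """Map a free-form model answer to a choice letter ('A', 'B', ...).

    Hypothetical rules: accept a leading letter such as "B", "B.", "(B)"
    or "B) text"; otherwise fall back to a case-insensitive comparison
    with the choice text. Returns None when nothing matches.
    """
    letters: List[str] = [chr(ord("A") + i) for i in range(len(choices))]
    text = prediction.strip()
    leading = re.match(r"^\(?([A-Z])\)?(?:[.:)\s]|$)", text)
    if leading and leading.group(1) in letters:
        return leading.group(1)
    lowered = text.lower().rstrip(".")
    for letter, choice in zip(letters, choices):
        if lowered == choice.lower():
            return letter
    return None


def accuracy(predictions, answers, choices_per_item) -> float:
    """Fraction of predictions whose matched letter equals the answer."""
    if not predictions:
        return 0.0
    correct = sum(
        match_choice(pred, choices) == answer
        for pred, answer, choices in zip(predictions, answers, choices_per_item)
    )
    return correct / len(predictions)
```

Unit tests in this spirit would feed a handful of answer formats (bare letter, parenthesized letter, full choice text, unrelated text) through the matcher and assert on the returned letter.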

samples/python/visual_language_chat/milebench_eval_vlm.py

tests/python_tests/test_kv_cache_eviction.py

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

samples/python/visual_language_chat/milebench_eval_vlm.py

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

samples/python/visual_language_chat/milebench_eval_vlm.py

samples/python/visual_language_chat/README.md

l-bat marked this pull request as draft June 18, 2025 08:47

github-actions bot added the `category: continuous batching` and `no-match-files` labels Jun 18, 2025

l-bat force-pushed the lt/eval_milebench branch from db5b244 to 9ac0082 Compare June 24, 2025 08:06

github-actions bot added the `category: llm_bench` and `category: Image generation samples` labels Jun 24, 2025

l-bat marked this pull request as ready for review June 24, 2025 14:35

l-bat requested a review from vshampor June 24, 2025 14:50

l-bat force-pushed the lt/eval_milebench branch 4 times, most recently from 65df084 to a740b42 Compare June 27, 2025 07:56

vshampor approved these changes Jun 27, 2025

Wovchena requested a review from Copilot July 3, 2025 12:38

This comment was marked as outdated.

Wovchena reviewed Jul 3, 2025

tests/python_tests/test_kv_cache_eviction.py (outdated, resolved)

l-bat force-pushed the lt/eval_milebench branch 2 times, most recently from 0cad800 to 2cb9233 Compare July 7, 2025 07:10

alexsu52 requested a review from Wovchena July 7, 2025 12:01

alexsu52 approved these changes Jul 7, 2025

l-bat force-pushed the lt/eval_milebench branch 3 times, most recently from 92f191d to eac1c80 Compare July 15, 2025 14:02

github-actions bot added the `category: GHA` label Jul 15, 2025

Wovchena approved these changes Jul 16, 2025

Wovchena requested a review from Copilot July 17, 2025 06:25

Copilot AI reviewed Jul 17, 2025

samples/python/visual_language_chat/milebench_eval_vlm.py (resolved)

tests/python_tests/test_kv_cache_eviction.py (outdated, resolved)

l-bat force-pushed the lt/eval_milebench branch 3 times, most recently from f838dc3 to 0a5a268 Compare July 22, 2025 10:05

l-bat added 3 commits September 23, 2025 14:04

change data dir

9293e52

fix setup_and_teardown

9b30b6a

fix request problem

75d4633

l-bat force-pushed the lt/eval_milebench branch from a840a62 to 75d4633 Compare September 23, 2025 13:08

Copilot AI review requested due to automatic review settings December 3, 2025 17:33

github-actions bot added the `category: VLM samples` label and removed the `category: continuous batching` label Dec 3, 2025

Copilot AI reviewed Dec 3, 2025

samples/python/visual_language_chat/milebench_eval_vlm.py (outdated, resolved)

l-bat force-pushed the lt/eval_milebench branch from 631718c to ce1be2f Compare December 3, 2025 17:37

Copilot AI review requested due to automatic review settings December 3, 2025 17:44

l-bat force-pushed the lt/eval_milebench branch from ce1be2f to 037dd74 Compare December 3, 2025 17:44

github-actions bot removed the `category: Image generation samples` label Dec 3, 2025

This comment was marked as duplicate.

l-bat force-pushed the lt/eval_milebench branch from 037dd74 to cf2a650 Compare December 3, 2025 17:52

openvinotoolkit deleted a comment from Copilot AI Dec 3, 2025

Move MileBench eval to samples

7283a08

Copilot AI review requested due to automatic review settings December 3, 2025 17:57

l-bat force-pushed the lt/eval_milebench branch from cf2a650 to 7283a08 Compare December 3, 2025 17:57

Copilot AI reviewed Dec 3, 2025

github-actions bot removed the `category: llm_bench` and `category: GGUF` labels Dec 3, 2025

openvinotoolkit deleted a comment from Copilot AI Dec 3, 2025

l-bat requested a review from Copilot December 3, 2025 18:18

Copilot AI reviewed Dec 3, 2025

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.…

597bfe2

…genai into lt/eval_milebench

l-bat force-pushed the lt/eval_milebench branch from 5ed0a30 to 597bfe2 Compare December 3, 2025 20:41

l-bat added this pull request to the merge queue Dec 4, 2025

Merged via the queue into openvinotoolkit:master with commit b160eee Dec 4, 2025
96 checks passed

5 participants