[CI] add API evaluation test #3987

littlegy · 2025-09-18T09:07:16Z

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
The documentation has been modified accordingly, like docstring or example tutorials.

lvhan028 · 2025-09-19T03:40:44Z

autotest/utils/config_utils.py

+    if len(model_list) > 0:
+
+        if tp_num > 1:
+            communicators = ['native', 'nccl']


Please change native to cuda-ipc. We are going to deprecate native, which is an alias for cuda-ipc.
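
One way to guard against the deprecated name lingering in test configuration is sketched below. The helper `normalize_communicators` and the mapping `DEPRECATED_COMMUNICATOR_ALIASES` are hypothetical names used only for illustration and are not part of this PR. The only fact taken from the review is that native is an alias for cuda-ipc.

```python
# Hypothetical helper, not part of the PR: maps deprecated communicator
# names to their replacements. The single entry reflects the review note
# that 'native' is an alias for 'cuda-ipc'.
DEPRECATED_COMMUNICATOR_ALIASES = {'native': 'cuda-ipc'}


def normalize_communicators(communicators):
    """Replace deprecated aliases and drop duplicates, keeping order."""
    normalized = []
    for name in communicators:
        name = DEPRECATED_COMMUNICATOR_ALIASES.get(name, name)
        if name not in normalized:
            normalized.append(name)
    return normalized


if __name__ == '__main__':
    print(normalize_communicators(['native', 'nccl']))
```

With the change requested above, the list in config_utils.py would simply read `['cuda-ipc', 'nccl']`, and a helper like this would only matter if other configuration still used the old name.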

lvhan028 · 2025-09-19T03:51:50Z

autotest/evaluate/test_api_evaluate_turbomind.py

@@ -0,0 +1,90 @@
+import pytest


This file is a copy of test_api_evaluate_pytorch.
Could you provide a unified test_api_evaluation.py?
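
A rough sketch of how the two per-backend files might be merged is given below. It only builds the list of (backend, model) cases that a single test function could iterate over. The function `build_eval_cases`, the `BACKENDS` tuple, and the model names in the usage line are assumptions made for illustration, not code from this PR.

```python
# Hypothetical sketch: generate one case per (backend, model) pair so that
# a single test module can cover both backends.
BACKENDS = ('turbomind', 'pytorch')


def build_eval_cases(models_by_backend):
    """Return a list of (backend, model) tuples in a stable order."""
    cases = []
    for backend in BACKENDS:
        for model in models_by_backend.get(backend, []):
            cases.append((backend, model))
    return cases


if __name__ == '__main__':
    # Placeholder model names, for illustration only.
    print(build_eval_cases({'turbomind': ['model-a'], 'pytorch': ['model-a', 'model-b']}))
```

In a unified test_api_evaluation.py, a list like this could be passed to `pytest.mark.parametrize('backend,model', cases)` so that the backend becomes a test parameter instead of a reason to duplicate the file.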

autotest/config.yaml

lvhan028 · 2025-09-19T03:59:57Z

.github/workflows/api_eva.yml

@@ -0,0 +1,137 @@
+name: api_eva


May use api_eval as the name and rename the file to api_eval.yml.

autotest/evaluate/eval_config_chat.py

lvhan028 · 2025-09-19T04:17:53Z

autotest/utils/evaluate_utils.py

+    if work_dir and os.path.exists(work_dir):
+        try:
+            summary_dirs = glob.glob(os.path.join(work_dir, '*', 'summary'))
+            if summary_dirs:
+                summary_dir = summary_dirs[0]
+                csv_files = glob.glob(os.path.join(summary_dir, 'summary_*.csv'))
+                if csv_files:
+                    csv_file = sorted(csv_files)[-1]
+                    if os.path.exists(csv_file):
+                        with open(csv_file, 'r') as f:
+                            reader = csv.reader(f)
+                            next(reader)
+                            for row in reader:
+                                if len(row) >= 5 and row[4]:
+                                    dataset = row[0]
+                                    metric_value = row[4]
+                                    try:
+                                        metrics[dataset] = f'{float(metric_value):.2f}'
+                                    except ValueError:
+                                        metrics[dataset] = metric_value
+        except Exception as e:
+            print(f'Error reading metrics: {str(e)}')


Can we simplify this code snippet? The indent is too deep
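
One possible flattening, using early returns and a small formatting helper, is sketched below. The function names `read_metrics` and `format_metric` are assumptions, since the hunk does not show the enclosing function. The sketch also narrows the broad `except Exception` to file and CSV errors, which is a design choice rather than something requested in the review.

```python
import csv
import glob
import os


def format_metric(value):
    """Format numeric metrics with two decimals; keep other values as-is."""
    try:
        return f'{float(value):.2f}'
    except ValueError:
        return value


def read_metrics(work_dir):
    """Return {dataset: metric} from the newest summary CSV under work_dir."""
    metrics = {}
    if not work_dir or not os.path.exists(work_dir):
        return metrics

    summary_dirs = glob.glob(os.path.join(work_dir, '*', 'summary'))
    if not summary_dirs:
        return metrics

    # As in the original hunk: first summary directory, last CSV by name.
    csv_files = sorted(glob.glob(os.path.join(summary_dirs[0], 'summary_*.csv')))
    if not csv_files:
        return metrics

    try:
        with open(csv_files[-1], 'r', newline='') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            for row in reader:
                if len(row) < 5 or not row[4]:
                    continue
                metrics[row[0]] = format_metric(row[4])
    except (OSError, csv.Error) as e:
        print(f'Error reading metrics: {e}')
    return metrics
```

The deepest nesting drops from nine levels to four, and the float formatting can be tested separately from the file handling.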

lvhan028 · 2025-09-19T04:21:09Z

autotest/utils/evaluate_utils.py

+        write_header = False
+        if not os.path.exists(summary_file) or os.path.getsize(summary_file) == 0:
+            write_header = True
+        else:
+            with open(summary_file, 'r') as f:
+                first_lines = f.read(200)
+                if '| Model | Backend | TP | Status | mmlu | gsm8k |' not in first_lines:
+                    write_header = True


zhulinJulia24

LGTM

littlegy added 4 commits September 18, 2025 16:15

TEST: add api evaluate

759dca3

TEST: rm qwen1.5_7b test

a955b7d

TEST: add evaluate result to github

aa8a0bd

CI: update workflow docker

71022de

lvhan028 reviewed Sep 19, 2025

autotest/config.yaml (resolved)

lvhan028 reviewed Sep 19, 2025

lvhan028 requested a review from zhulinJulia24 September 19, 2025 04:03

lvhan028 reviewed Sep 19, 2025

autotest/evaluate/eval_config_chat.py (resolved)

lvhan028 reviewed Sep 19, 2025

littlegy added 2 commits September 19, 2025 14:16

TEST: update code based on comments

88a6836

Merge branch 'main' into api_eva

1d20999

zhulinJulia24 approved these changes Sep 19, 2025

littlegy added 2 commits September 19, 2025 16:42

TEST: update docker

c68f4d2

Merge branch 'InternLM:main' into api_eva

96b8c55
