🐛 Describe the bug
When submitting batch requests via the OpenAI-compatible batch API in Aibrix, the batch job completes successfully, but the output contains only the input messages - no model-generated responses. Single requests return responses correctly.
Steps to Reproduce
- Deployed a qwen2-5vl-7b-instruct (Qwen/Qwen2.5-VL-7B-Instruct) model. Inference is up and running, and the route is available through the Aibrix gateway.
Single requests return the model response correctly:
```shell
curl -v http://<aibrix-gateway-url>/v1/chat/completions \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2-5vl-7b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a fact about the number 1."}
    ],
    "max_tokens": 100
  }'
```
- Used the Python SDK to run a batch request:
```python
import json
import time

from openai import OpenAI

client = OpenAI(
    base_url="http://<aibrix-gateway-url>/v1",
    api_key="cloudlyte"
)

batch_requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen2-5vl-7b-instruct",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Tell me a fact about the number {i}."}
            ],
            "max_tokens": 100
        }
    }
    for i in range(1, 6)
]

with open("batch_requests.jsonl", "w") as f:
    for request in batch_requests:
        f.write(json.dumps(request) + "\n")

with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch ID: {batch.id}")

while batch.status not in ["completed", "failed", "expired", "cancelled"]:
    time.sleep(10)
    batch = client.batches.retrieve(batch.id)
    print(f"Status: {batch.status}")

if batch.status == "completed":
    print("Batch completed!")
    print(f"Total requests: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")

    # Download results
    output_file_id = batch.output_file_id
    result_content = client.files.content(output_file_id)

    # Save results
    with open("batch_results.jsonl", "wb") as f:
        f.write(result_content.content)

    # Process results
    with open("batch_results.jsonl", "r") as f:
        for line in f:
            result = json.loads(line)
            custom_id = result["custom_id"]
            content = result["response"]["body"]["choices"][0]["message"]["content"]
            print(f"{custom_id}: {content}")
else:
    print(f"Batch failed with status: {batch.status}")
```
- In the results I am not able to see the model response:
```shell
cat batch_results.jsonl
{"id": "b02b9", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-0", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 1."}], "max_tokens": 100}}, "custom_id": "request-1"}
{"id": "c81c0", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-1", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 2."}], "max_tokens": 100}}, "custom_id": "request-2"}
{"id": "5c609", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-2", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 3."}], "max_tokens": 100}}, "custom_id": "request-3"}
{"id": "3e59a", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-3", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 4."}], "max_tokens": 100}}, "custom_id": "request-4"}
{"id": "25680", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-4", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 5."}], "max_tokens": 100}}, "custom_id": "request-5"}
```
Each response is just echoing the request: the `body` contains the input `messages` and `max_tokens`, with no `choices` field.
- Logs of one request in the batch:
```json
{"timestamp":"2025-11-25T06:57:44Z","logger":"aibrix.batch.storage.adapter","level":"debug","event":"Locked and will processing request in the job","job_id":"e431b496-262b-4bd7-aaee-5e780462aa89","request":{"custom_id":"request-1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen2-5vl-7b-instruct","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a fact about the number 1."}],"max_tokens":100},"_request_index":0}},
{"timestamp":"2025-11-25T06:57:44Z","logger":"aibrix.batch.job_driver","level":"debug","event":"Executing job request","job_id":"e431b496-262b-4bd7-aaee-5e780462aa89","line":0,"request_id":0,"custom_id":"request-1"},
{"timestamp":"2025-11-25T06:57:45Z","logger":"aibrix.batch.job_driver","level":"debug","event":"Got request response","job_id":"e431b496-262b-4bd7-aaee-5e780462aa89","request_id":0,"custom_id":"request-1","response":{"id":"b02b9","error":null,"response":{"status_code":200,"request_id":"e431b496-262b-4bd7-aaee-5e780462aa89-0","body":{"model":"qwen2-5vl-7b-instruct","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a fact about the number 1."}],"max_tokens":100}}}},
Expected behavior
Each request in the batch should return a model-generated response, not an echo of the input.
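For reference, in the OpenAI batch output format a successful line carries a chat-completion body with a `choices` list. A sketch of the expected shape, with the `id` and assistant text invented for illustration:

```python
import json

# Hypothetical example of what one output line should contain once the
# batch path returns real completions (assistant text is made up):
expected_line = {
    "id": "batch_req_abc",
    "custom_id": "request-1",
    "error": None,
    "response": {
        "status_code": 200,
        "body": {
            "object": "chat.completion",
            "model": "qwen2-5vl-7b-instruct",
            "choices": [
                {"index": 0,
                 "message": {"role": "assistant",
                             "content": "1 is the multiplicative identity."},
                 "finish_reason": "stop"}
            ]
        }
    }
}

# The parsing code from the reproduction script would then succeed:
content = expected_line["response"]["body"]["choices"][0]["message"]["content"]
print(content)  # 1 is the multiplicative identity.
```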
Environment
- Aibrix version: v0.5.0
- Deployment environment: Kubernetes