Batch requests completed successfully but model responses are empty. #1802

@ASWINBABUKV

Description

🐛 Describe the bug

When submitting batch requests via the OpenAI-compatible batch API in Aibrix, the batch job completes successfully, but the output contains only the input messages - no model-generated responses. Single requests return responses correctly.

Steps to Reproduce

  1. Deployed a qwen2-5vl-7b-instruct (Qwen/Qwen2.5-VL-7B-Instruct) model. Inference is up and running, with a route available through the Aibrix gateway. A single request returns the model response:
curl -v http://<aibrix-gateway-url>/v1/chat/completions \
  -H "Authorization: Bearer <key>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2-5vl-7b-instruct",
        "messages":[
          {"role":"system","content":"You are a helpful assistant."},
          {"role":"user","content":"Tell me a fact about the number 1."}
        ],
        "max_tokens": 100
      }'
  2. Used the Python SDK to run a batch request:
import json
import time
from openai import OpenAI

client = OpenAI(
    base_url="http://<aibrix-gateway-url>/v1",
    api_key="cloudlyte"
)

batch_requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "qwen2-5vl-7b-instruct",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Tell me a fact about the number {i}."}
            ],
            "max_tokens": 100
        }
    }
    for i in range(1, 6)
]

with open("batch_requests.jsonl", "w") as f:
    for request in batch_requests:
        f.write(json.dumps(request) + "\n")

with open("batch_requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch ID: {batch.id}")
while batch.status not in ["completed", "failed", "expired", "cancelled"]:
    time.sleep(10)
    batch = client.batches.retrieve(batch.id)
    print(f"Status: {batch.status}")

if batch.status == "completed":
    print("Batch completed!")
    print(f"Total requests: {batch.request_counts.total}")
    print(f"Completed: {batch.request_counts.completed}")
    print(f"Failed: {batch.request_counts.failed}")

    # Step 5: Download results
    output_file_id = batch.output_file_id
    result_content = client.files.content(output_file_id)

    # Save results
    with open("batch_results.jsonl", "wb") as f:
        f.write(result_content.content)

    # Process results
    with open("batch_results.jsonl", "r") as f:
        for line in f:
            result = json.loads(line)
            custom_id = result["custom_id"]
            content = result["response"]["body"]["choices"][0]["message"]["content"]
            print(f"{custom_id}: {content}")
else:
    print(f"Batch failed with status: {batch.status}")
  3. The results do not contain the model response:
cat batch_results.jsonl
{"id": "b02b9", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-0", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 1."}], "max_tokens": 100}}, "custom_id": "request-1"}
{"id": "c81c0", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-1", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 2."}], "max_tokens": 100}}, "custom_id": "request-2"}
{"id": "5c609", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-2", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 3."}], "max_tokens": 100}}, "custom_id": "request-3"}
{"id": "3e59a", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-3", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 4."}], "max_tokens": 100}}, "custom_id": "request-4"}
{"id": "25680", "error": null, "response": {"status_code": 200, "request_id": "e431b496-262b-4bd7-aaee-5e780462aa89-4", "body": {"model": "qwen2-5vl-7b-instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a fact about the number 5."}], "max_tokens": 100}}, "custom_id": "request-5"}

Each response body simply echoes the request payload (model, messages, max_tokens) instead of containing a model-generated completion.
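A quick structural check makes the problem explicit. This is a minimal sketch using one (abbreviated) line from `batch_results.jsonl` embedded inline: a genuine OpenAI-style chat completion body would carry a `choices` array, while the echoed body still carries the input `messages` and no `choices`.

```python
import json

# One abbreviated line from batch_results.jsonl: the response body echoes
# the request instead of containing a chat completion.
line = json.dumps({
    "custom_id": "request-1",
    "response": {"status_code": 200, "body": {
        "model": "qwen2-5vl-7b-instruct",
        "messages": [{"role": "user",
                      "content": "Tell me a fact about the number 1."}],
        "max_tokens": 100,
    }},
})

body = json.loads(line)["response"]["body"]
# A real completion has "choices"; an echoed request still has "messages".
print("has choices:", "choices" in body)    # False for the echoed body
print("has messages:", "messages" in body)  # True: input echoed back
```

Running the same check over every line of the actual output file reports `has choices: False` for all five requests.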

  4. Logs for one request in the batch:
{"timestamp":"2025-11-25T06:57:44Z","logger":"aibrix.batch.storage.adapter","level":"debug","event":"Locked and will processing request in the job","job_id":"e431b496-262b-4bd7-aaee-5e780462aa89","request":{"custom_id":"request-1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen2-5vl-7b-instruct","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a fact about the number 1."}],"max_tokens":100},"_request_index":0}},
{"timestamp":"2025-11-25T06:57:44Z","logger":"aibrix.batch.job_driver","level":"debug","event":"Executing job request","job_id":"e431b496-262b-4bd7-aaee-5e780462aa89","line":0,"request_id":0,"custom_id":"request-1"},
{"timestamp":"2025-11-25T06:57:45Z","logger":"aibrix.batch.job_driver","level":"debug","event":"Got request response","job_id":"e431b496-262b-4bd7-aaee-5e780462aa89","request_id":0,"custom_id":"request-1","response":{"id":"b02b9","error":null,"response":{"status_code":200,"request_id":"e431b496-262b-4bd7-aaee-5e780462aa89-0","body":{"model":"qwen2-5vl-7b-instruct","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Tell me a fact about the number 1."}],"max_tokens":100}}}},

Expected behavior

Each request in the batch should return model-generated responses.
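For reference, this sketch shows the shape each output line should have (field values are illustrative, including the hypothetical completion id; the structure follows the OpenAI batch/chat-completion format), and that it satisfies the parsing path used in the reproduction script:

```python
# Illustrative shape of an expected batch output line (values are made up).
expected_line = {
    "custom_id": "request-1",
    "error": None,
    "response": {
        "status_code": 200,
        "body": {
            "id": "chatcmpl-abc123",  # hypothetical completion id
            "object": "chat.completion",
            "model": "qwen2-5vl-7b-instruct",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant",
                            "content": "1 is the multiplicative identity."},
                "finish_reason": "stop",
            }],
        },
    },
}

# The reproduction script reads results via exactly this path:
content = expected_line["response"]["body"]["choices"][0]["message"]["content"]
print(content)
```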

Environment

  • Aibrix version: v0.5.0
  • Deployment environment: Kubernetes
