Summary
To expose an OpenAI-compatible Batch API in a production-ready and k8s-native way, this RFC collects various deployment scenarios, defines batch system roles and their responsibilities, and shows how those roles can be combined to serve each scenario. This RFC supersedes #182 and provides a detailed design of the underlying batch runtime.
Motivation
Deployment scenarios
After interviewing application developers, we found that AIBrix can be deployed in various settings for batch purposes:
- Batching using parallel LLM workers

The parallel LLM workers are created on demand and are dedicated to running batch inference tasks. This setting is suitable for:
- Running on spot resources.
- Batches of large size, containing a large number of inference tasks.
- Dedicated batch serving.
- Batching using an existing LLM service

The existing LLM service can be an external LLM service or an existing vLLM deployment, which can be scaled independently of the batch system.
Batch system roles
A batch system that supports both settings can be abstracted into the following roles:

The Batch Job Entity leverages k8s-native features and is responsible for:
- Providing active batch job status notifications via the k8s-native informer (a watch sketch follows this list).
- (Optional) Launching Batch Tasks Workers and providing fault tolerance for them.
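A minimal sketch of the informer-based notification, using the Kubernetes Python client's watch primitive; the namespace and label selector are illustrative assumptions:

```python
# Watch Batch Job Entity (here, a plain Kubernetes Job) status updates.
from kubernetes import client, config, watch

def watch_batch_jobs(namespace="aibrix-system"):
    config.load_incluster_config()  # or config.load_kube_config() outside the cluster
    batch_v1 = client.BatchV1Api()
    w = watch.Watch()
    # The label marking jobs managed by the batch system is an assumption.
    for event in w.stream(batch_v1.list_namespaced_job,
                          namespace=namespace,
                          label_selector="app.kubernetes.io/managed-by=aibrix-batch"):
        job = event["object"]
        print(event["type"], job.metadata.name,
              "succeeded:", job.status.succeeded,
              "failed:", job.status.failed)
```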
The API Gateway offers the OpenAI Batch API interface and is responsible for:
- Submitting a batch job by creating a Batch Job Entity (see the sketch after this list).
- Forwarding other batch APIs to the Batch Job Controller.
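Below is a minimal sketch of the submit path, assuming FastAPI for the gateway endpoint and a plain Kubernetes Job as the Batch Job Entity; the image name, annotation key, and namespace are illustrative assumptions:

```python
# POST /v1/batches creates a Kubernetes Job as the Batch Job Entity.
import uuid

from fastapi import FastAPI
from kubernetes import client, config

app = FastAPI()

@app.post("/v1/batches")
def create_batch(body: dict):
    config.load_incluster_config()
    batch_id = f"batch-{uuid.uuid4().hex[:8]}"
    job = client.V1Job(
        metadata=client.V1ObjectMeta(
            name=batch_id,
            # Carry the uploaded input file id on the entity (assumed key).
            annotations={"batch.aibrix.ai/input-file-id": body["input_file_id"]},
        ),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name="driver", image="aibrix/batch-driver:latest")],
                )
            )
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="aibrix-system", body=job)
    # Minimal OpenAI-style batch object in the response.
    return {"id": batch_id, "object": "batch", "status": "validating"}
```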
The Batch Job Controller implements the OpenAI Batch APIs and is responsible for:
- Watching and caching Batch Job Entity status updates.
- Querying the batch job list from the Metadata Store.
- Querying and caching detailed job progress and statistics from the Metadata Store on receipt of Batch Job Entity status updates.
- Scheduling and dispatching batch jobs to Batch Job Drivers.
- (Optional) Provisioning Batch Job Drivers if drivers are deployed separately, including:
  - Scaling Batch Job Drivers based on batch job volume (reference: Auto Scaling Microservices with Kubernetes Event-Driven Autoscaler (KEDA); see the sketch after this list).
  - Rescheduling batch jobs on the failure of a Batch Job Driver.
- (Optional) Scaling Batch Task Executors.
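As a sketch of the optional driver provisioning, the controller could apply a KEDA ScaledObject that scales the driver Deployment on pending job volume; the Redis address, list name, and deployment name are illustrative assumptions:

```python
# Provision driver autoscaling with a KEDA ScaledObject via CustomObjectsApi.
from kubernetes import client, config

def create_driver_scaler(namespace="aibrix-system"):
    config.load_incluster_config()
    scaled_object = {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": "batch-job-driver-scaler"},
        "spec": {
            "scaleTargetRef": {"name": "batch-job-driver"},  # driver Deployment
            "minReplicaCount": 0,
            "maxReplicaCount": 16,
            "triggers": [{
                # Scale on the number of pending batch jobs in a Redis list.
                "type": "redis",
                "metadata": {
                    "address": "redis.aibrix-system.svc:6379",
                    "listName": "pending-batch-jobs",
                    "listLength": "1",
                },
            }],
        },
    }
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="keda.sh", version="v1alpha1", namespace=namespace,
        plural="scaledobjects", body=scaled_object)
```

Scaling to zero lets the driver fleet idle at no cost between batch submissions.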
The Batch Job Driver drives job progress by:
- Reading and parsing the input file from I/O Storage to obtain the individual inference queries (Tasks); see the sketch after this list.
- (Optional) Scheduling and mapping tasks to workers for coordinated scheduling, including task rescheduling on worker failure.
- Collecting outputs and writing the aggregated output to I/O Storage.
- Checkpointing job progress to the Metadata Store.
- Restoring the job checkpoint from the Metadata Store on job retry.
- Updating the Batch Job Entity on job status changes.
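A minimal sketch of the driver's parsing, checkpointing, and restore steps; the JSONL line format follows the OpenAI Batch API input spec, while the Redis-backed Metadata Store interface and key names are illustrative assumptions:

```python
import json

import redis

def load_tasks(input_path: str):
    """Parse the OpenAI-style batch input file into individual tasks."""
    tasks = []
    with open(input_path) as f:
        for line in f:
            req = json.loads(line)
            tasks.append({
                "custom_id": req["custom_id"],   # caller-supplied task id
                "url": req["url"],               # e.g. /v1/chat/completions
                "body": req["body"],             # the inference request payload
            })
    return tasks

def checkpoint(store: redis.Redis, job_id: str, task_id: str, state: str):
    """Record per-task state so a retried job can resume where it left off."""
    store.hset(f"batch:{job_id}:progress", task_id, state)

def restore(store: redis.Redis, job_id: str):
    """Return the set of already-completed task ids for a retried job."""
    progress = store.hgetall(f"batch:{job_id}:progress")
    return {t.decode() for t, s in progress.items() if s == b"completed"}
```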
The Batch Tasks Worker handles individual tasks by:
- Dispatching tasks to the Batch Task Executor.
- (Optional) Claiming task ownership for work-stealing scheduling (see the sketch after this list), including:
  - Reading the task input from the shared input file in I/O Storage.
  - Writing the task output as an individual file to I/O Storage.
  - Implementing task execution idempotency.
- Updating task progress in the Metadata Store on task completion.
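A minimal sketch of work-stealing task claiming and idempotency, assuming a Redis-backed Metadata Store; an atomic SET NX with a TTL serves as the claim lock, and the key names and TTL are illustrative assumptions:

```python
import redis

def try_claim(store: redis.Redis, job_id: str, task_id: str,
              worker_id: str, ttl_s: int = 600) -> bool:
    """Atomically claim a task; returns False if another worker owns it."""
    return bool(store.set(f"batch:{job_id}:claim:{task_id}",
                          worker_id, nx=True, ex=ttl_s))

def mark_done(store: redis.Redis, job_id: str, task_id: str):
    """Record completion so re-delivered tasks are skipped (idempotency)."""
    store.hset(f"batch:{job_id}:progress", task_id, "completed")

def already_done(store: redis.Redis, job_id: str, task_id: str) -> bool:
    return store.hget(f"batch:{job_id}:progress", task_id) == b"completed"
```

Because a claim expires after the TTL, a task owned by a crashed worker becomes stealable again, while the completion record keeps re-execution idempotent.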
The Batch Task Executor performs the actual LLM inference.
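A minimal executor sketch: replay one parsed task against an OpenAI-compatible endpoint (a vLLM server here); the base URL is an illustrative assumption:

```python
from openai import OpenAI

# Point the OpenAI client at an OpenAI-compatible vLLM service (assumed URL).
client = OpenAI(base_url="http://vllm.aibrix-system.svc:8000/v1", api_key="EMPTY")

def execute(task: dict) -> dict:
    """Run a single batch task and return an OpenAI-style response record."""
    resp = client.chat.completions.create(**task["body"])
    return {"custom_id": task["custom_id"],
            "response": {"status_code": 200, "body": resp.model_dump()}}
```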
Example role mappings for various scenarios
- Simple LLM workers

- Colocating with existing online LLM services

Proposed Change
We will implement the support described in this document step by step, offering different variations to fit different batch scenarios, including:
- Modify the Gateway plugin to support the OpenAI Batch API interface, and:
  - Submit the Batch Job Entity.
  - Forward other OpenAI Batch APIs to the Batch API Service.
- Add a Batch API Service that extends the API Gateway functionality. The Batch API Service will:
  - Implement the OpenAI Batch APIs.
  - Disable/enable Batch Job Driver features.
- Add a separate, scalable Batch Job Driver service to support high-volume batch jobs.
- The Batch Job Entity will support two variations for different settings (see the sketch after this list):
  - Kubernetes Job
  - BatchJob CRD
- Add the Batch Tasks Worker to the LLM runtime to support work-stealing scheduling.
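As a sketch of the BatchJob CRD variation, a custom resource could be submitted through the CustomObjectsApi; the group, version, and field names below are illustrative assumptions, since the actual CRD schema will be defined by this RFC's implementation:

```python
from kubernetes import client, config

def submit_batchjob_cr(namespace="aibrix-system"):
    config.load_incluster_config()
    batch_job = {
        "apiVersion": "batch.aibrix.ai/v1alpha1",  # assumed group/version
        "kind": "BatchJob",
        "metadata": {"name": "batch-demo"},
        "spec": {
            "inputFileID": "file-abc123",          # id from the File API (assumed field)
            "endpoint": "/v1/chat/completions",
            "completionWindow": "24h",
        },
    }
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="batch.aibrix.ai", version="v1alpha1", namespace=namespace,
        plural="batchjobs", body=batch_job)
```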
PR Plan
PR1: Batch API Service, including Kubernetes Job support with a dummy scalar Tasks Worker.
PR2: Review File uploads and downloads API (#344) and add support for S3-compatible storage.
PR3: Gateway plugin support for the OpenAI Batch and File API interfaces.
PR4: Add Batch Tasks Worker to LLM runtime and integrate with Kubernetes Job.
PR5: Add BatchJob CRD support and the separate Batch Job Driver service.
Alternatives Considered
No response