Open
This issue tracks the status of OpenAI-compatible chat completions support in the gRPC router.
Basic Requirements
- Request Validation
- Status: validate.rs not integrated into gRPC router
- Need design discussion on validation approach (protobuf vs business logic level)
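To make the "business logic level" option concrete, here is a minimal sketch of validation done after deserialization rather than in the protobuf schema. The `ChatRequest` type, the `validate_chat_request` helper, and the exact bounds are assumptions for illustration only; they are not taken from validate.rs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChatRequest:
    # Hypothetical, trimmed-down request shape; not the router's real type.
    model: str
    messages: List[Dict[str, Any]] = field(default_factory=list)
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    n: Optional[int] = None


def validate_chat_request(req: ChatRequest) -> List[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors: List[str] = []
    if not req.messages:
        errors.append("messages must not be empty")
    # The bounds below are illustrative assumptions, not the project's rules.
    if req.temperature is not None and not (0.0 <= req.temperature <= 2.0):
        errors.append("temperature must be in [0, 2]")
    if req.top_p is not None and not (0.0 < req.top_p <= 1.0):
        errors.append("top_p must be in (0, 1]")
    if req.n is not None and req.n < 1:
        errors.append("n must be >= 1")
    return errors
```

Collecting every error instead of failing on the first one lets the router return a single, complete 400 response, which is one argument for keeping these checks out of the protobuf layer.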
- Chat Message Processing
- router-grpc: Support jinja chat template content format detection #10832
- OpenAI → SGLang message conversion based on chat template format
- [router][grpc] Refactor chat template content format detection #11288
- Full support of apply_chat_template
- Template application with tools, template kwargs passing
- Template loading in tokenizer module
- [router][grpc] Add dependencies in Cargo.toml to support chat template rendering #11342
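The message conversion step can be sketched as follows, assuming the detected template format is reported as either "string" (the template expects plain text content) or "openai" (the template iterates over content parts). The function name `normalize_content` and the choice to join text parts with a space are assumptions of this sketch, not the router's behavior.

```python
from typing import Any, Dict


def normalize_content(message: Dict[str, Any], content_format: str) -> Dict[str, Any]:
    """Return a copy of an OpenAI-style message shaped for the chat template."""
    content = message.get("content")
    if content_format == "openai":
        # Template walks a list of parts: wrap a plain string as one text part.
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
    elif content_format == "string":
        # Template expects a plain string: keep text parts only and join them.
        if isinstance(content, list):
            texts = [p.get("text", "") for p in content if p.get("type") == "text"]
            content = " ".join(texts)
    else:
        raise ValueError(f"unknown content format: {content_format}")
    return {**message, "content": content}
```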
- Handle request.n > 1 in gRPC
- router: Support parallel sampling num > 1 in grpc_server and non-stream handling #10929
- [router][grpc] Fix request_id extraction when n > 1 #11311
- Two-phase approach: prefix caching + parallel generation, protobuf updates
- Detailed design ➡️ here
- Streaming Response from gRPC client
- [router][grpc] Support streaming for v1/chat/completions #11179
- [router][grpc] Refine streaming processes #11277
- Handle tool choice "required" or ToolChoice parsing
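For reference, the OpenAI API accepts `tool_choice` either as one of the strings "none", "auto", or "required", or as an object naming a specific function. A minimal parsing sketch under that assumption is shown below; the `ToolChoice` dataclass here is a hypothetical stand-in and does not mirror the router's own type.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolChoice:
    # Hypothetical representation: mode is "none", "auto", "required", or "function".
    mode: str
    function_name: Optional[str] = None


def parse_tool_choice(value: Any, has_tools: bool) -> ToolChoice:
    """Parse the raw tool_choice field of a chat completion request."""
    if value is None:
        # When the field is omitted, fall back to "auto" if tools were sent.
        return ToolChoice("auto" if has_tools else "none")
    if isinstance(value, str):
        if value in ("none", "auto", "required"):
            return ToolChoice(value)
        raise ValueError(f"unsupported tool_choice string: {value!r}")
    if isinstance(value, dict) and value.get("type") == "function":
        name = (value.get("function") or {}).get("name")
        if isinstance(name, str) and name:
            return ToolChoice("function", name)
    raise ValueError("tool_choice must be a known string or a function object")
```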
- Non-streaming Response from gRPC client
- Reasoning Parser
- Tool Constraint Generation and Handling
- Tool Call Parser should keep normal texts
- Split JsonParser and LlamaParser
- Parse function calling requests, generate structured output constraints (json-schema)
- Streaming states management and multiple tool calls
- [router][tool call] Clean up redundant detect_format and has_tool_markers #11270
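One way to turn a function calling request into a json-schema constraint is to require the model to emit a non-empty array of `{name, parameters}` objects, one schema variant per allowed tool. The sketch below illustrates that idea only; the helper name `build_tool_call_schema` and the output shape are assumptions, not the router's actual constraint format.

```python
from typing import Any, Dict, List, Optional


def build_tool_call_schema(
    tools: List[Dict[str, Any]], function_name: Optional[str] = None
) -> Dict[str, Any]:
    """Build a JSON Schema that forces at least one well-formed tool call.

    If function_name is given, only that tool is allowed.
    """
    variants = []
    for tool in tools:
        fn = tool["function"]
        if function_name is not None and fn["name"] != function_name:
            continue
        variants.append(
            {
                "type": "object",
                "properties": {
                    # Pin the name so the model cannot invent a tool.
                    "name": {"type": "string", "enum": [fn["name"]]},
                    # Reuse the tool's own parameter schema when present.
                    "parameters": fn.get("parameters")
                    or {"type": "object", "properties": {}},
                },
                "required": ["name", "parameters"],
            }
        )
    if not variants:
        raise ValueError("no tool matches the requested function name")
    return {"type": "array", "minItems": 1, "items": {"anyOf": variants}}
```

With tool choice "required" all tools would be passed through; with a named function, `function_name` narrows the schema to a single variant.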
- Logprobs Support
- Add logprobs extraction in Python gRPC server
- Handle logprobs conversion in Rust gRPC client
- [router][grpc] Add logprobs support to router #11082
- [router][bugfix] Fix input_logprobs handling with None value and logprob_start_len = -1 #11113
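The conversion step has to tolerate missing scores, since a token can arrive without a logprob (represented here as `None`). The following sketch maps parallel token and logprob lists into the OpenAI-style `logprobs.content` layout; skipping `None` entries is a choice made for this example, and the function name is hypothetical.

```python
from typing import Any, Dict, List, Optional


def to_openai_logprobs(
    tokens: List[str], logprobs: List[Optional[float]]
) -> Dict[str, Any]:
    """Convert per-token logprobs to an OpenAI-style logprobs object."""
    content = []
    for token, logprob in zip(tokens, logprobs):
        if logprob is None:
            # No score available for this token; skip it instead of failing.
            continue
        content.append(
            {
                "token": token,
                "logprob": logprob,
                "bytes": list(token.encode("utf-8")),
                "top_logprobs": [],  # left empty in this sketch
            }
        )
    return {"content": content}
```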
- Protobuf Schema Review
- Validate optional vs required fields
- e.g. Fields like top_k should be optional, as the default value 0 is invalid.
- [router][grpc] Fix proto3 default value mismatches and cleanup unused fields #11283
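The underlying problem is that a proto3 scalar field without presence tracking reads as 0 when unset, so the receiver cannot tell "not provided" from "explicitly 0". Marking the field optional restores that distinction. The sketch below models presence with `None` on the receiving side; the `-1` sentinel for "top-k disabled" and the helper name are assumptions for illustration.

```python
from typing import Optional

# Assumption for this sketch: -1 means "no top-k filtering".
TOP_K_DISABLED = -1


def resolve_top_k(top_k: Optional[int]) -> int:
    """Resolve an optional top_k field into a value the sampler can use."""
    if top_k is None:
        # Field was not set on the wire: apply the server-side default.
        return TOP_K_DISABLED
    if top_k == 0 or top_k < TOP_K_DISABLED:
        # An explicit 0 is now distinguishable from "unset" and can be rejected.
        raise ValueError(f"invalid top_k: {top_k}")
    return top_k
```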
- gRPC Connection Management
- Client connection pooling in worker, failure handling, health monitoring
- [router] add move grpc worker management from router to worker manager #10960
- [router] move grpc client from router to worker and builder #10958
- Pipeline Management
- Establish a pipeline for chat
- [router][grpc] Refactor chat handler in grpc/ to use centralized orchestrator #11314
- E2E Testing
- Complete chat completion flows, OpenAI compatibility, performance benchmarks
- Make sure all errors are properly handled
- [router][grpc] Fix sampling_params.stop_strs is None #11306
Advanced Features
- Log Requests in GrpcRequestManager
- Metrics in GrpcRequestManager
- Hidden states in GenerateResponse