
Conversation

SleepyLGod

Pull Request Description

The current AIBrix Gateway uses a prefix-cache-aware router to route individual LLM requests.

This PR implements session tracking and session-aware scheduling (and routing) within the Gateway.

Related Issues

Resolves: #633 #1248

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

- Removed the old session cache tests and implementation.
- Added a new `SessionCache` interface for session state management (a sketch of the interface is shown after this list).
- Implemented `MutexSessionCache` for low-to-medium concurrency using mutexes.
- Implemented `ShardedSessionCache` for high concurrency using sharding and channels.
- Created a comprehensive test suite for both cache implementations, ensuring compliance with the `SessionCache` interface.
- Added benchmarks comparing the performance of the mutex and sharded implementations.
- Included cleanup routines to manage stale sessions effectively.
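
To make the change list above concrete, here is a minimal sketch of what the `SessionCache` contract and its mutex-backed variant might look like. The method names, the `SessionState` fields, and the TTL-based cleanup signature are illustrative assumptions drawn from the description (pod affinity hints, stale-session cleanup), not the exact API in this PR:

```go
package session

import (
	"sync"
	"time"
)

// SessionState holds per-session routing hints; the fields here are
// illustrative (the PR mentions pod affinity hints and cleanup routines).
type SessionState struct {
	SessionID  string
	PodHint    string    // preferred pod for this session (affinity hint)
	LastActive time.Time // used by the cleanup routine to find stale sessions
}

// SessionCache is a hypothetical rendering of the interface described above:
// thread-safe, in-memory storage of session state with stale-entry cleanup.
type SessionCache interface {
	Get(sessionID string) (*SessionState, bool)
	Put(state *SessionState)
	Delete(sessionID string)
	CleanupStale(ttl time.Duration) int // returns the number of evicted sessions
}

// MutexSessionCache guards a plain map with a single RWMutex; simple and
// adequate for low-to-medium concurrency.
type MutexSessionCache struct {
	mu       sync.RWMutex
	sessions map[string]*SessionState
}

func NewMutexSessionCache() *MutexSessionCache {
	return &MutexSessionCache{sessions: make(map[string]*SessionState)}
}

func (c *MutexSessionCache) Get(sessionID string) (*SessionState, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.sessions[sessionID]
	return s, ok
}

func (c *MutexSessionCache) Put(state *SessionState) {
	c.mu.Lock()
	defer c.mu.Unlock()
	state.LastActive = time.Now()
	c.sessions[state.SessionID] = state
}

func (c *MutexSessionCache) Delete(sessionID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.sessions, sessionID)
}

// CleanupStale evicts sessions that have been idle longer than ttl.
func (c *MutexSessionCache) CleanupStale(ttl time.Duration) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	cutoff := time.Now().Add(-ttl)
	evicted := 0
	for id, s := range c.sessions {
		if s.LastActive.Before(cutoff) {
			delete(c.sessions, id)
			evicted++
		}
	}
	return evicted
}
```

A `ShardedSessionCache` would keep the same interface but hash the session ID to one of N independently synchronized shards (or per-shard goroutines fed by channels, as the change list describes) so that unrelated sessions do not contend on a single lock.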

Summary of Changes

Hello @SleepyLGod, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the AIBrix Gateway by integrating a sophisticated session-aware scheduling system. The core change involves transitioning from a basic prefix-cache router to a state machine-driven request processing flow, which leverages a new scheduler and session cache. This new architecture enables intelligent prioritization of requests based on session history, dynamic adjustment of processing capacity, and improved resource utilization, ultimately aiming to boost throughput and reduce latency for conversational AI applications.
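
To make the "state machine-driven request processing flow" concrete, here is a minimal sketch of how such a request lifecycle could be modeled. The state names and the transition table are illustrative assumptions, not the actual `ProcessStateMachine` introduced by this PR:

```go
package gateway

import "fmt"

// RequestState enumerates hypothetical stages a request passes through.
type RequestState int

const (
	StateReceived   RequestState = iota // request accepted by the gateway
	StateQueued                         // waiting in the scheduler's priority queue
	StateScheduled                      // a target pod has been chosen
	StateDispatched                     // forwarded to the backend
	StateCompleted                      // response streamed back to the client
	StateFailed                         // terminal error state
)

// validTransitions captures which state changes are allowed.
var validTransitions = map[RequestState][]RequestState{
	StateReceived:   {StateQueued, StateFailed},
	StateQueued:     {StateScheduled, StateFailed},
	StateScheduled:  {StateDispatched, StateFailed},
	StateDispatched: {StateCompleted, StateFailed},
}

// ProcessStateMachine tracks the lifecycle of a single request.
type ProcessStateMachine struct {
	requestID string
	state     RequestState
}

func NewProcessStateMachine(requestID string) *ProcessStateMachine {
	return &ProcessStateMachine{requestID: requestID, state: StateReceived}
}

// Transition moves to the next state only if the transition is permitted.
func (m *ProcessStateMachine) Transition(next RequestState) error {
	for _, allowed := range validTransitions[m.state] {
		if allowed == next {
			m.state = next
			return nil
		}
	}
	return fmt.Errorf("request %s: illegal transition %d -> %d", m.requestID, m.state, next)
}
```

Modeling the lifecycle this way gives the scheduler explicit hook points (queued, scheduled, dispatched) at which session state and pod capacity can be consulted, which is the kind of granular control the summary describes.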

Highlights

  • Session-Aware Scheduling: Introduced a new session-aware scheduling mechanism within the AIBrix Gateway, moving beyond simple prefix-cache routing to manage conversational AI workloads more effectively.
  • State Machine for Request Processing: Implemented a state machine (ProcessStateMachine) to manage the lifecycle of incoming requests, allowing for more granular control and integration with the new scheduler.
  • Dynamic Batching and Load Awareness: The new scheduler incorporates dynamic batching and load awareness, using real-time pod capacity and utilization metrics to optimize request dispatch and prevent system overload.
  • Priority Queue with Anti-Starvation: A priority queue based on Critical Path Service Time (CST) and total wait time, including anti-starvation logic, ensures fair and efficient processing of requests within sessions (a sketch of such a priority computation follows this list).
  • Headers-Only Session ID Extraction: Standardized session ID extraction to rely solely on request headers (x-session-id or X-Session-ID), deprecating body-based extraction for consistency and efficiency.
  • High-Performance Session Cache: Developed MutexSessionCache and ShardedSessionCache implementations for thread-safe, in-memory storage of session states, including pod affinity hints and cleanup routines.
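
As referenced in the priority-queue highlight above, a priority score combining Critical Path Service Time with an anti-starvation boost might be computed along these lines. The weighting, field names, and the `maxWait` threshold are assumptions for illustration only, not the formula used in the PR:

```go
package scheduler

import "time"

// QueuedRequest carries the two signals the highlight mentions: the
// session's critical-path service time estimate and how long the
// request has already waited in the queue.
type QueuedRequest struct {
	SessionID    string
	EstimatedCST time.Duration // estimated critical-path service time
	EnqueuedAt   time.Time
}

// maxWait is a hypothetical anti-starvation threshold: once a request has
// waited this long, its priority is boosted regardless of its CST.
const maxWait = 2 * time.Second

// Priority returns a score where lower means "dispatch sooner".
// Short critical paths go first, while accumulated wait time steadily
// pulls long-waiting requests toward the front so none starve.
func Priority(r QueuedRequest, now time.Time) float64 {
	waited := now.Sub(r.EnqueuedAt)
	score := r.EstimatedCST.Seconds() - waited.Seconds()
	if waited >= maxWait {
		// Anti-starvation: force the request ahead of fresh arrivals.
		score = -waited.Seconds()
	}
	return score
}
```
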
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a significant architectural improvement by adding a session-aware scheduler and refactoring the request processing logic into a state machine. The new scheduler is designed for high performance using non-blocking channels, and the session caching has both a simple and an advanced sharded implementation. The overall design is robust and well-tested. My review focuses on a critical issue with the legacy processing path, which appears to be broken by the changes; a point where error handling for clients can be improved; and a minor issue with log levels.

SleepyLGod marked this pull request as draft on October 7, 2025 at 13:36.
Development

Successfully merging this pull request may close these issues.

Support session tracking for LLM request