
Conversation

SleepyLGod

Pull Request Description

The current AIBrix Gateway uses a prefix-cache-aware router to route individual LLM requests.

This PR implements session tracking and session-aware scheduling (and routing) within the Gateway.

Related Issues

Resolves: #633 #1248

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

- Removed the old session cache tests and implementation.
- Added a new `SessionCache` interface for session state management (a sketch of the interface is shown after this list).
- Implemented `MutexSessionCache` for low-to-medium concurrency using mutexes.
- Implemented `ShardedSessionCache` for high concurrency using sharding and channels.
- Created a comprehensive test suite for both cache implementations, ensuring compliance with the `SessionCache` interface.
- Added benchmarks comparing the performance of the mutex and sharded implementations.
- Included cleanup routines to manage stale sessions effectively.
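
To make the change list above concrete, here is a minimal sketch of what the `SessionCache` contract and its mutex-backed variant might look like. The method names, the `SessionState` fields, and the TTL-based cleanup signature are illustrative assumptions drawn from the description (pod affinity hints, stale-session cleanup), not the exact API in this PR:

```go
package session

import (
	"sync"
	"time"
)

// SessionState holds per-session routing hints; the fields here are
// illustrative (the PR mentions pod affinity hints and cleanup routines).
type SessionState struct {
	SessionID  string
	PodHint    string    // preferred pod for this session (affinity hint)
	LastActive time.Time // used by the cleanup routine to find stale sessions
}

// SessionCache is a hypothetical rendering of the interface described above:
// thread-safe, in-memory storage of session state with stale-entry cleanup.
type SessionCache interface {
	Get(sessionID string) (*SessionState, bool)
	Put(state *SessionState)
	Delete(sessionID string)
	CleanupStale(ttl time.Duration) int // returns the number of evicted sessions
}

// MutexSessionCache guards a plain map with a single RWMutex; simple and
// adequate for low-to-medium concurrency.
type MutexSessionCache struct {
	mu       sync.RWMutex
	sessions map[string]*SessionState
}

func NewMutexSessionCache() *MutexSessionCache {
	return &MutexSessionCache{sessions: make(map[string]*SessionState)}
}

func (c *MutexSessionCache) Get(sessionID string) (*SessionState, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.sessions[sessionID]
	return s, ok
}

func (c *MutexSessionCache) Put(state *SessionState) {
	c.mu.Lock()
	defer c.mu.Unlock()
	state.LastActive = time.Now()
	c.sessions[state.SessionID] = state
}

func (c *MutexSessionCache) Delete(sessionID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.sessions, sessionID)
}

// CleanupStale evicts sessions that have been idle longer than ttl.
func (c *MutexSessionCache) CleanupStale(ttl time.Duration) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	cutoff := time.Now().Add(-ttl)
	evicted := 0
	for id, s := range c.sessions {
		if s.LastActive.Before(cutoff) {
			delete(c.sessions, id)
			evicted++
		}
	}
	return evicted
}
```

A `ShardedSessionCache` would keep the same interface but hash the session ID to one of N independently synchronized shards (or per-shard goroutines fed by channels, as the change list describes) so that unrelated sessions do not contend on a single lock.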

Summary of Changes

Hello @SleepyLGod, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the AIBrix Gateway by integrating a sophisticated session-aware scheduling system. The core change involves transitioning from a basic prefix-cache router to a state machine-driven request processing flow, which leverages a new scheduler and session cache. This new architecture enables intelligent prioritization of requests based on session history, dynamic adjustment of processing capacity, and improved resource utilization, ultimately aiming to boost throughput and reduce latency for conversational AI applications.
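
To make the "state machine-driven request processing flow" concrete, here is a minimal sketch of how such a request lifecycle could be modeled. The state names and the transition table are illustrative assumptions, not the actual `ProcessStateMachine` introduced by this PR:

```go
package gateway

import "fmt"

// RequestState enumerates hypothetical stages a request passes through.
type RequestState int

const (
	StateReceived   RequestState = iota // request accepted by the gateway
	StateQueued                         // waiting in the scheduler's priority queue
	StateScheduled                      // a target pod has been chosen
	StateDispatched                     // forwarded to the backend
	StateCompleted                      // response streamed back to the client
	StateFailed                         // terminal error state
)

// validTransitions captures which state changes are allowed.
var validTransitions = map[RequestState][]RequestState{
	StateReceived:   {StateQueued, StateFailed},
	StateQueued:     {StateScheduled, StateFailed},
	StateScheduled:  {StateDispatched, StateFailed},
	StateDispatched: {StateCompleted, StateFailed},
}

// ProcessStateMachine tracks the lifecycle of a single request.
type ProcessStateMachine struct {
	requestID string
	state     RequestState
}

func NewProcessStateMachine(requestID string) *ProcessStateMachine {
	return &ProcessStateMachine{requestID: requestID, state: StateReceived}
}

// Transition moves to the next state only if the transition is permitted.
func (m *ProcessStateMachine) Transition(next RequestState) error {
	for _, allowed := range validTransitions[m.state] {
		if allowed == next {
			m.state = next
			return nil
		}
	}
	return fmt.Errorf("request %s: illegal transition %d -> %d", m.requestID, m.state, next)
}
```

Modeling the lifecycle this way gives the scheduler explicit hook points (queued, scheduled, dispatched) at which session state and pod capacity can be consulted, which is the kind of granular control the summary describes.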

Highlights

  • Session-Aware Scheduling: Introduced a new session-aware scheduling mechanism within the AIBrix Gateway, moving beyond simple prefix-cache routing to manage conversational AI workloads more effectively.
  • State Machine for Request Processing: Implemented a state machine (ProcessStateMachine) to manage the lifecycle of incoming requests, allowing for more granular control and integration with the new scheduler.
  • Dynamic Batching and Load Awareness: The new scheduler incorporates dynamic batching and load awareness, using real-time pod capacity and utilization metrics to optimize request dispatch and prevent system overload.
  • Priority Queue with Anti-Starvation: A priority queue based on Critical Path Service Time (CST) and total wait time, including anti-starvation logic, ensures fair and efficient processing of requests within sessions (a sketch of such a priority computation follows this list).
  • Headers-Only Session ID Extraction: Standardized session ID extraction to rely solely on request headers (x-session-id or X-Session-ID), deprecating body-based extraction for consistency and efficiency.
  • High-Performance Session Cache: Developed MutexSessionCache and ShardedSessionCache implementations for thread-safe, in-memory storage of session states, including pod affinity hints and cleanup routines.
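
As referenced in the priority-queue highlight above, a priority score combining Critical Path Service Time with an anti-starvation boost might be computed along these lines. The weighting, field names, and the `maxWait` threshold are assumptions for illustration only, not the formula used in the PR:

```go
package scheduler

import "time"

// QueuedRequest carries the two signals the highlight mentions: the
// session's critical-path service time estimate and how long the
// request has already waited in the queue.
type QueuedRequest struct {
	SessionID    string
	EstimatedCST time.Duration // estimated critical-path service time
	EnqueuedAt   time.Time
}

// maxWait is a hypothetical anti-starvation threshold: once a request has
// waited this long, its priority is boosted regardless of its CST.
const maxWait = 2 * time.Second

// Priority returns a score where lower means "dispatch sooner".
// Short critical paths go first, while accumulated wait time steadily
// pulls long-waiting requests toward the front so none starve.
func Priority(r QueuedRequest, now time.Time) float64 {
	waited := now.Sub(r.EnqueuedAt)
	score := r.EstimatedCST.Seconds() - waited.Seconds()
	if waited >= maxWait {
		// Anti-starvation: force the request ahead of fresh arrivals.
		score = -waited.Seconds()
	}
	return score
}
```
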
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a significant architectural improvement by adding a session-aware scheduler and refactoring the request processing logic into a state machine. The new scheduler is designed for high performance using non-blocking channels, and the session caching has both a simple and an advanced sharded implementation. The overall design is robust and well-tested. My review focuses on a critical issue with the legacy processing path, which appears to be broken by the changes; a point where error handling for clients can be improved; and a minor issue with log levels.

SleepyLGod marked this pull request as draft on October 7, 2025 at 13:36.
Development

Successfully merging this pull request may close these issues.

Support session tracking for LLM request