[feat]: Select and score P/D in same roleset #1634

varungup90 · 2025-10-07T20:15:50Z

Pull Request Description

[Please provide a clear and concise description of your changes here]

Related Issues

Resolves: #[Insert issue number(s)]

Signed-off-by: varungupta <[email protected]>

gemini-code-assist · 2025-10-07T20:16:06Z

Summary of Changes

Hello @varungup90, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request changes how the gateway routes requests to prefill and decode (P/D) pods. It adds a scoring step that evaluates candidate prefill and decode pods by current load, prefix cache match, token throughput, and GPU usage. The goal is to pair a prefill pod and a decode pod from the same roleset and to steer requests away from overloaded pods.
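As a rough illustration of the same-roleset constraint described above, here is a minimal Go sketch. It is not the code in this PR: the `pod` type, the `selectPair` function, and the precomputed `Score` field (lower is better) are all hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"math"
)

// pod is a hypothetical stand-in for a candidate pod. Score is assumed to be
// precomputed from load metrics, with lower values meaning a better candidate.
type pod struct {
	Name    string
	RoleSet string
	Score   float64
}

// pair holds one prefill pod and one decode pod from the same roleset.
type pair struct {
	Prefill, Decode pod
}

// selectPair returns the prefill/decode pair with the lowest combined score
// among pods that share a roleset. ok is false when no roleset has both roles.
func selectPair(prefills, decodes []pod) (best pair, ok bool) {
	// Keep the lowest-score decode pod for each roleset.
	bestDecode := make(map[string]pod)
	for _, d := range decodes {
		if cur, seen := bestDecode[d.RoleSet]; !seen || d.Score < cur.Score {
			bestDecode[d.RoleSet] = d
		}
	}

	bestTotal := math.Inf(1)
	for _, p := range prefills {
		d, seen := bestDecode[p.RoleSet]
		if !seen {
			continue // no decode pod in this roleset, so the prefill pod is unusable
		}
		if total := p.Score + d.Score; total < bestTotal {
			bestTotal, best, ok = total, pair{Prefill: p, Decode: d}, true
		}
	}
	return best, ok
}

func main() {
	prefills := []pod{{"prefill-a", "rs-1", 0.2}, {"prefill-b", "rs-2", 0.1}}
	decodes := []pod{{"decode-a", "rs-1", 0.3}, {"decode-c", "rs-3", 0.0}}
	if p, ok := selectPair(prefills, decodes); ok {
		fmt.Println(p.Prefill.Name, p.Decode.Name)
	}
}
```

In this example `prefill-b` has the best individual score, but its roleset has no decode pod, so the sketch falls back to the `rs-1` pair.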

Highlights

Enhanced P/D Pod Selection Logic: Implemented a new algorithm to select prefill and decode pods by scoring them based on various metrics, ensuring both pods belong to the same roleset.
Load Imbalance Consideration: Introduced mechanisms to detect and address load imbalances for both prefill and decode pods, prioritizing less-loaded instances.
Metric-Based Scoring: Prefill pods are scored based on prefix cache matches and request counts, while decode pods are scored using running requests, token throughput, and free GPU usage.
New Configuration Parameters: Added environment variables (AIBRIX_DECODE_MAX_REQUEST, AIBRIX_DECODE_MAX_THROUGHPUT) to configure decode pod load balancing thresholds.
Utility Function for Pod Filtering: A new helper function FilterPodsByLabel was added to streamline pod selection based on Kubernetes labels.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a significant refactoring of the P/D (Prefill/Decode) pod selection logic. The new approach scores and selects prefill and decode pods from the same roleset together, considering factors like prefix cache match, request count, throughput, and GPU usage. While this is a valuable feature, the new complexity introduces several potential issues. My review has identified critical bugs such as potential nil pointer dereferences, a compilation error due to an undefined variable, and a high-severity issue with swapped function return values that would lead to incorrect behavior. I have also included suggestions to address potential division-by-zero errors and to improve code maintainability by refactoring functions with too many return values and replacing magic numbers with constants.
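To make the division-by-zero and magic-number suggestions concrete, here is a small Go sketch. It does not reproduce the PR's scoring formula, which is not shown in this thread. The names `safeRatio`, `decodeStats`, `decodeScore`, and both weights are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// Named weights instead of magic numbers. The values are illustrative only.
const (
	requestWeight    = 0.5
	throughputWeight = 0.5
)

// decodeStats is a hypothetical snapshot of a decode pod's load.
type decodeStats struct {
	Running    float64 // running requests
	Throughput float64 // tokens per second
}

// safeRatio returns value/limit clamped to [0, 1]. It returns 0 when limit is
// not positive, so a misconfigured limit cannot cause a division by zero.
func safeRatio(value, limit float64) float64 {
	if limit <= 0 {
		return 0
	}
	r := value / limit
	if r < 0 {
		return 0
	}
	if r > 1 {
		return 1
	}
	return r
}

// decodeScore combines normalized request count and throughput into one load
// score in [0, 1]. Lower means less loaded.
func decodeScore(s decodeStats, maxRequests, maxThroughput float64) float64 {
	return requestWeight*safeRatio(s.Running, maxRequests) +
		throughputWeight*safeRatio(s.Throughput, maxThroughput)
}

func main() {
	s := decodeStats{Running: 5, Throughput: 50}
	fmt.Println(decodeScore(s, 10, 100)) // both ratios are 0.5
	fmt.Println(decodeScore(s, 0, 0))    // zero limits are guarded
}
```

Whether a non-positive limit should map to 0 (treated as idle) or 1 (treated as saturated) is a design choice. The sketch picks 0 only to keep the example short.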

gemini-code-assist · 2025-10-07T20:17:48Z