varungup90
Collaborator

Pull Request Description

[Please provide a clear and concise description of your changes here]

Related Issues

Resolves: #[Insert issue number(s)]

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@varungup90 marked this pull request as draft on October 7, 2025, 20:15
@gemini-code-assist
Contributor

Summary of Changes

Hello @varungup90, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the routing algorithm for prefill and decode (P/D) pods within the gateway. It introduces a sophisticated scoring system that evaluates potential prefill and decode pods based on their current load, prefix cache efficiency, token throughput, and GPU utilization. The primary goal is to intelligently pair prefill and decode pods from the same roleset, optimizing resource distribution and improving overall request processing efficiency by mitigating load imbalances.

Highlights

  • Enhanced P/D Pod Selection Logic: Implemented a new algorithm to select prefill and decode pods by scoring them based on various metrics, ensuring both pods belong to the same roleset.
  • Load Imbalance Consideration: Introduced mechanisms to detect and address load imbalances for both prefill and decode pods, prioritizing less-loaded instances.
  • Metric-Based Scoring: Prefill pods are scored based on prefix cache matches and request counts, while decode pods are scored using running requests, token throughput, and free GPU usage.
  • New Configuration Parameters: Added environment variables (AIBRIX_DECODE_MAX_REQUEST, AIBRIX_DECODE_MAX_THROUGHPUT) to configure decode pod load balancing thresholds.
  • Utility Function for Pod Filtering: A new helper function FilterPodsByLabel was added to streamline pod selection based on Kubernetes labels.
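
As a rough illustration of the label-based filtering mentioned in the last highlight, the sketch below keeps only pods whose label matches a given value. The PR's actual FilterPodsByLabel signature is not shown on this page, so the function shape, the stand-in Pod type (used instead of *v1.Pod to keep the example dependency-free), and the label key "roleset-name" are all assumptions.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for *v1.Pod with only the fields used here.
type Pod struct {
	Name   string
	Labels map[string]string
}

// filterPodsByLabel returns the pods whose label `key` equals `value`.
// The signature is an assumption; the PR's helper may differ.
func filterPodsByLabel(pods []*Pod, key, value string) []*Pod {
	var out []*Pod
	for _, p := range pods {
		// Indexing a nil Labels map is safe in Go and yields "".
		if p != nil && p.Labels[key] == value {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []*Pod{
		{Name: "prefill-0", Labels: map[string]string{"roleset-name": "rs-a"}},
		{Name: "prefill-1", Labels: map[string]string{"roleset-name": "rs-b"}},
		{Name: "decode-0", Labels: map[string]string{"roleset-name": "rs-a"}},
	}
	for _, p := range filterPodsByLabel(pods, "roleset-name", "rs-a") {
		fmt.Println(p.Name)
	}
}
```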

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant refactoring of the P/D (Prefill/Decode) pod selection logic. The new approach scores and selects prefill and decode pods from the same roleset together, considering factors like prefix cache match, request count, throughput, and GPU usage. While this is a valuable feature, the new complexity introduces several potential issues. My review has identified critical bugs such as potential nil pointer dereferences, a compilation error due to an undefined variable, and a high-severity issue with swapped function return values that would lead to incorrect behavior. I have also included suggestions to address potential division-by-zero errors and to improve code maintainability by refactoring functions with too many return values and replacing magic numbers with constants.

Comment on lines +164 to +177
var targetPrefillPod, targetDecodePod *v1.Pod
minScore := math.MaxFloat64
for roleset, prefillScore := range prefillScores {
	decodeScore, ok := decodeScores[roleset]
	if !ok {
		continue
	}

	if prefillScore.Score+decodeScore.Score < minScore {
		minScore = prefillScore.Score + decodeScore.Score
		targetPrefillPod = prefillScore.Pod
		targetDecodePod = decodeScore.Pod
	}
}

critical

After this loop, targetPrefillPod or targetDecodePod could be nil if prefillScores is empty or no matching rolesets are found in decodeScores. This will lead to a nil pointer dereference in the defer block on line 181 and when prefillPod is used in the Route function on line 101. You should add a check to ensure both pods have been selected.

var targetPrefillPod, targetDecodePod *v1.Pod
minScore := math.MaxFloat64
for roleset, prefillScore := range prefillScores {
	decodeScore, ok := decodeScores[roleset]
	if !ok {
		continue
	}

	if prefillScore.Score+decodeScore.Score < minScore {
		minScore = prefillScore.Score + decodeScore.Score
		targetPrefillPod = prefillScore.Pod
		targetDecodePod = decodeScore.Pod
	}
}

if targetPrefillPod == nil || targetDecodePod == nil {
	return nil, nil, fmt.Errorf("failed to select a pair of prefill and decode pods")
}
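
To make the failure mode concrete, here is a minimal, self-contained sketch of the same selection loop with the nil guard applied. The Pod and Scored types are hypothetical stand-ins for *v1.Pod and the PR's score struct, and the (pod, pod, error) return shape is assumed from the suggested snippet above rather than taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Pod is a hypothetical stand-in for *v1.Pod; only the name matters here.
type Pod struct{ Name string }

// Scored pairs a pod with its score (lower is better), mirroring the
// .Pod and .Score fields used in the diff.
type Scored struct {
	Pod   *Pod
	Score float64
}

var errNoPair = errors.New("failed to select a pair of prefill and decode pods")

// selectPair returns the prefill/decode pair with the lowest combined score
// among rolesets present in both maps, or an error if no roleset qualifies.
func selectPair(prefill, decode map[string]Scored) (*Pod, *Pod, error) {
	var p, d *Pod
	minScore := math.MaxFloat64
	for roleset, ps := range prefill {
		ds, ok := decode[roleset]
		if !ok {
			continue // roleset has no decode candidate
		}
		if sum := ps.Score + ds.Score; sum < minScore {
			minScore, p, d = sum, ps.Pod, ds.Pod
		}
	}
	if p == nil || d == nil {
		return nil, nil, errNoPair
	}
	return p, d, nil
}

func main() {
	prefill := map[string]Scored{
		"rs-a": {Pod: &Pod{Name: "prefill-a"}, Score: 1.5},
		"rs-b": {Pod: &Pod{Name: "prefill-b"}, Score: 0.4},
	}
	decode := map[string]Scored{
		"rs-a": {Pod: &Pod{Name: "decode-a"}, Score: 0.2},
		"rs-c": {Pod: &Pod{Name: "decode-c"}, Score: 0.1},
	}
	// Only rs-a appears in both maps, so it is the only eligible pair.
	if p, d, err := selectPair(prefill, decode); err == nil {
		fmt.Println(p.Name, d.Name)
	}
	// With no decode candidates the caller gets an error, not nil pods.
	_, _, err := selectPair(prefill, map[string]Scored{})
	fmt.Println(err)
}
```

Returning an error lets the caller fall back or reject the request explicitly, instead of dereferencing a nil pod later.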

for _, pod := range prefillPods {
	rolesetName := pod.Labels[PDRoleSetIdentifier]
	reqCnt := float64(podRequestCount[pod.Name])
	if reqCnt > meanRequestCount+float64(standardDeviationFactor)*stdDevRequestCount {

critical

The constant standardDeviationFactor is used here but is not defined in the file, which will cause a compilation error. Please define it in the const block at the top of the file. For example: const standardDeviationFactor = 2.0.
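
For context, the check in the diff flags a pod as overloaded when its request count exceeds the mean by more than standardDeviationFactor standard deviations. The sketch below shows one way that threshold could be computed. It assumes a population standard deviation and the value 2.0 suggested above; the helper names are hypothetical and the PR's own mean and standard-deviation helpers may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// standardDeviationFactor uses the value suggested in the review comment.
const standardDeviationFactor = 2.0

// meanStdDev returns the mean and population standard deviation of xs.
func meanStdDev(xs []float64) (mean, stdDev float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	n := float64(len(xs))
	for _, x := range xs {
		mean += x
	}
	mean /= n
	var sumSq float64
	for _, x := range xs {
		sumSq += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(sumSq / n)
}

// overloadedPods returns, in sorted order, the pods whose request count
// exceeds mean + standardDeviationFactor*stdDev.
func overloadedPods(counts map[string]float64) []string {
	xs := make([]float64, 0, len(counts))
	for _, c := range counts {
		xs = append(xs, c)
	}
	mean, stdDev := meanStdDev(xs)
	threshold := mean + standardDeviationFactor*stdDev
	var out []string
	for name, c := range counts {
		if c > threshold {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	counts := map[string]float64{
		"pod-a": 1, "pod-b": 1, "pod-c": 1, "pod-d": 1, "pod-e": 1, "pod-f": 20,
	}
	fmt.Println(overloadedPods(counts))
}
```

One consequence worth keeping in mind: with a population standard deviation, a single value can sit at most sqrt(n-1) standard deviations above the mean, so with a factor of 2.0 no pod can be flagged unless at least six pods are being compared.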

"prefill_pod", prefillPod.Name,
"decode_pod", decodePod.Name)
return decodePod
prefillScore := (100-float64(matchedPods[pod.Name]))*.1 + (reqCnt / maxRequestCount)

high

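There is a potential for division by zero here. If all prefill pods are idle, getRequestCounts will return counts of 0 for all of them, making maxRequestCount equal to 0. Assuming maxRequestCount is a float64, as the expression implies, Go will not panic on this division: reqCnt / maxRequestCount evaluates to 0/0, which is NaN, so every prefillScore becomes NaN. Because any comparison with NaN is false, a NaN score would likely never beat minScore, and no pod pair would be selected. You should guard against this.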

if maxRequestCount == 0 {
	maxRequestCount = 1 // Avoid division by zero
}
prefillScore := (100-float64(matchedPods[pod.Name]))*.1 + (reqCnt / maxRequestCount)

"min_throughput", minThroughput, "max_throughput", maxThroughput,
"free_gpu_percent", podFreeGpuUsage[minRequestPod.Name],
"decode_pod", minRequestPod.Name)
return minRequestPod, maxRequestCount, maxFreeGPUUsage, maxThroughput, podRequestCounts, podThroughputs, podFreeGpuUsage

high

The order of the returned values maxFreeGPUUsage and maxThroughput is swapped compared to the function signature. The signature expects (..., maxThroughput, maxFreeGPUUsage, ...) but this returns (..., maxFreeGPUUsage, maxThroughput, ...). This will cause incorrect values to be used in the scoring logic.

return minRequestPod, maxRequestCount, maxThroughput, maxFreeGPUUsage, podRequestCounts, podThroughputs, podFreeGpuUsage

"min_throughput", minThroughput, "max_throughput", maxThroughput,
"free_gpu_percent", podFreeGpuUsage[minThroughputPod.Name],
"decode_pod", minThroughputPod.Name)
return minThroughputPod, maxRequestCount, maxFreeGPUUsage, maxThroughput, podRequestCounts, podThroughputs, podFreeGpuUsage

high

Similar to the other return path in this function, the order of the returned values maxFreeGPUUsage and maxThroughput is swapped compared to the function signature. This will lead to incorrect scoring calculations.

return minThroughputPod, maxRequestCount, maxThroughput, maxFreeGPUUsage, podRequestCounts, podThroughputs, podFreeGpuUsage

return SGLangBootstrapPort // Default port
}

func (r *pdRouter) loadImbalanceSelectDecodePod(ctx *types.RoutingContext, filteredDecodePods []*v1.Pod) (*v1.Pod, float64, float64, float64, map[string]float64, map[string]float64, map[string]float64) {

medium

This function returns 7 values, which can be difficult to manage and is considered a code smell in Go. Consider refactoring this to return a struct containing these values. This will improve readability and maintainability, and reduce the risk of errors like swapped return values.
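
One possible shape for that refactor is sketched below. The decodeStats type, its field names, and the summarize helper are hypothetical, and a pod name stands in for *v1.Pod to keep the example self-contained. The point is that named fields make a swap between maxThroughput and maxFreeGPUUsage much harder to write by accident than positional returns of the same type.

```go
package main

import "fmt"

// decodeStats is a hypothetical result type; field names are illustrative.
type decodeStats struct {
	PodName         string // stand-in for the selected *v1.Pod
	MaxRequestCount float64
	MaxThroughput   float64
	MaxFreeGPUUsage float64
	RequestCounts   map[string]float64
	Throughputs     map[string]float64
	FreeGPUUsage    map[string]float64
}

// maxOf returns the largest value in m, or 0 for an empty map.
// Values are assumed to be non-negative metrics.
func maxOf(m map[string]float64) float64 {
	best := 0.0
	for _, v := range m {
		if v > best {
			best = v
		}
	}
	return best
}

// summarize picks the pod with the fewest running requests and bundles the
// aggregate metrics into one struct instead of seven positional returns.
func summarize(reqs, tput, freeGPU map[string]float64) decodeStats {
	s := decodeStats{
		MaxRequestCount: maxOf(reqs),
		MaxThroughput:   maxOf(tput),
		MaxFreeGPUUsage: maxOf(freeGPU),
		RequestCounts:   reqs,
		Throughputs:     tput,
		FreeGPUUsage:    freeGPU,
	}
	first := true
	var minReq float64
	for name, r := range reqs {
		if first || r < minReq {
			minReq, s.PodName, first = r, name, false
		}
	}
	return s
}

func main() {
	s := summarize(
		map[string]float64{"decode-a": 4, "decode-b": 1},
		map[string]float64{"decode-a": 900, "decode-b": 300},
		map[string]float64{"decode-a": 20, "decode-b": 65},
	)
	fmt.Println(s.PodName, s.MaxRequestCount, s.MaxThroughput, s.MaxFreeGPUUsage)
}
```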

}
podFreeGpuUsage[pod.Name] = 100 - gpuUsage.GetSimpleValue()*100
if podFreeGpuUsage[pod.Name] <= 0 {
podFreeGpuUsage[pod.Name] = 0.1

medium

The magic number 0.1 is used here to avoid potential division-by-zero issues later. It's better to define this as a named constant to improve code clarity and maintainability. For example: const minFreeGPUUsageEpsilon = 0.1.

podFreeGpuUsage[pod.Name] = 0.1 // TODO: use a const
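
A minimal sketch of the named-constant version follows. The constant name comes from the comment above; the freeGPUPercent helper is hypothetical, and it assumes the GPU usage metric is a fraction between 0 and 1, which is what the multiplication by 100 in the diff suggests.

```go
package main

import "fmt"

// minFreeGPUUsageEpsilon is the floor applied to the free-GPU percentage so
// later divisions by this value never see zero.
const minFreeGPUUsageEpsilon = 0.1

// freeGPUPercent converts a GPU usage fraction (assumed to be in [0, 1])
// into a free percentage, floored at minFreeGPUUsageEpsilon.
func freeGPUPercent(gpuUsage float64) float64 {
	free := 100 - gpuUsage*100
	if free <= 0 {
		return minFreeGPUUsageEpsilon
	}
	return free
}

func main() {
	fmt.Println(freeGPUPercent(0.25)) // 75
	fmt.Println(freeGPUPercent(1.0))  // 0.1
}
```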
