[feat]: Select and score P/D in same roleset #1634
Signed-off-by: varungupta <[email protected]>
Summary of Changes
Hello @varungup90, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly upgrades the routing algorithm for prefill and decode (P/D) pods within the gateway. It introduces a scoring system that evaluates candidate prefill and decode pods based on their current load, prefix cache efficiency, token throughput, and GPU utilization. The primary goal is to pair prefill and decode pods from the same roleset, which improves resource distribution and request processing efficiency by mitigating load imbalances.
Code Review
This pull request introduces a significant refactoring of the P/D (Prefill/Decode) pod selection logic. The new approach scores and selects prefill and decode pods from the same roleset together, considering factors like prefix cache match, request count, throughput, and GPU usage. While this is a valuable feature, the new complexity introduces several potential issues. My review has identified critical bugs such as potential nil pointer dereferences, a compilation error due to an undefined variable, and a high-severity issue with swapped function return values that would lead to incorrect behavior. I have also included suggestions to address potential division-by-zero errors and to improve code maintainability by refactoring functions with too many return values and replacing magic numbers with constants.
var targetPrefillPod, targetDecodePod *v1.Pod
minScore := math.MaxFloat64
for roleset, prefillScore := range prefillScores {
	decodeScore, ok := decodeScores[roleset]
	if !ok {
		continue
	}

	if prefillScore.Score+decodeScore.Score < minScore {
		minScore = prefillScore.Score + decodeScore.Score
		targetPrefillPod = prefillScore.Pod
		targetDecodePod = decodeScore.Pod
	}
}
After this loop, targetPrefillPod or targetDecodePod could be nil if prefillScores is empty or no matching rolesets are found in decodeScores. This will lead to a nil pointer dereference in the defer block on line 181 and when prefillPod is used in the Route function on line 101. You should add a check to ensure both pods have been selected.
var targetPrefillPod, targetDecodePod *v1.Pod
minScore := math.MaxFloat64
for roleset, prefillScore := range prefillScores {
	decodeScore, ok := decodeScores[roleset]
	if !ok {
		continue
	}
	if prefillScore.Score+decodeScore.Score < minScore {
		minScore = prefillScore.Score + decodeScore.Score
		targetPrefillPod = prefillScore.Pod
		targetDecodePod = decodeScore.Pod
	}
}
if targetPrefillPod == nil || targetDecodePod == nil {
	return nil, nil, fmt.Errorf("failed to select a pair of prefill and decode pods")
}
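The selection loop and its guard can be exercised in isolation. The following is a minimal sketch, not the PR's implementation: podScore and selectPair are hypothetical names, and pod names stand in for the *v1.Pod pointers so the example runs without Kubernetes dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// podScore is a hypothetical stand-in for the PR's per-roleset score entry;
// the real entry holds a *v1.Pod rather than a pod name.
type podScore struct {
	PodName string
	Score   float64
}

var errNoPair = errors.New("failed to select a pair of prefill and decode pods")

// selectPair returns the prefill and decode pod names from the roleset with
// the lowest combined score. It returns an error when no roleset appears in
// both maps, instead of handing back empty (nil-equivalent) selections.
func selectPair(prefill, decode map[string]podScore) (string, string, error) {
	minScore := math.MaxFloat64
	var prefillPod, decodePod string
	found := false
	for roleset, p := range prefill {
		d, ok := decode[roleset]
		if !ok {
			continue // roleset has no decode candidate
		}
		if sum := p.Score + d.Score; sum < minScore {
			minScore, prefillPod, decodePod, found = sum, p.PodName, d.PodName, true
		}
	}
	if !found {
		return "", "", errNoPair
	}
	return prefillPod, decodePod, nil
}

func main() {
	prefill := map[string]podScore{"rs-a": {"prefill-a", 1.5}, "rs-b": {"prefill-b", 0.4}}
	decode := map[string]podScore{"rs-a": {"decode-a", 0.2}, "rs-c": {"decode-c", 0.1}}
	p, d, err := selectPair(prefill, decode)
	fmt.Println(p, d, err)
}
```

Tracking an explicit found flag also covers the case where every candidate sum is NaN or not below the initial minimum, which a nil check on the pods handles in the pointer-based version.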
for _, pod := range prefillPods {
	rolesetName := pod.Labels[PDRoleSetIdentifier]
	reqCnt := float64(podRequestCount[pod.Name])
	if reqCnt > meanRequestCount+float64(standardDeviationFactor)*stdDevRequestCount {
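The threshold in this hunk compares a pod's request count against the mean plus a multiple of the standard deviation. As a rough illustration of that check, here is a sketch under stated assumptions: meanStdDev and isOverloaded are hypothetical helpers, the population standard deviation is assumed (the PR may compute it differently), and what the PR does with a pod that exceeds the threshold is not shown in the hunk.

```go
package main

import (
	"fmt"
	"math"
)

// meanStdDev returns the mean and population standard deviation of counts.
// Using the population form is an assumption made for this sketch.
func meanStdDev(counts []float64) (mean, stdDev float64) {
	if len(counts) == 0 {
		return 0, 0
	}
	for _, c := range counts {
		mean += c
	}
	mean /= float64(len(counts))
	var sumSq float64
	for _, c := range counts {
		d := c - mean
		sumSq += d * d
	}
	return mean, math.Sqrt(sumSq / float64(len(counts)))
}

// isOverloaded reports whether reqCnt lies more than factor standard
// deviations above the mean, mirroring the comparison in the hunk above.
func isOverloaded(reqCnt, mean, stdDev, factor float64) bool {
	return reqCnt > mean+factor*stdDev
}

func main() {
	counts := []float64{2, 3, 2, 3, 20}
	mean, stdDev := meanStdDev(counts)
	for i, c := range counts {
		fmt.Printf("pod-%d requests=%v overloaded=%v\n", i, c, isOverloaded(c, mean, stdDev, 1))
	}
}
```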
	"prefill_pod", prefillPod.Name,
	"decode_pod", decodePod.Name)
return decodePod
prefillScore := (100-float64(matchedPods[pod.Name]))*.1 + (reqCnt / maxRequestCount)
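There is a potential division by zero here. If all prefill pods are idle, getRequestCounts will return a count of 0 for every pod, making maxRequestCount equal to 0. Because both operands are float64, Go will not panic; reqCnt / maxRequestCount evaluates to NaN (0/0), and since every comparison against NaN is false, the resulting score can never be selected as the minimum. You should guard against this.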
if maxRequestCount == 0 {
	maxRequestCount = 1 // Avoid division by zero
}
prefillScore := (100-float64(matchedPods[pod.Name]))*.1 + (reqCnt / maxRequestCount)
	"min_throughput", minThroughput, "max_throughput", maxThroughput,
	"free_gpu_percent", podFreeGpuUsage[minRequestPod.Name],
	"decode_pod", minRequestPod.Name)
return minRequestPod, maxRequestCount, maxFreeGPUUsage, maxThroughput, podRequestCounts, podThroughputs, podFreeGpuUsage
The order of the returned values maxFreeGPUUsage and maxThroughput is swapped compared to the function signature. The signature expects (..., maxThroughput, maxFreeGPUUsage, ...) but this returns (..., maxFreeGPUUsage, maxThroughput, ...). This will cause incorrect values to be used in the scoring logic.
return minRequestPod, maxRequestCount, maxThroughput, maxFreeGPUUsage, podRequestCounts, podThroughputs, podFreeGpuUsage
	"min_throughput", minThroughput, "max_throughput", maxThroughput,
	"free_gpu_percent", podFreeGpuUsage[minThroughputPod.Name],
	"decode_pod", minThroughputPod.Name)
return minThroughputPod, maxRequestCount, maxFreeGPUUsage, maxThroughput, podRequestCounts, podThroughputs, podFreeGpuUsage
Similar to the other return path in this function, the order of the returned values maxFreeGPUUsage and maxThroughput is swapped compared to the function signature. This will lead to incorrect scoring calculations.
return minThroughputPod, maxRequestCount, maxThroughput, maxFreeGPUUsage, podRequestCounts, podThroughputs, podFreeGpuUsage
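Both return-order comments concern adjacent float64 values that the compiler cannot tell apart by position. One way to rule out this class of bug is to return a struct with named fields. The sketch below is illustrative only: podMetrics, decodeSelection, and selectDecode are hypothetical names, and the selection rule (fewest requests) is a simplification rather than the PR's logic.

```go
package main

import (
	"fmt"
	"math"
)

// podMetrics is a hypothetical per-pod snapshot; the field names are
// illustrative and not taken from the PR.
type podMetrics struct {
	Name           string
	RequestCount   float64
	Throughput     float64
	FreeGPUPercent float64
}

// decodeSelection names each aggregate explicitly, so a caller cannot
// confuse MaxThroughput with MaxFreeGPUUsage by position.
type decodeSelection struct {
	PodName         string
	MaxRequestCount float64
	MaxThroughput   float64
	MaxFreeGPUUsage float64
}

// selectDecode picks the pod with the fewest requests and records the maxima
// that later normalization would need. It reports false for an empty input.
func selectDecode(pods []podMetrics) (decodeSelection, bool) {
	if len(pods) == 0 {
		return decodeSelection{}, false
	}
	best := pods[0]
	var sel decodeSelection
	for _, p := range pods {
		if p.RequestCount < best.RequestCount {
			best = p
		}
		sel.MaxRequestCount = math.Max(sel.MaxRequestCount, p.RequestCount)
		sel.MaxThroughput = math.Max(sel.MaxThroughput, p.Throughput)
		sel.MaxFreeGPUUsage = math.Max(sel.MaxFreeGPUUsage, p.FreeGPUPercent)
	}
	sel.PodName = best.Name
	return sel, true
}

func main() {
	sel, ok := selectDecode([]podMetrics{
		{Name: "decode-0", RequestCount: 4, Throughput: 900, FreeGPUPercent: 20},
		{Name: "decode-1", RequestCount: 2, Throughput: 1200, FreeGPUPercent: 35},
	})
	fmt.Printf("%+v %v\n", sel, ok)
}
```

With named fields, swapping two values requires writing the wrong field name explicitly, which is far easier to spot in review than a transposed position in a seven-value return.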
	return SGLangBootstrapPort // Default port
}

func (r *pdRouter) loadImbalanceSelectDecodePod(ctx *types.RoutingContext, filteredDecodePods []*v1.Pod) (*v1.Pod, float64, float64, float64, map[string]float64, map[string]float64, map[string]float64) {
}
podFreeGpuUsage[pod.Name] = 100 - gpuUsage.GetSimpleValue()*100
if podFreeGpuUsage[pod.Name] <= 0 {
	podFreeGpuUsage[pod.Name] = 0.1
Pull Request Description
[Please provide a clear and concise description of your changes here]
Related Issues
Resolves: #[Insert issue number(s)]
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]: Corrections to existing functionality
[CI]: Changes to build process or CI pipeline
[Docs]: Updates or additions to documentation
[API]: Modifications to aibrix's API or interface
[CLI]: Changes or additions to the Command Line Interface
[Misc]: For changes not covered above (use sparingly)
Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.