[Feat] Support StormService pause rollout in upgrade #1536
Summary of Changes
Hello @Jeffwan, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the StormService upgrade mechanism by introducing a robust canary deployment strategy. This allows for controlled, phased rollouts of new versions, minimizing risk by gradually exposing changes to a subset of the environment before full deployment. The new capabilities include flexible pausing and weight-based traffic management.
Highlights
- Canary Deployment Feature: Introduces comprehensive canary deployment capabilities for StormService upgrades, enabling gradual rollouts with defined steps for weight-based traffic shifting and configurable pauses.
- Flexible Pause Mechanisms: Implements both time-based automatic pauses and manual pauses that require explicit user intervention to resume, offering fine-grained control during staged rollouts.
- API and CRD Extensions: Extends the StormService API with new CanaryUpdateStrategy and CanaryStatus fields, along with supporting types like CanaryStep, PauseStep, CanaryPhase, and PauseCondition, fully integrated via CRD updates and client-side apply configurations.
- Enhanced Test Coverage: Adds extensive unit, integration, and end-to-end tests specifically for the new canary deployment logic, ensuring robustness and reliability of the feature.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces a significant new feature: support for canary deployments in StormService upgrades. It adds new API fields under updateStrategy.canary to define canary steps, including setting weights and pausing. New controller logic is added to process these canary steps, along with corresponding status fields to track progress. The changes also include generated client code, CRD updates, unit tests, integration tests, and E2E tests for the new functionality.
While the overall structure and API design are sound, there is a critical issue: the core logic to apply the canary weight is not implemented. The functions responsible for adjusting the replica distribution are currently stubs. Additionally, the E2E tests are not comprehensive enough as they don't verify the actual workload state during the rollout. There is also a minor issue with a non-English comment in the code.
/cc
b21cf6c to 7546fcc
b365fff to 7a5ae34
// Step 4: Clear canary status - this triggers normal rollout logic to take over
stormService.Status.CanaryStatus = nil

if err := r.Status().Patch(ctx, stormService, client.MergeFrom(original)); err != nil {
r.applyCanaryStatusUpdate has already updated the status once, and here we patch it one last time. Will this cause the previous update to be lost?
err := r.Status().Patch(ctx, stormService, client.MergeFrom(original))
We still use original as the base for the merge.
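One way to reason about this question is to look at what a merge-style patch contains when it is diffed against an older snapshot. The sketch below is a simplified, standard-library-only stand-in for that diff, not controller-runtime's implementation; `mergeDiff` and the status field names (`phase`, `message`) are hypothetical. It assumes the in-memory object still carries the earlier update, in which case the patch re-sends that update rather than dropping it. Whether that assumption holds in this PR depends on how `applyCanaryStatusUpdate` mutates the object, which is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mergeDiff returns a simplified merge-patch-style diff that turns base into
// current: added or changed keys carry the new value, removed keys carry nil
// (JSON null), and unchanged keys are omitted.
func mergeDiff(base, current map[string]any) map[string]any {
	patch := map[string]any{}
	for k, cur := range current {
		old, ok := base[k]
		if !ok {
			patch[k] = cur
			continue
		}
		oldMap, oldIsMap := old.(map[string]any)
		curMap, curIsMap := cur.(map[string]any)
		if oldIsMap && curIsMap {
			// Recurse so nested objects only list their changed fields.
			if sub := mergeDiff(oldMap, curMap); len(sub) > 0 {
				patch[k] = sub
			}
			continue
		}
		if !reflect.DeepEqual(old, cur) {
			patch[k] = cur
		}
	}
	for k := range base {
		if _, ok := current[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

func main() {
	// Snapshot taken before any status change in this reconcile pass.
	original := map[string]any{
		"canaryStatus": map[string]any{"phase": "Progressing"},
	}
	// Snapshot taken right after the earlier status update.
	afterFirst := map[string]any{
		"canaryStatus": map[string]any{"phase": "Paused"},
	}
	// In-memory object after the earlier update plus one final change.
	current := map[string]any{
		"canaryStatus": map[string]any{"phase": "Paused", "message": "waiting"},
	}

	stale, _ := json.Marshal(mergeDiff(original, current))
	fresh, _ := json.Marshal(mergeDiff(afterFirst, current))
	fmt.Println("diff against original: ", string(stale))
	fmt.Println("diff against refreshed:", string(fresh))
}
```

In this sketch the diff against the stale snapshot carries both `phase` and `message`, while the diff against the refreshed snapshot carries only `message`. The risky case is the opposite one: if a later step sets a field back to its value in the stale snapshot, the diff omits that field and the server keeps the intermediate value.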
You are right, this has not been refactored yet. The abort capability is not finished, so I should clean this up.
Removed for now; abort will be added back in future PRs.
// Emit a consistent CanaryUpdate event even if the pause condition already exists
update := newCanaryStatusUpdate().
addEvent("Canary paused at manual pause step. Remove CanaryPauseStep pause condition to continue")
if err := r.applyCanaryStatusUpdate(ctx, stormService, update); err != nil {
Will this produce a lot of repeated events? We requeue after 30 sec.
Good point. I noticed the issue too. I should remove some unhelpful events.
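One possible way to avoid the repeated events raised above is to emit the event only on the pass that first adds the pause condition. The sketch below is hypothetical and standard-library only: `status`, `ensurePauseCondition`, and `reconcilePause` are made-up stand-ins for the controller's types, and only the `CanaryPauseStep` reason and the event text come from the snippet under review.

```go
package main

import "fmt"

// PauseCondition is a simplified stand-in for the API type of the same name.
type PauseCondition struct {
	Reason string
}

// status is a hypothetical, trimmed-down canary status.
type status struct {
	PauseConditions []PauseCondition
}

// ensurePauseCondition adds the condition if it is missing and reports
// whether it was newly added on this call.
func ensurePauseCondition(s *status, reason string) bool {
	for _, c := range s.PauseConditions {
		if c.Reason == reason {
			return false
		}
	}
	s.PauseConditions = append(s.PauseConditions, PauseCondition{Reason: reason})
	return true
}

// reconcilePause simulates one reconcile pass at a manual pause step and
// returns the events it would emit. Later passes (for example, requeues
// every 30 seconds) find the condition already present and emit nothing.
func reconcilePause(s *status) []string {
	if ensurePauseCondition(s, "CanaryPauseStep") {
		return []string{"Canary paused at manual pause step. Remove CanaryPauseStep pause condition to continue"}
	}
	return nil
}

func main() {
	s := &status{}
	for pass := 1; pass <= 3; pass++ { // three passes while still paused
		fmt.Printf("pass %d: %d event(s)\n", pass, len(reconcilePause(s)))
	}
}
```

The trade-off is that a user who misses the first event sees no reminder, so a real controller might instead rely on the status condition itself for visibility.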
@googs1025 I will address the comments tomorrow. I was a bit too busy last week to work on this issue.
4bbf97b to e59a44b
/gemini review
Code Review
This pull request introduces a comprehensive canary deployment feature for StormService, including support for weighted steps and both automatic and manual pauses. The implementation is extensive, with new API types, controller logic, and thorough unit and integration tests. The core logic for progressing through canary steps, handling pauses, and calculating replica distribution is well-structured. However, I've identified a critical bug in the scaling logic that could cause a panic, and the API design for resuming manual pauses is unconventional and should be revised to follow Kubernetes best practices. There are also some minor code style and consistency issues. Overall, this is a significant and valuable feature addition that will be even better with these fixes.
* Update stormservice golang client
* Improve the test coverage
* Refactor the API to support manual resume
* improve the canary features
* Leave e2e test to future PRs
* fix lint and verify issues
* Polish the canary status
* Simplify the canary status fields
* Final patch

Signed-off-by: Jiaxin Shan <[email protected]>
e59a44b to ff769e2
will review today
Pull Request Description
[Feat] Support StormService pause rollout in upgrade
Related Issues
Resolves: #1291
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
- [Bug]: Corrections to existing functionality
- [CI]: Changes to build process or CI pipeline
- [Docs]: Updates or additions to documentation
- [API]: Modifications to aibrix's API or interface
- [CLI]: Changes or additions to the Command Line Interface
- [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.