-
Notifications
You must be signed in to change notification settings - Fork 379
add serviceMonitor resource for helm #1255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: calvin0327 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @calvin0327, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented a feature to simplify the integration of Prometheus monitoring with the Hami Helm chart. This pull request introduces dedicated ServiceMonitor resources for both the device plugin and scheduler components, making it straightforward to enable metric scraping through simple Helm value configurations. This change significantly improves the observability of these components within a Kubernetes environment.
Highlights
- Device Plugin ServiceMonitor: I introduced a new Kubernetes ServiceMonitor resource template for the
hami-device-plugin
component. This allows Prometheus to automatically discover and scrape metrics from the device plugin, enhancing observability. - Scheduler ServiceMonitor: I added a new Kubernetes ServiceMonitor resource template for the
hami-scheduler
component. Similar to the device plugin, this enables Prometheus to scrape metrics from the scheduler, providing better insights into its operation. - Helm Chart Configuration Options: I updated the
values.yaml
file to include newserviceMonitor.enabled
flags under both thescheduler
anddevicePlugin
sections. These flags, defaulting tofalse
, allow users to easily enable or disable the creation of these ServiceMonitor resources via Helm chart values, such as--set scheduler.serviceMonitor.enabled=true
.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces ServiceMonitor
resources for the device plugin and scheduler, which is a great feature for observability. However, there are a few issues that need to be addressed. The primary issue is that the new resources are guarded by an incorrect and undefined value (.Values.enableGateway
) instead of the newly introduced flags in values.yaml
. Additionally, for better consistency and flexibility, the standard chart labels should be included in the ServiceMonitor
metadata, and the bearerTokenSecret
should be made configurable. Addressing these points will make the new feature robust and easy to use.
d22955c
to
a44d2c1
Compare
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces ServiceMonitor
resources for both the scheduler and device-plugin components, which is a valuable addition for enabling Prometheus-based monitoring. The implementation is a good start, but I've identified several areas for improvement to align with Helm best practices. My feedback focuses on enhancing configurability by avoiding hardcoded values, ensuring resource consistency by including standard Helm labels, and improving the robustness of service discovery with more specific selectors. These changes will make the chart more flexible and easier to manage for users.
a44d2c1
to
3a410a1
Compare
Codecov Report✅ All modified and coverable lines are covered by tests.
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces ServiceMonitor resources for the hami-vgpu scheduler and device-plugin. The changes are well-structured, but there are opportunities to improve flexibility and correctness by making the Prometheus release label and scrape interval configurable, correcting the metric relabeling logic for the device-plugin, and ensuring the device-plugin service exposes the monitorport.
3a410a1
to
19022bb
Compare
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces ServiceMonitor
resources for the scheduler and device-plugin components, allowing for easier integration with Prometheus for monitoring. The implementation is straightforward, adding new templates and corresponding values.
My review has identified a few areas for improvement:
- Critical: The
ServiceMonitor
selectors are likely incorrect as they don't include the standard chart labels, which will prevent them from discovering the target services. - High: The port names in the
ServiceMonitor
endpoints are hardcoded, creating a fragile dependency on theService
definitions. - Medium: The
jobLabel
is too generic, and there's a minor formatting issue with a missing newline at the end of a template file.
Addressing these points will significantly improve the robustness and correctness of this new feature.
Signed-off-by: calvin chen <[email protected]>
19022bb
to
4e77e8c
Compare
What type of PR is this?
/kind feature
What this PR does / why we need it:
we can very easy to apply serviceMonitor resource for device plugin and scheduler by
--set scheduler.serviceMonitor.enabled=true
and--set devicePlugin.serviceMonitor.enabled=true
.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: