-
Notifications
You must be signed in to change notification settings - Fork 470
[Misc] Batch API envoy integration fix, E2E verification, and document update #1671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Misc] Batch API envoy integration fix, E2E verification, and document update #1671
Conversation
…ntegration Signed-off-by: Jingyuan Zhang <[email protected]>
Signed-off-by: Jingyuan Zhang <[email protected]>
…openai_batch_envoy_integration Signed-off-by: Jingyuan Zhang <[email protected]> # Conflicts: # python/aibrix/aibrix/batch/README.md # python/aibrix/aibrix/batch/scheduler.py # python/aibrix/aibrix/metadata/app.py # python/aibrix/tests/batch/conftest.py # python/aibrix/tests/batch/test_e2e_openai_batch_api.py # python/aibrix/tests/batch/test_job_cache.py # python/aibrix/tests/metadata/test_app_integration.py
Signed-off-by: Jingyuan Zhang <[email protected]>
…openai_batch_envoy_integration Signed-off-by: Jingyuan Zhang <[email protected]>
Signed-off-by: Jingyuan Zhang <[email protected]>
Summary of ChangesHello @zhangjyr, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request finalizes the integration of the Batch API by establishing the necessary Envoy routing for batch-related endpoints, introducing robust end-to-end tests to ensure the functionality of the batch processing workflow, and providing detailed documentation and configuration options for customizing batch job execution within Kubernetes. These changes collectively enable a fully functional and verifiable Batch API experience. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively completes the Batch API integration by adding Envoy configuration, E2E tests, and comprehensive documentation. The changes are well-structured and consistent across the repository. My review focuses on improving the robustness of documentation and test setup, enhancing security by adhering to the principle of least privilege in RBAC configurations, and modernizing asynchronous code patterns in the new E2E tests.
- metadata.yaml | ||
- redis.yaml | ||
|
||
configMapGenerator: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need configmap here? I see python folder have many skeleton templates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The job template patch has to be mapped to the container folder to take effect. This configmap can achieve this without rebuilding the image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then should we remove file based templates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am glad to hear the options. Can you elaborate on how users might customize the job template?
The bottom line is users have to change the k8s_job_template.yaml
and rebuild the image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template should be managed by operation folks. We can have one template now, leave the flexibility support to later . Once we get more feedback, we can start to work on it
config/metadata/metadata.yaml
Outdated
resources: ["jobs"] | ||
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] | ||
# For batch job ServiceAccount management | ||
- apiGroups: [""] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use a fix service account, along with aibrix installation? instead of reply on permission here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally fixed. I later found that the service account must be under the default namespace, while the kustomize overrides the namespace and puts everything in the aibrix-system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
em. we can put the file under a separate config/manifest folder without aibrix-system override.
in helm, I think we can support it as well, something like
namespace: {{ .Values.job.namespace }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it works. How is kustomize
going to load config/manifest without overriding the namespace? I can have a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can ask user to follow a guidance to create a RBAC in short term, we do not want the controller to have such high permission.
- name: metadata-service | ||
image: {{ .Values.metadata.service.container.image.repository }}:{{ .Values.metadata.service.container.image.tag }} | ||
imagePullPolicy: {{ .Values.metadata.service.container.image.imagePullPolicy | default "IfNotPresent" }} | ||
command: ["python", "-m", "aibrix.metadata.app"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep the explicit commands here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I can add an explicit command, but it would be aibrix_metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's ok. user sometimes need to adjust parameters, if there's no config here, it automatically use container entrypoints, and user do not know it unless they know the commands in Dockerfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found aibrix_metadata
does not work in the helm. I have to change to
["python", "-m", "aibrix.metadata.app"].
One more thing, in the config, I disabled k8s-job by default, so users can run other metadata services (v1/models) if they do not configure external object storage (in this case, the k8s job will not work anyway). Do I need to apply the same logic in chart, too? This means users have to adjust values to enable object storage and k8s-job.
Cleanup files Signed-off-by: Jingyuan Zhang <[email protected]>
Signed-off-by: Jingyuan Zhang <[email protected]>
Signed-off-by: Jingyuan Zhang <[email protected]>
Signed-off-by: Jingyuan Zhang <[email protected]>
…t store to work. Signed-off-by: Jingyuan Zhang <[email protected]>
Signed-off-by: Jingyuan Zhang <[email protected]>
@Jeffwan Do you have any idea why the helm chart test failed? It looks the metadata healthz and readyz endpoints not accessible. I ran the same chart-testing locally and it all fine. Does it has anything to do with the image problem you saw in the metadata service migration PR? |
@zhangjyr the problem is the helm doesn't use the images build from this branch, it just use nightly image. this is a known issue and we should fix it, but we can ignore the problem at this moment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm. I still have some questions on the permission
…ig/job is create for user to deploy job rbac. Signed-off-by: Jingyuan Zhang <[email protected]>
Pull Request Description
This PR completes the last piece of the Batch API integration. Specifically:
Note that the k8s job is disabled by default. Config object storage (S3 or TOS) to enable it.
The default behavior is to upload files to the metadata-local and running job in the dummy engine, which is only for testing purposes.
Related Issues
Resolves: #1277
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]
: Corrections to existing functionality[CI]
: Changes to build process or CI pipeline[Docs]
: Updates or additions to documentation[API]
: Modifications to aibrix's API or interface[CLI]
: Changes or additions to the Command Line Interface[Misc]
: For changes not covered above (use sparingly)Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.