[Misc] Batch API envoy integration fix, E2E verification, and document update #1671
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
apiVersion: batch/v1 | ||
kind: Job | ||
metadata: | ||
name: batch-job-template | ||
namespace: default | ||
spec: | ||
parallelism: 1 # Customizable. The number of parallel workers. | ||
completions: 1 # Customizable. Must equal parallelism. | ||
backoffLimit: 2 # Customizable, but usually no need to change. | ||
template: | ||
spec: | ||
containers: | ||
- name: batch-worker | ||
image: aibrix/runtime:nightly # Customizable, runtime image | ||
- name: llm-engine | ||
image: aibrix/vllm-mock:nightly # Customizable, LLM engine image |
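As an illustration of the customizable fields above, here is a sketch of the same template scaled to four parallel workers. The value 4 is an assumption chosen for the example; `completions` is changed together with `parallelism` because the template comments require the two to match. Like the original, this is presumably merged into a fuller base template, since fields such as `restartPolicy` are absent.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job-template
  namespace: default
spec:
  parallelism: 4   # hypothetical value: four parallel workers
  completions: 4   # kept equal to parallelism, as the template requires
  backoffLimit: 2  # unchanged from the template
  template:
    spec:
      containers:
        - name: batch-worker
          image: aibrix/runtime:nightly     # unchanged runtime image
        - name: llm-engine
          image: aibrix/vllm-mock:nightly   # unchanged LLM engine image
```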
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,12 @@ resources: | |
- metadata.yaml | ||
- redis.yaml | ||
|
||
configMapGenerator: | ||
Review thread:
Comment: Do we need a ConfigMap here? I see the python folder has many skeleton templates.
Comment: The job template patch has to be mapped into the container folder to take effect. This ConfigMap achieves that without rebuilding the image.
Comment: Then should we remove the file-based templates?
Comment: I am glad to hear the options. Can you elaborate on how users might customize the job template?
Comment: Templates should be managed by operations folks. We can have one template for now and leave flexibility support for later. Once we get more feedback, we can start working on it.
||
- name: metadata-config | ||
namespace: aibrix-system | ||
files: | ||
- job_template_patch.yaml | ||
|
||
labels: | ||
- pairs: | ||
app.kubernetes.io/component: aibrix-metadata-service |
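For orientation, the `configMapGenerator` entry above should make kustomize emit a ConfigMap roughly like the following sketch. The hash suffix is a placeholder (by default kustomize appends a content hash to generated names and rewrites references to match), and the data body is abbreviated rather than copied from the PR.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: metadata-config-<hash>   # placeholder; kustomize computes the suffix
  namespace: aibrix-system
data:
  # The key is the file name listed under `files:` in the generator.
  job_template_patch.yaml: |
    # contents of job_template_patch.yaml, copied verbatim
```

Because the data lives in a ConfigMap, the patch can be changed and redeployed without rebuilding the image, which is the motivation given in the review thread.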
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,6 +30,30 @@ rules: | |
- apiGroups: ["model.aibrix.ai"] | ||
resources: ["modeladapters"] | ||
verbs: ["get", "list"] | ||
# For batch job watching | ||
- apiGroups: ["batch"] | ||
resources: ["jobs"] | ||
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] | ||
# For batch job ServiceAccount management | ||
- apiGroups: [""] | ||
resources: ["serviceaccounts"] | ||
verbs: ["get", "create", "update", "patch", "delete"] | ||
- apiGroups: ["rbac.authorization.k8s.io"] # for Role management | ||
resources: ["roles"] | ||
verbs: ["get", "create", "update", "patch", "delete"] | ||
- apiGroups: ["rbac.authorization.k8s.io"] # for RoleBinding management | ||
resources: ["rolebindings"] | ||
verbs: ["get", "create", "update", "patch", "delete"] | ||
zhangjyr marked this conversation as resolved.
# For kopf high availability | ||
- apiGroups: ["coordination.k8s.io"] | ||
resources: ["leases"] | ||
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] | ||
- apiGroups: ["apiextensions.k8s.io"] # required by kopf | ||
resources: ["customresourcedefinitions"] | ||
verbs: ["get", "list", "watch"] | ||
- apiGroups: [""] # required by kopf | ||
resources: ["namespaces"] | ||
verbs: ["list", "watch"] | ||
--- | ||
apiVersion: rbac.authorization.k8s.io/v1 | ||
kind: ClusterRoleBinding | ||
|
@@ -64,14 +88,27 @@ spec: | |
- name: init-redis | ||
image: busybox | ||
command: ['sh', '-c', 'until echo "ping" | nc aibrix-redis-master 6379 -w 1 | grep -c PONG; do echo waiting for redis; sleep 2; done'] | ||
volumes: | ||
- name: config-volume | ||
configMap: | ||
name: metadata-config | ||
containers: | ||
- name: metadata-service | ||
image: metadata-service:latest | ||
imagePullPolicy: IfNotPresent | ||
command: ["python", "-m", "aibrix.metadata.app"] | ||
args: ["--host=0.0.0.0", "--port=8090"] | ||
command: | ||
- aibrix_metadata | ||
- --host | ||
- "0.0.0.0" | ||
- --enable-k8s-job | ||
- --k8s-job-patch | ||
- /app/config/job_template_patch.yaml | ||
ports: | ||
- containerPort: 8090 | ||
volumeMounts: | ||
- name: config-volume | ||
mountPath: /app/config | ||
readOnly: true | ||
resources: | ||
limits: | ||
cpu: 500m | ||
|
@@ -92,6 +129,54 @@ spec: | |
valueFrom: | ||
fieldRef: | ||
fieldPath: metadata.namespace | ||
# Object store configuration | ||
# Comment out the following lines to disable S3 as the object store | ||
- name: STORAGE_AWS_ACCESS_KEY_ID | ||
valueFrom: | ||
secretKeyRef: | ||
name: aibrix-s3-credentials | ||
key: access-key-id | ||
- name: STORAGE_AWS_SECRET_ACCESS_KEY | ||
valueFrom: | ||
secretKeyRef: | ||
name: aibrix-s3-credentials | ||
key: secret-access-key | ||
- name: STORAGE_AWS_REGION | ||
valueFrom: | ||
secretKeyRef: | ||
name: aibrix-s3-credentials | ||
key: region | ||
- name: STORAGE_AWS_BUCKET | ||
valueFrom: | ||
secretKeyRef: | ||
name: aibrix-s3-credentials | ||
key: bucket-name | ||
# Uncomment the following lines to enable TOS as the object store | ||
# - name: STORAGE_TOS_ACCESS_KEY | ||
# valueFrom: | ||
# secretKeyRef: | ||
# name: aibrix-tos-credentials | ||
# key: access-key | ||
# - name: STORAGE_TOS_SECRET_KEY | ||
# valueFrom: | ||
# secretKeyRef: | ||
# name: aibrix-tos-credentials | ||
# key: secret-key | ||
# - name: STORAGE_TOS_ENDPOINT | ||
# valueFrom: | ||
# secretKeyRef: | ||
# name: aibrix-tos-credentials | ||
# key: endpoint | ||
# - name: STORAGE_TOS_REGION | ||
# valueFrom: | ||
# secretKeyRef: | ||
# name: aibrix-tos-credentials | ||
# key: region | ||
# - name: STORAGE_TOS_BUCKET | ||
# valueFrom: | ||
# secretKeyRef: | ||
# name: aibrix-tos-credentials | ||
# key: bucket-name | ||
livenessProbe: | ||
httpGet: | ||
path: /healthz | ||
|
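The S3 variables above read four keys from a Secret named `aibrix-s3-credentials`. A minimal sketch of such a Secret follows. The Secret name and key names come from the `secretKeyRef` entries in the diff; the namespace and every value are placeholders and are assumptions for this example.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aibrix-s3-credentials
  namespace: aibrix-system   # assumption: same namespace as the metadata service
type: Opaque
stringData:                  # plain-text input; Kubernetes stores it base64-encoded
  access-key-id: "<your-access-key-id>"
  secret-access-key: "<your-secret-access-key>"
  region: "<your-region>"
  bucket-name: "<your-bucket>"
```

If the commented-out TOS variables are enabled instead, they would need an analogous `aibrix-tos-credentials` Secret with the keys `access-key`, `secret-key`, `endpoint`, `region`, and `bucket-name`.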
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,8 +31,6 @@ spec: | |
- name: metadata-service | ||
image: {{ .Values.metadata.service.container.image.repository }}:{{ .Values.metadata.service.container.image.tag }} | ||
imagePullPolicy: {{ .Values.metadata.service.container.image.imagePullPolicy | default "IfNotPresent" }} | ||
command: ["python", "-m", "aibrix.metadata.app"] | ||
Review thread:
Comment: Let's keep the explicit command here?
Comment: Oh, I can add an explicit command, but it would be aibrix_metadata.
Comment: That's OK. Users sometimes need to adjust parameters. If there is no config here, the container entrypoint is used automatically, and users will not know it unless they know the commands in the Dockerfile.
Comment: I found
||
args: ["--host=0.0.0.0", "--port=8090"] | ||
ports: | ||
- containerPort: 8090 | ||
resources: {{ toYaml .Values.metadata.service.container.resources | nindent 12 }} | ||
|