docs: Adds Qwen3-235B-A22B recipes for both precisions #4179
base: main
Conversation
Signed-off-by: Elnifio <[email protected]>
👋 Hi Elnifio! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough
Introduces Kubernetes manifests for deploying two Qwen model variants with disaggregated and aggregated inference strategies. Adds PersistentVolumeClaims for model caching, model download Jobs, DynamoGraphDeployment configurations with TensorRT-LLM backends, and aiperf benchmarking Jobs supporting both parallel prefill/decode and single-node inference patterns.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 6
♻️ Duplicate comments (2)
recipes/qwen3-235b-a22b-instruct-2507-fp8/model-cache/model-download.yaml (1)
15-44: Add securityContext to restrict privilege escalation and enforce non-root execution.
Same security concern as the model-download job in recipes/qwen3-235b-a22b/model-cache/model-download.yaml. The container runs as root without explicit restrictions on privilege escalation (CKV_K8S_20, CKV_K8S_23). Apply the same securityContext fix:

```diff
 spec:
   restartPolicy: Never
   containers:
     - name: model-download
       image: python:3.10-slim
       command: ["sh", "-c"]
+      securityContext:
+        allowPrivilegeEscalation: false
+        runAsNonRoot: true
+        runAsUser: 1000
       envFrom:
         - secretRef:
             name: hf-token-secret
```

recipes/qwen3-235b-a22b/trtllm/agg/perf.yaml (1)
141-142: Eliminate privileged mode and run as non-root; justify if strictly necessary.
Same critical security concern as the disaggregated perf Job (file recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/perf.yaml). The container runs with `privileged: true` (CKV_K8S_16 – HIGH), which is a significant security risk. Apply the same fix:

```diff
 securityContext:
-  privileged: true
+  allowPrivilegeEscalation: false
+  runAsNonRoot: true
+  runAsUser: 1000
+  capabilities:
+    drop:
+      - ALL
```

Verify aiperf functionality with these restrictions and adjust capabilities only if specific syscalls fail.
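If something does break after dropping ALL, the least-privilege pattern is to add back only the specific capability whose absence caused the failure, rather than reverting to privileged mode. A hypothetical sketch; NET_RAW is an illustrative example here, not a confirmed aiperf requirement:

```yaml
securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  runAsUser: 1000
  capabilities:
    drop:
      - ALL
    add:
      - NET_RAW  # illustrative only: add a capability back solely when a syscall fails without it
```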
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- recipes/qwen3-235b-a22b-instruct-2507-fp8/model-cache/model-cache.yaml (1 hunks)
- recipes/qwen3-235b-a22b-instruct-2507-fp8/model-cache/model-download.yaml (1 hunks)
- recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/deploy.yaml (1 hunks)
- recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/perf.yaml (1 hunks)
- recipes/qwen3-235b-a22b/model-cache/model-cache.yaml (1 hunks)
- recipes/qwen3-235b-a22b/model-cache/model-download.yaml (1 hunks)
- recipes/qwen3-235b-a22b/trtllm/agg/deploy.yaml (1 hunks)
- recipes/qwen3-235b-a22b/trtllm/agg/perf.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
- recipes/qwen3-235b-a22b/model-cache/model-cache.yaml
- recipes/qwen3-235b-a22b/model-cache/model-download.yaml
- recipes/qwen3-235b-a22b-instruct-2507-fp8/model-cache/model-cache.yaml
- recipes/qwen3-235b-a22b-instruct-2507-fp8/model-cache/model-download.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
- recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/deploy.yaml
- recipes/qwen3-235b-a22b/trtllm/agg/deploy.yaml
🪛 Checkov (3.2.334)
recipes/qwen3-235b-a22b/model-cache/model-download.yaml
[medium] 3-44: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-44: Minimize the admission of root containers
(CKV_K8S_23)
recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/perf.yaml
[medium] 3-153: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[high] 3-153: Container should not be privileged
(CKV_K8S_16)
[medium] 3-153: Minimize the admission of root containers
(CKV_K8S_23)
recipes/qwen3-235b-a22b-instruct-2507-fp8/model-cache/model-download.yaml
[medium] 3-44: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-44: Minimize the admission of root containers
(CKV_K8S_23)
recipes/qwen3-235b-a22b/trtllm/agg/perf.yaml
[medium] 3-153: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[high] 3-153: Container should not be privileged
(CKV_K8S_16)
[medium] 3-153: Minimize the admission of root containers
(CKV_K8S_23)
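If one of these findings is reviewed and deliberately accepted, Checkov also supports suppressing it per-resource instead of globally, via a `checkov.io/skip` annotation on the Kubernetes object. A hedged sketch; the resource name and justification text below are illustrative, not taken from these recipes:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: perf  # illustrative name
  annotations:
    # Suppress CKV_K8S_16 for this one resource, recording the justification inline
    checkov.io/skip1: CKV_K8S_16=documented exception, see security review
```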
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4179/merge) by Elnifio.
recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/deploy.yaml
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace) during pre-commit run.
recipes/qwen3-235b-a22b/trtllm/agg/deploy.yaml
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace) during pre-commit run.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/deploy.yaml (1)
8-59: Engine configuration strategy for disaggregation is well-tuned.The prefill.yaml and decode.yaml configs show thoughtful tuning for the disaggregated inference pattern:
- Prefill worker: Low batch size (2), tensor parallelism (2), optimized for throughput of short batches
- Decode worker: Higher batch size (512), tensor parallelism (4), optimized for sustained token generation
GPU allocation aligns with workload characteristics. The fp8 dtype and memory settings are appropriate for the Qwen3-235B model size.
```yaml
spec:
  backendFramework: trtllm
  pvcs:
    - name: model-cache
      create: false
  services:
    Frontend:
      componentType: frontend
      dynamoNamespace: qwen3-235b-a22b-disagg
      replicas: 1
      extraPodSpec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: nvidia.com/dynamo-graph-deployment-name
                      operator: In
                      values:
                        - qwen3-235b-a22b-disagg-frontend
                topologyKey: kubernetes.io/hostname
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
          args:
            - python3 -m dynamo.frontend --router-mode kv --http-port 8000
          command:
            - /bin/sh
            - -c
    TRTLLMPrefillWorker:
      componentType: worker
      subComponentType: prefill
      dynamoNamespace: qwen3-235b-a22b-disagg
      envFromSecret: hf-token-secret
      replicas: 1
      resources:
        limits:
          gpu: "2"
      sharedMemory:
        size: 256Gi
      extraPodSpec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: nvidia.com/gpu.present
                      operator: In
                      values:
                        - "true"
        mainContainer:
          env:
            - name: MODEL_PATH
              value: /mnt/model-cache/hub/models--Qwen--Qwen3-235B-A22B-Instruct-2507-FP8/snapshots/e156cb4efae43fbee1a1ab073f946a1377e6b969
            - name: ENGINE_ARGS
              value: /engine_configs/prefill.yaml
          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
          workingDir: /workspace/components/backends/trtllm
          command:
            - /bin/sh
            - -c
          args:
            - |
              python3 -m dynamo.trtllm \
                --model-path "${MODEL_PATH}" \
                --served-model-name "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8" \
                --extra-engine-args "${ENGINE_ARGS}" \
                --disaggregation-mode prefill \
                --disaggregation-strategy prefill_first
          volumeMounts:
            - name: prefill-config
              mountPath: /engine_configs
            - name: model-cache
              mountPath: /mnt/model-cache
        volumes:
          - name: prefill-config
            configMap:
              name: prefill-config
          - name: model-cache
            persistentVolumeClaim:
              claimName: model-cache
    TRTLLMDecodeWorker:
      componentType: worker
      subComponentType: decode
      dynamoNamespace: qwen3-235b-a22b-disagg
      envFromSecret: hf-token-secret
      replicas: 1
      resources:
        limits:
          gpu: "4"
      sharedMemory:
        size: 256Gi
      extraPodSpec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: nvidia.com/gpu.present
                      operator: In
                      values:
                        - "true"
        mainContainer:
          env:
            - name: MODEL_PATH
              value: /mnt/model-cache/hub/models--Qwen--Qwen3-235B-A22B-Instruct-2507-FP8/snapshots/e156cb4efae43fbee1a1ab073f946a1377e6b969
            - name: ENGINE_ARGS
              value: /engine_configs/decode.yaml
          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
          workingDir: /workspace/components/backends/trtllm
          command:
            - /bin/sh
            - -c
          args:
            - |
              python3 -m dynamo.trtllm \
                --model-path "${MODEL_PATH}" \
                --served-model-name "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8" \
                --extra-engine-args "${ENGINE_ARGS}" \
                --disaggregation-mode decode \
                --disaggregation-strategy prefill_first
          volumeMounts:
            - name: decode-config
              mountPath: /engine_configs
            - name: model-cache
              mountPath: /mnt/model-cache
        volumes:
          - name: decode-config
            configMap:
              name: decode-config
          - name: model-cache
            persistentVolumeClaim:
              claimName: model-cache
```
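The long MODEL_PATH values above follow the Hugging Face hub cache layout under HF_HOME: the "/" in a repo id becomes "--", prefixed with "models--", with snapshots keyed by the pinned commit revision. A minimal sketch of that mapping (the helper name is ours, not part of the recipes):

```python
def hf_cache_model_path(hf_home: str, repo_id: str, revision: str) -> str:
    """Build the snapshot path the deploy manifests point MODEL_PATH at,
    following the Hugging Face hub cache directory convention."""
    repo_dir = "models--" + repo_id.replace("/", "--")
    return f"{hf_home}/hub/{repo_dir}/snapshots/{revision}"


print(hf_cache_model_path(
    "/mnt/model-cache",
    "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    "e156cb4efae43fbee1a1ab073f946a1377e6b969",
))
```

This reproduces the exact path used by both workers, which is why the recipes pin MODEL_REVISION in the download Job: the snapshot directory name is the revision hash.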
Fix trailing whitespace detected by pre-commit.
The pre-commit hook reported trailing whitespace. Re-run locally to verify:
```shell
pre-commit run --all-files
```

🤖 Prompt for AI Agents
In recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/deploy.yaml around
lines 65 to 196 there are trailing whitespace characters that fail pre-commit;
remove any trailing spaces at the ends of lines in that range (and any
accidental blank-line spaces), save the file, re-run pre-commit (pre-commit run
--all-files) to verify the fix, and commit the cleaned file.
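The check the hook performs is simple enough to reproduce locally without pre-commit; a small sketch of the same detection, useful for confirming which lines are affected before committing (the function name is ours):

```python
def trailing_whitespace_lines(text: str) -> list[int]:
    """Return 1-based line numbers that end in spaces or tabs,
    i.e. the lines the trailing-whitespace hook would rewrite."""
    return [
        i
        for i, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


sample = 'resources:\n  limits: \n    gpu: "4"\t\n'
print(trailing_whitespace_lines(sample))  # → [2, 3]
```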
```yaml
              - qwen3-235b-a22b-disagg-frontend
          topologyKey: kubernetes.io/hostname
    mainContainer:
      image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
```
Replace placeholder image tags with a concrete version.
The image tag my-tag is a placeholder on lines 87, 120, and 172, causing ImagePullBackOff errors. Use an actual image version.
```diff
-          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
+          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:24.10
```

Update all three occurrences (Frontend, TRTLLMPrefillWorker, and TRTLLMDecodeWorker).
Also applies to: 120-120, 172-172
```yaml
      securityContext:
        privileged: true
```
Eliminate privileged mode and run as non-root; justify if strictly necessary.
The container runs with privileged: true (CKV_K8S_16 – HIGH severity), which grants full host access. This is a significant security risk for a benchmarking Job.
If privileged mode is required for aiperf monitoring, explicitly document why and consider using specific Linux capabilities instead. If not strictly necessary, remove privileged mode:
```diff
 securityContext:
-  privileged: true
+  allowPrivilegeEscalation: false
+  runAsNonRoot: true
+  runAsUser: 1000
+  capabilities:
+    drop:
+      - ALL
```

Verify aiperf functionality with these restrictions and adjust capabilities only if specific syscalls fail.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 securityContext:
-  privileged: true
+  allowPrivilegeEscalation: false
+  runAsNonRoot: true
+  runAsUser: 1000
+  capabilities:
+    drop:
+      - ALL
```
🤖 Prompt for AI Agents
In recipes/qwen3-235b-a22b-instruct-2507-fp8/trtllm/disagg/perf.yaml around
lines 141-142, remove the broad securityContext: privileged: true and instead
run the container as non-root and least-privileged: set
securityContext.runAsNonRoot: true and a specific non-root user uid (e.g.,
runAsUser: 1000), drop all unnecessary capabilities and only add explicit ones
needed for aiperf (addCapabilities: [CAP_NET_RAW] etc.) if specific syscalls
fail, and document the exact reason and capability justification in a comment;
only keep privileged: true after explicit justification and approval and verify
aiperf works under the tightened security context.
```yaml
spec:
  restartPolicy: Never
  containers:
    - name: model-download
      image: python:3.10-slim
      command: ["sh", "-c"]
      envFrom:
        - secretRef:
            name: hf-token-secret
      env:
        - name: MODEL_NAME
          value: Qwen/Qwen3-235B-A22B
        - name: HF_HOME
          value: /model-store
        - name: HF_HUB_ENABLE_HF_TRANSFER
          value: "1"
        - name: MODEL_REVISION
          value: 8efa61729e24bd65b1d152b5ab5409052aa80e65
      args:
        - |
          set -eux
          pip install --no-cache-dir huggingface_hub hf_transfer
          hf download $MODEL_NAME --revision $MODEL_REVISION
      volumeMounts:
        - name: model-cache
          mountPath: /model-store
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: model-cache
```
Add securityContext to restrict privilege escalation and enforce non-root execution.
The container runs as root without explicit restrictions on privilege escalation. Per Checkov findings (CKV_K8S_20, CKV_K8S_23), add a securityContext to enforce security best practices.
Apply this diff to add a restrictive securityContext:
```diff
 spec:
   restartPolicy: Never
   containers:
     - name: model-download
       image: python:3.10-slim
       command: ["sh", "-c"]
+      securityContext:
+        allowPrivilegeEscalation: false
+        runAsNonRoot: true
+        runAsUser: 1000
       envFrom:
         - secretRef:
             name: hf-token-secret
```

Note: Ensure the container image supports running as user 1000, or adjust to an appropriate non-root UID.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 spec:
   restartPolicy: Never
   containers:
     - name: model-download
       image: python:3.10-slim
       command: ["sh", "-c"]
+      securityContext:
+        allowPrivilegeEscalation: false
+        runAsNonRoot: true
+        runAsUser: 1000
       envFrom:
         - secretRef:
             name: hf-token-secret
       env:
         - name: MODEL_NAME
           value: Qwen/Qwen3-235B-A22B
         - name: HF_HOME
           value: /model-store
         - name: HF_HUB_ENABLE_HF_TRANSFER
           value: "1"
         - name: MODEL_REVISION
           value: 8efa61729e24bd65b1d152b5ab5409052aa80e65
       args:
         - |
           set -eux
           pip install --no-cache-dir huggingface_hub hf_transfer
           hf download $MODEL_NAME --revision $MODEL_REVISION
       volumeMounts:
         - name: model-cache
           mountPath: /model-store
   volumes:
     - name: model-cache
       persistentVolumeClaim:
         claimName: model-cache
```
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 3-44: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-44: Minimize the admission of root containers
(CKV_K8S_23)
```yaml
              - qwen3-235b-a22b-agg-frontend
          topologyKey: kubernetes.io/hostname
    mainContainer:
      image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
```
Replace placeholder image tag with a concrete version.
The image tag my-tag is a placeholder and will cause ImagePullBackOff errors at deployment time. Use an actual image version.
```diff
-          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
+          image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:24.10
```

Verify the correct image tag with your registry or release notes and update both Frontend (line 58) and TrtllmWorker (line 95) containers.
Also applies to: 95-95
🤖 Prompt for AI Agents
In recipes/qwen3-235b-a22b/trtllm/agg/deploy.yaml around lines 58 and 95, the
container image tag "my-tag" is a placeholder and must be replaced with a
concrete, released image tag; update the image fields for the Frontend (line 58)
and TrtllmWorker (line 95) containers to use the verified registry tag (e.g., a
semver or digest) you confirmed from the NVIDIA registry or release notes,
ensuring both occurrences match the exact published version.
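A step further than a released tag is pinning by digest, which makes the manifest immune to a tag being re-pushed with different contents. A hedged sketch; the digest placeholder below must be resolved from the registry, it is not a real published value:

```yaml
    mainContainer:
      # Pin by immutable digest so the deployed image cannot change under a mutable tag.
      # The sha256 value is a placeholder; resolve the real digest from the NVIDIA registry.
      image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime@sha256:<digest-from-registry>
```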
```yaml
      resources:
        limits:
          gpu: "4"
        requests:
          gpu: "4"
```
Fix trailing whitespace detected by pre-commit.
The pre-commit hook reported trailing whitespace that was auto-fixed. Ensure no trailing spaces remain on lines with GPU resource limits and other specs.
Re-run pre-commit locally to verify all trailing whitespace is removed:
```shell
pre-commit run --all-files
```

🤖 Prompt for AI Agents
In recipes/qwen3-235b-a22b/trtllm/agg/deploy.yaml around lines 110 to 114, there
are trailing whitespace characters on the GPU resource lines; remove any
trailing spaces at the ends of the affected lines (e.g., after "gpu: \"4\"" and
any other spec lines), save the file, and re-run pre-commit locally (pre-commit
run --all-files) to verify all trailing whitespace warnings are resolved.
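For reference, the trailing-whitespace hook in question is the standard one from the pre-commit-hooks project; a typical `.pre-commit-config.yaml` entry looks like the sketch below (the rev shown is illustrative, use whatever revision this repo actually pins):

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0  # illustrative pin; match the repo's actual configuration
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```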
Signed-off-by: Elnifio <[email protected]>
Force-pushed from 1095b8f to d289661
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit