Conversation

@luohua13
Contributor

@luohua13 luohua13 commented Dec 29, 2025

Summary by CodeRabbit

  • Documentation
    • Added comprehensive Kueue documentation covering installation, configuration, quota management, monitoring, RBAC, cohorts/fair sharing, gang scheduling, and usage examples with commands
    • Updated site ordering by adding page weight metadata to improve navigation for device management content


@coderabbitai
coderabbitai bot commented Dec 29, 2025

Walkthrough

This PR adds YAML front matter to two existing device-management docs and introduces a new comprehensive Kueue documentation page covering installation, RBAC, quotas (ClusterQueue/ResourceFlavor/LocalQueue), monitoring pending workloads, priority/fairness, permissions, and gang scheduling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Device Management front matter (`docs/en/infrastructure_management/device_management/hami.mdx`, `docs/en/infrastructure_management/device_management/pgpu.mdx`) | Inserted YAML front matter (`---`, `weight: <value>`, `---`) at top of each MDX to set navigation ordering. |
| New Kueue documentation (`docs/en/infrastructure_management/device_management/kueue.mdx`) | Added detailed MDX doc covering Alauda Build of Kueue: introduction, prerequisites, install/upgrade steps, RBAC examples, ClusterQueue/ResourceFlavor/LocalQueue/default LocalQueue YAMLs, quota/cohorts/weights, VisibilityOnDemand and pending-workload monitoring, API Priority & Fairness, user permissions, and gang scheduling guidance. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Add hami and pgpu sites #23: Modifies overlapping device-management documentation files (same docs area: hami.mdx and pgpu.mdx), likely related content/front-matter edits.

Poem

🐇 I nibbled at headings, hopped through the queue,

front matter tucked in, and a manual anew,
clusters and flavors in tidy arrays,
I bounced through the docs on crisp, happy days,
carrots for readers — hop, read, and review!

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'Add Alauda Build of Kueue' directly and clearly summarizes the main change: adding comprehensive documentation for Kueue, which is the primary modification across all three affected files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f82a391 and 2a776d7.

📒 Files selected for processing (3)
  • docs/en/infrastructure_management/device_management/hami.mdx
  • docs/en/infrastructure_management/device_management/kueue.mdx
  • docs/en/infrastructure_management/device_management/pgpu.mdx
🧰 Additional context used
🪛 LanguageTool
docs/en/infrastructure_management/device_management/kueue.mdx

[grammar] ~7-~7: Ensure spelling is correct
Context: ... ## 1. Introduction to Alauda Build of Kueue Alauda Build of Kueue is a kubernetes-na...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...(as in active pods should be deleted). Alauda Build of Kueue does not replace any exi...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~12-~12: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...er, and cluster autoscaler components. Alauda Build of Kueue supports all-or-nothing ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~46-~46: Ensure spelling is correct
Context: ...to the Administrator -> Clusters -> Tartget Cluster -> Functional Components pag...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~46-~46: Ensure spelling is correct
Context: ...tional Components page, then click the Updrade button, and you will see the Alauda Bu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~122-~122: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...eue. 2. Configure a resource flavor. 3. Configure a local queue. 4. Users can then submit...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~128-~128: Ensure spelling is correct
Context: ...overns a pool of resources such as GPU, CPU, memory, and pods. Cluster queues can b...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~345-~345: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...calQueue resource, a RoleBinding schema needs to be created referencing the ClusterRole ...

(REP_NEED_TO_VB)


[style] ~612-~612: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...eues and determine which cluster queues are able to share borrowable resources with each ot...

(BE_ABLE_TO)


[style] ~667-~667: ‘at a disadvantage’ might be wordy. Consider a shorter alternative.
Context: ... means that the cluster queue is always at a disadvantage compared to others, and its workloads a...

(EN_WORDINESS_PREMIUM_AT_A_DISADVANTAGE)


[style] ~673-~673: Consider removing “of” to be more concise
Context: ...antee the capacity to start and execute all of the related jobs in the gang together. Gan...

(ALL_OF_THE)


[style] ~675-~675: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ilization and can reduce running costs. Gang scheduling can also help to prevent iss...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (5)
docs/en/infrastructure_management/device_management/hami.mdx (1)

1-3: LGTM! Front matter added for document ordering.

The YAML front matter with weight: 10 enables proper ordering of device management documentation alongside the other files in this PR (pgpu.mdx with weight: 20, kueue.mdx with weight: 30).

docs/en/infrastructure_management/device_management/pgpu.mdx (1)

1-3: LGTM! Front matter added for document ordering.

The YAML front matter with weight: 20 provides consistent ordering structure across the device management documentation files.
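
The ordering described in the two comments above can be spot-checked from a local checkout. The following is a minimal sketch, assuming the front matter begins on the first line with `---`, ends at the next `---`, and carries `weight` as a top-level key. The helper name `front_matter_weight` is hypothetical and not part of the PR or of rspress.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the `weight` value from a file's leading YAML front matter.
# Assumption: front matter starts on line 1 with `---` and ends at the next `---`.
front_matter_weight() {
  awk '
    NR == 1 && $0 != "---" { exit }             # no front matter: print nothing
    NR > 1 && $0 == "---"  { exit }             # closing delimiter: stop reading
    $1 == "weight:"        { print $2; exit }   # first weight key wins
  ' "$1"
}

# Example use (paths taken from the review; run from the repository root):
# for f in docs/en/infrastructure_management/device_management/{hami,pgpu,kueue}.mdx; do
#   printf '%s %s\n' "$(front_matter_weight "$f")" "$f"
# done | sort -n
```

Sorting numerically on the first column should list the pages in the order the navigation is expected to show them, if the site orders by ascending weight.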

docs/en/infrastructure_management/device_management/kueue.mdx (3)

1-13: LGTM! Clear introduction to Kueue.

The front matter and introduction section provide good context about Kueue's purpose and integration with Kubernetes. The content accurately describes the quota management and all-or-nothing scheduling semantics.


321-609: LGTM! Comprehensive monitoring documentation.

The monitoring section provides clear guidance on using the VisibilityOnDemand feature, including API endpoints, RBAC permissions, and practical examples with expected JSON responses. The testing procedures are well-documented.


610-679: LGTM! Well-documented advanced features.

The sections on cohorts, fair sharing, and gang scheduling provide clear explanations of these advanced Kueue features. The examples effectively demonstrate configuration patterns, and the explanations of resource borrowing and preemption strategies are accurate.


1. Create the assets by running the following command:
```bash
cat <<EOF| oc create -f -
```

⚠️ Potential issue | 🟡 Minor

Inconsistent CLI command usage.

Line 361 uses oc create while the rest of the documentation consistently uses kubectl commands. Consider using kubectl create for consistency.

🔎 Proposed fix
-    cat <<EOF| oc create -f -
+    cat <<EOF| kubectl create -f -
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cat <<EOF| oc create -f -
cat <<EOF| kubectl create -f -
🤖 Prompt for AI Agents
In docs/en/infrastructure_management/device_management/kueue.mdx around line
361, the CLI command uses "oc create" which is inconsistent with the rest of the
doc that uses "kubectl"; change the command to "kubectl create" so the example
matches the documented CLI throughout, and scan nearby examples to ensure
consistent use of kubectl rather than oc.
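
The scan for leftover `oc` invocations suggested in the prompt above could be done with a plain `grep`. This is a rough sketch, assuming only a handful of common subcommands need to be caught. The function name `find_oc_commands` and the subcommand list are illustrative assumptions, not something the review defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list lines that invoke `oc <subcommand>` in a file.
# `oc` must be at line start or follow whitespace, `|`, `;` or `&`,
# so words such as "doc" are not reported.
find_oc_commands() {
  grep -nE '(^|[|;&[:space:]])oc[[:space:]]+(create|apply|get|delete|describe)([[:space:]]|$)' "$1"
}

# Example use:
# find_oc_commands docs/en/infrastructure_management/device_management/kueue.mdx
```

Each hit is printed as `line:content`, which makes it easy to compare against the line numbers quoted in the review.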

@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented Dec 29, 2025

Deploying alauda-ai with Cloudflare Pages

Latest commit: 5213bf9
Status: ✅  Deploy successful!
Preview URL: https://bd54c084.alauda-ai.pages.dev
Branch Preview URL: https://feat-use-kueue.alauda-ai.pages.dev


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a90b5ef and 5213bf9.

📒 Files selected for processing (1)
  • docs/en/infrastructure_management/device_management/kueue.mdx
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/infrastructure_management/device_management/kueue.mdx
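
The learning above (use `bash` as the fence language where PromQL would appear) can be checked mechanically. Below is a small sketch, assuming the docs are `.mdx` files under a single directory and that an offending fence is a line consisting of three backticks followed by `promql`. The helper name `find_promql_fences` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report code fences tagged `promql` in .mdx files under a directory.
# The pattern uses a repetition count for the backticks so it can sit inside a fenced block.
find_promql_fences() {
  grep -rnE --include='*.mdx' '^[[:space:]]*`{3}promql[[:space:]]*$' "$1"
}

# Example use:
# find_promql_fences docs/en
```

An empty result means no fence uses the unsupported `promql` tag.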
🪛 LanguageTool
docs/en/infrastructure_management/device_management/kueue.mdx

[grammar] ~7-~7: Ensure spelling is correct
Context: ... ## 1. Introduction to Alauda Build of Kueue Alauda Build of Kueue is a kubernetes-na...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~10-~10: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...(as in active pods should be deleted). Alauda Build of Kueue does not replace any exi...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~12-~12: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...er, and cluster autoscaler components. Alauda Build of Kueue supports all-or-nothing ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~122-~122: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...eue. 2. Configure a resource flavor. 3. Configure a local queue. 4. Users can then submit...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~127-~127: Ensure spelling is correct
Context: ...overns a pool of resources such as GPU, CPU, memory, and pods. Cluster queues can b...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~343-~343: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...calQueue resource, a RoleBinding schema needs to be created referencing the ClusterRole ...

(REP_NEED_TO_VB)


[style] ~610-~610: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...eues and determine which cluster queues are able to share borrowable resources with each ot...

(BE_ABLE_TO)


[style] ~665-~665: ‘at a disadvantage’ might be wordy. Consider a shorter alternative.
Context: ... means that the cluster queue is always at a disadvantage compared to others, and its workloads a...

(EN_WORDINESS_PREMIUM_AT_A_DISADVANTAGE)


[style] ~671-~671: Consider removing “of” to be more concise
Context: ...antee the capacity to start and execute all of the related jobs in the gang together. Gan...

(ALL_OF_THE)


[style] ~673-~673: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ilization and can reduce running costs. Gang scheduling can also help to prevent iss...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai


1. Create the assets by running the following command:
```bash
cat <<EOF| kubetl create -f -
```

⚠️ Potential issue | 🔴 Critical

Fix critical typo in kubectl command.

Line 359 contains a typo: kubetl should be kubectl. This typo will cause the command to fail when users attempt to execute it.

🔎 Proposed fix
-    cat <<EOF| kubetl create -f -
+    cat <<EOF| kubectl create -f -
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cat <<EOF| kubetl create -f -
cat <<EOF| kubectl create -f -
🤖 Prompt for AI Agents
In docs/en/infrastructure_management/device_management/kueue.mdx around line
359, fix the typo in the kubectl command: change `kubetl` to `kubectl` so the
heredoc pipeline reads `cat <<EOF | kubectl create -f -`; update the line
accordingly to ensure the command executes correctly.
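
A typo such as `kubetl` can be caught before review by pulling out the command that each heredoc is piped into and checking whether it exists. The sketch below assumes unquoted heredoc delimiters in the form `<<EOF | command`, as in the quoted line. The function name `heredoc_pipe_commands` is illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command name that follows `<<DELIM |` on each line of a file.
# Assumption: the delimiter is an unquoted word such as EOF.
heredoc_pipe_commands() {
  sed -nE 's/.*<<-?[[:alnum:]_]+[[:space:]]*\|[[:space:]]*([^[:space:]]+).*/\1/p' "$1"
}

# Example use: flag any extracted command that is not on PATH.
# heredoc_pipe_commands docs/en/infrastructure_management/device_management/kueue.mdx |
#   sort -u | while read -r cmd; do
#     command -v "$cmd" >/dev/null || echo "not on PATH: $cmd"
#   done
```

On a machine with `kubectl` installed, the misspelled `kubetl` would be reported and the corrected line would pass silently.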
