Commit 7b7dae4

doc(cloneset): introduce spec progressingDeadlineSeconds
Signed-off-by: michaelrren <[email protected]>
1 parent 12095e9 commit 7b7dae4

2 files changed, +195 -1 lines changed

docs/user-manuals/cloneset.md

Lines changed: 96 additions & 1 deletion
@@ -529,6 +529,102 @@ spec:
paused: true
```

### Progress Deadline Seconds

**FEATURE STATE:** Kruise v1.9.0

The `.spec.progressDeadlineSeconds` field is optional. It defines the maximum time (in seconds) the CloneSet controller waits before determining that a rollout has failed to make progress. When this deadline is exceeded without progress, the CloneSet controller records the following condition in the resource status:
```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```

By default, the CloneSet controller keeps retrying rollout operations for 600 seconds. Higher-level orchestrators can take advantage of this condition and act accordingly, e.g., roll back the CloneSet. Even after the rollout is marked as failed, the CloneSet controller continues its rolling update of Pods.

> **Note:**
>
> If specified, this field value must be greater than `.spec.minReadySeconds`.
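
As a minimal sketch of how the field might be set, the manifest below configures a 600-second deadline together with a smaller `minReadySeconds`. The CloneSet name, labels, image, and replica count are illustrative assumptions rather than values taken from this manual.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample                   # hypothetical name
spec:
  replicas: 5                    # illustrative value
  minReadySeconds: 10            # must be smaller than progressDeadlineSeconds
  progressDeadlineSeconds: 600   # rollout deadline in seconds
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: main
        image: nginx:alpine      # placeholder image
```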

With `.spec.progressDeadlineSeconds` configured, a CloneSet passes through the following states during its lifecycle:
- Progressing: the rollout is ongoing.
- Complete: the partition update or the full rollout has succeeded.
- Failed: the rollout has timed out.

#### Progressing CloneSet
A CloneSet is marked as Progressing when any of the following occurs:

- A new revision is being rolled out.
- The newest revision is being scaled up during an upgrade.
- Older revisions are being scaled down during an upgrade.
- New Pods become ready or available (satisfying the `minReadySeconds` condition).

When the rollout enters the "Progressing" state, the CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "True"
reason: CloneSetUpdated
```

#### Complete CloneSet
The Complete state is divided into two substates:

**Partition Paused:**

A CloneSet enters the partition paused state when:
- All replicas associated with the CloneSet partition have been updated to the specified latest revision.
- All replicas associated with the CloneSet partition are available.

The CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "True"
reason: ProgressPartitionAvailable
```
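
To illustrate the partition case, here is a hedged fragment that assumes a CloneSet with five replicas (the numbers are illustrative). In CloneSet, `partition` is the number of Pods kept on the old revision, so with the values below two Pods are updated. Once both are on the new revision and available, the controller would be expected to report `ProgressPartitionAvailable` as described above.

```yaml
spec:
  replicas: 5                    # illustrative value
  progressDeadlineSeconds: 600
  updateStrategy:
    partition: 3                 # keep 3 Pods on the old revision, update the other 2
```

Lowering `partition` to `0` would let the rollout continue to all replicas.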

**Available:**

A CloneSet is marked as available when:

- All replicas have been updated to the latest specified revision.
- All replicas are available.
- No old revision replicas are running.

The CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "True"
reason: CloneSetAvailable
```

The Progressing condition maintains a status value of "True" until a new revision is initiated. This condition persists even when replica availability changes (which affects the Available condition instead).

#### Failed CloneSet
A CloneSet enters the Failed state when it cannot successfully deploy the latest revision. Common causes include:

- Insufficient quota
- Readiness probe failures
- Image pull errors
- Insufficient permissions
- Limit ranges
- Application runtime misconfiguration

This condition can be detected by configuring the `.spec.progressDeadlineSeconds` parameter. Once the deadline is exceeded, the CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```

> **Note:**
>
> When a CloneSet rollout is paused, the controller stops progress checking against the specified deadline. Users can safely pause and resume a CloneSet rollout in the middle of the rollout without triggering the deadline exceeded condition.
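
A hedged sketch of pausing a rollout that has a deadline configured, using the `updateStrategy.paused` field from the paused-update section of this manual. The deadline value is illustrative.

```yaml
spec:
  progressDeadlineSeconds: 600   # illustrative value
  updateStrategy:
    paused: true                 # progress is not checked against the deadline while paused
```

Setting `paused` back to `false` resumes the rollout.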

#### Operations on Failed CloneSet
All operations applicable to a Complete CloneSet can also be applied to a Failed CloneSet, including:
- Rolling back to a previous revision.
- Pausing the rollout to make multiple adjustments to the Pod template.

### In-Place Update Support for Modifying Resources

**FEATURE STATE:** Kruise v1.8.0
@@ -762,4 +858,3 @@ Currently, both status and metadata changes of Pods will trigger the reconcile o

However, for larger clusters or scenarios with frequent Pod update events, these unnecessary reconciles will block the real CloneSet reconciles, resulting in delayed rolling updates and other changes.
To solve this problem, you can enable the **feature-gate CloneSetEventHandlerOptimization** to reduce some unnecessary reconcile enqueues.

i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/cloneset.md

Lines changed: 99 additions & 0 deletions
@@ -506,6 +506,105 @@ spec:
paused: true
```

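### Progress Deadline Mechanism

**FEATURE STATE:** Kruise v1.9.0

`.spec.progressDeadlineSeconds` is an optional setting that defines the maximum time (in seconds) the CloneSet controller waits before deciding that a rollout has failed. When this deadline passes without any progress, the CloneSet controller records the corresponding condition entry in the resource status:
```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```

By default, the CloneSet controller keeps retrying the rollout for 600 seconds. Higher-level orchestration systems can use this condition to trigger follow-up actions, such as rolling back the CloneSet (even after the rollout is judged to have timed out, the underlying CloneSet controller continues the rolling update of Pods).

> **Note:**
>
> If specified, this field value must be greater than `.spec.minReadySeconds`.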
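
Therefore, with `.spec.progressDeadlineSeconds` configured, a CloneSet goes through several states during its lifecycle:
- Progressing: the rollout is in progress.
- Complete: the partitioned rollout is complete or the whole rollout has succeeded.
- Failed: the rollout has timed out and cannot proceed.
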
#### Progressing CloneSet
A CloneSet is marked as Progressing when any of the following occurs:
- A rolling update is performed.
- The newest revision is scaled up during an upgrade.
- Older revisions are scaled down during an upgrade.
- Newly created Pods become ready or available (satisfying the MinReadySeconds condition).

At this point, the CloneSet controller adds the following condition entry to `.status.conditions`:

```yaml
type: Progressing
status: "True"
reason: CloneSetUpdated
```

#### Complete CloneSet
The Complete state has two substates:

**Partition Paused:**

A CloneSet enters the partition paused state when the following conditions are met:

- The share of replicas specified by the partition has been updated to the latest revision.
- The share of replicas specified by the partition is available.

The CloneSet controller adds a condition entry with the following attributes to the CloneSet's `.status.conditions`:

```yaml
type: Progressing
status: "True"
reason: ProgressPartitionAvailable
```

**Available:**
Kruise marks a CloneSet as available when the following conditions hold:

- All replicas have been updated to the latest revision.
- All replicas are available.
- No replicas of old revisions are running.

The CloneSet controller adds a condition entry with the following attributes to the CloneSet's `.status.conditions`:

```yaml
type: Progressing
status: "True"
reason: CloneSetAvailable
```

The Progressing condition keeps the value "True" until a new rollout is triggered. The condition value does not change even if replica availability changes.

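#### Failed CloneSet
A CloneSet enters the Failed state when it cannot successfully deploy the latest revision. Common causes include:

- Insufficient resource quota
- Readiness probe failures
- Image pull failures
- Insufficient permissions
- LimitRanges configuration issues
- Application runtime misconfiguration

This condition can be detected by configuring the `.spec.progressDeadlineSeconds` parameter. Once the deadline is exceeded, the CloneSet controller adds the following condition entry to `.status.conditions`:

```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```

> **Note:**
>
> When a user pauses a CloneSet rollout, the controller stops checking progress. Users can safely pause and resume during a rollout without triggering the timeout.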
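
#### Operations on a Failed CloneSet
A CloneSet in the Failed state supports the same management operations as one in the Complete state, including:
- Rolling back to a previous revision.
- Pausing the rollout to make multiple adjustments to the Pod template.
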
### In-Place Update Support for Modifying Resources

**FEATURE STATE:** Kruise v1.8.0
