Commit e748941

doc(cloneset): introduce spec progressingDeadlineSeconds
Signed-off-by: michaelrren <[email protected]>
1 parent 12095e9 commit e748941

2 files changed: +198 -1 lines changed

docs/user-manuals/cloneset.md

Lines changed: 97 additions & 1 deletion
@@ -529,6 +529,103 @@ spec:
paused: true
```

### Progress Deadline Seconds

**FEATURE STATE:** Kruise v1.9.0

The `.spec.progressDeadlineSeconds` field is optional. It defines the maximum time (in seconds) that the CloneSet controller waits before determining that a rollout has failed to make progress. When this deadline is exceeded without progress, the CloneSet controller records the following condition in the resource status:
```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```
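For illustration, the field sits at the top level of the CloneSet `spec`. The manifest below is a minimal sketch, not a recommended configuration: the name `sample`, the label, the image, and the values `600` and `10` are assumptions chosen for the example, with the deadline kept greater than `.spec.minReadySeconds`.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: sample                    # hypothetical name
spec:
  replicas: 5
  minReadySeconds: 10             # illustrative value
  progressDeadlineSeconds: 600    # illustrative value; must be greater than minReadySeconds
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
      - name: main
        image: nginx:alpine       # hypothetical image
```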
By default, this field is not set, and the CloneSet controller does not record the condition in `.status.conditions` while a rollout is ongoing.

Once the field is set, the CloneSet controller continuously checks whether the rollout makes progress within the specified time. Higher-level orchestration systems can use this condition to trigger follow-up actions, e.g., rolling back the CloneSet. Marking the rollout as timed out does not stop the CloneSet controller from continuing the rolling update of Pods.

> **Note:**
>
> If specified, this field value must be greater than `.spec.minReadySeconds`.

With `.spec.progressDeadlineSeconds` configured, a CloneSet goes through the following states during its lifecycle:
- Progressing: the rollout is ongoing.
- Complete: the partition update is successful or the rollout is successful.
- Failed: the rollout has timed out.

#### Progressing CloneSet
A CloneSet is marked as Progressing when any of the following occurs:

- A new revision is being rolled out.
- The newest revision is being scaled up during an upgrade.
- Older revisions are being scaled down during an upgrade.
- New Pods become ready or available (satisfying the `.spec.minReadySeconds` condition).

When the rollout enters the Progressing state, the CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "True"
reason: CloneSetUpdated
```

#### Complete CloneSet
The Complete state is divided into two substates:

**Partition Paused:**

A CloneSet enters the partition paused state when:
- All replicas covered by the CloneSet partition have been updated to the latest specified revision.
- All replicas covered by the CloneSet partition are available.

The CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "True"
reason: ProgressPartitionAvailable
```
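As a sketch of when this condition would appear, consider a CloneSet fragment that holds part of the rollout back with `.spec.updateStrategy.partition`. The values are assumptions for illustration. In CloneSet, `partition` is the number (or percentage) of Pods kept on the old revision, so with the settings below two Pods are expected to be updated before the rollout stops at the partition.

```yaml
spec:
  replicas: 5
  progressDeadlineSeconds: 600   # illustrative value
  updateStrategy:
    partition: 3                 # keep 3 Pods on the old revision; update the other 2
```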

**Available:**

A CloneSet is marked as available when:

- All replicas have been updated to the latest specified revision.
- All replicas are available.
- No old revision replicas are running.

The CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "True"
reason: CloneSetAvailable
```

The Progressing condition keeps the status "True" until a new revision is rolled out. It stays unchanged even when replica availability changes (which affects the Available condition instead).

#### Failed CloneSet
A CloneSet enters the Failed state when it cannot successfully deploy the latest revision. Common causes include:

- Insufficient quota
- Readiness probe failures
- Image pull errors
- Insufficient permissions
- Limit ranges
- Application runtime misconfiguration

This state can be detected by configuring `.spec.progressDeadlineSeconds`. Once the deadline is exceeded, the CloneSet controller adds the following condition to the CloneSet's `.status.conditions`:
```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```
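The deadline check can be suspended by pausing the rollout. A minimal fragment, assuming the pause is set through `.spec.updateStrategy.paused` and using an illustrative deadline value:

```yaml
spec:
  progressDeadlineSeconds: 600   # illustrative value
  updateStrategy:
    paused: true                 # rollout is paused; progress is not checked against the deadline
```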

> **Note:**
>
> When a CloneSet rollout is paused, the controller stops checking progress against the specified deadline. Users can safely pause and resume a CloneSet rollout midway without triggering the deadline-exceeded condition.

#### Operations on a Failed CloneSet
All operations applicable to a Complete CloneSet can also be applied to a Failed CloneSet, including:
- Rolling back to a previous revision.
- Pausing the rollout to make multiple adjustments to the Pod template.

### In-Place Update Support for Modifying Resources

**FEATURE STATE:** Kruise v1.8.0
@@ -762,4 +859,3 @@ Currently, both status and metadata changes of Pods will trigger the reconcile o

However, for larger clusters or scenarios with frequent Pod update events, these unnecessary reconciles will block the real CloneSet reconciles, resulting in delayed rolling updates and other changes.
To solve this problem, you can enable the **feature-gate CloneSetEventHandlerOptimization** to reduce some unnecessary reconcile enqueues.

i18n/zh/docusaurus-plugin-content-docs/current/user-manuals/cloneset.md

Lines changed: 101 additions & 0 deletions
@@ -506,6 +506,107 @@ spec:
paused: true
```

### Progress Deadline Mechanism

**FEATURE STATE:** Kruise v1.9.0

`.spec.progressDeadlineSeconds` is an optional setting that defines the maximum time (in seconds) the CloneSet controller waits before judging that a rollout has failed. When this deadline is exceeded without progress, the CloneSet controller records the corresponding condition entry in the resource status:
```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```
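
CloneSet does not set this value by default, so by default the CloneSet controller does not record the corresponding condition entry in `.status.conditions`.

Once this value is set, the CloneSet controller continuously checks the rollout within the configured time. Higher-level orchestration systems can use this status to trigger corresponding actions, such as rolling back the CloneSet (even when this status is judged as timed out, the underlying CloneSet controller continues the rolling update of Pods).

> **Note:**
>
> If specified, this field value must be greater than `.spec.minReadySeconds`.

Therefore, configuring `.spec.progressDeadlineSeconds` causes a CloneSet to go through several states during its lifecycle:
- Progressing: the rollout is in progress.
- Complete: the partitioned rollout is complete or the overall rollout has succeeded.
- Failed: the rollout has timed out and cannot proceed.
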
#### Progressing CloneSet
A CloneSet is marked as Progressing when any of the following operations is performed:
- Performing a rolling update.
- Scaling up the newest Revision during an upgrade.
- Scaling down old Revisions during an upgrade.
- Newly created Pods become ready or available (satisfying the MinReadySeconds condition).

At this point, the CloneSet controller adds the following condition entry to `.status.conditions`:

```yaml
type: Progressing
status: "True"
reason: CloneSetUpdated
```

#### Complete CloneSet
The Complete state has two substates:

**Partition paused state:**

A CloneSet enters the partition paused state when the following conditions are met:

- The replicas in the proportion specified by partition have been updated to the latest revision.
- The replicas in the proportion specified by partition are all available.

The CloneSet controller adds a condition entry with the following attributes to the CloneSet's `.status.conditions`:

```yaml
type: Progressing
status: "True"
reason: ProgressPartitionAvailable
```

**Available state:**
Kruise marks a CloneSet as available when the following conditions hold:

- All replicas have been updated to the latest revision.
- All replicas are available.
- No old-revision replicas are running.

The CloneSet controller adds a condition entry with the following attributes to the CloneSet's `.status.conditions`:

```yaml
type: Progressing
status: "True"
reason: CloneSetAvailable
```
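
The Progressing condition stays "True" until a new rollout is triggered. Its value does not change even if replica availability changes.

#### Failed CloneSet
A CloneSet enters the Failed state when it cannot successfully deploy the latest Revision. Common causes include:

- Insufficient resource quota
- Readiness probe failures
- Image pull failures
- Insufficient permissions
- LimitRanges configuration issues
- Application runtime misconfiguration

This state can be detected by configuring the `.spec.progressDeadlineSeconds` parameter. After the deadline is exceeded, the CloneSet controller adds the following condition entry to `.status.conditions`:

```yaml
type: Progressing
status: "False"
reason: ProgressDeadlineExceeded
```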
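
> **Note:**
>
> When a user pauses a CloneSet rollout, the controller stops checking progress. Users can safely pause and resume during a rollout without triggering the timeout.

#### Operations on a Failed CloneSet
A CloneSet in the Failed state supports the same management operations as one in the Complete state, including:
- Rolling back to a previous revision.
- Pausing the rollout to make multiple adjustments to the Pod template.

### In-Place Update Support for Modifying Resources

**FEATURE STATE:** Kruise v1.8.0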
