Add KKP Backup documentation #2106

Open

csengerszabo wants to merge 21 commits into main

Contributor

csengerszabo commented Mar 9, 2026 (edited)

This document outlines the backup and restore procedures for the Kubermatic Kubernetes Platform, emphasizing the importance of a comprehensive backup strategy, recovery objectives, and a multi-layered backup approach.

Fixes kubermatic/product-strategy#22


          Add KKP Backup documentation

22ca1de

This document outlines the backup and restore procedures for the Kubermatic Kubernetes Platform, emphasizing the importance of a comprehensive backup strategy, recovery objectives, and a multi-layered backup approach.

Signed-off-by: csengerszabo <csenger@kubermatic.com>

kubermatic-bot added dco-signoff: yes size/L labels

csengerszabo requested review from mfahlandt and scheeles

March 9, 2026 19:07

csengerszabo added 6 commits

March 10, 2026 11:08


          Clarify documentation for Integrated User Cluster Backup

d1175d1

Signed-off-by: csengerszabo <csenger@kubermatic.com>


          Fix links and add KubeOne backup references

3f9a3c9

Updated the link format for Integrated User Cluster Backup documentation and added references for KubeOne cluster backup and restore strategies.

Signed-off-by: csengerszabo <csenger@kubermatic.com>


          Fix formatting of KubeOne backup references

aae194d

Signed-off-by: csengerszabo <csenger@kubermatic.com>


          Replace backup image with a new tuned version

e7125dc

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Clarify backup process for etcd and PKI

a7957bf

Updated language to indicate that the cronjob and tools can be used for backups, rather than stating they must be used.

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Update backup image in KKP backup tutorial

56af4fd

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>

toschneck reviewed

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md Outdated

+              #### MinIO
+              * MinIO serves as a cluster-internal, central datastore for all Kubernetes system-related backups.
+              * All data stored within MinIO should be synchronized every 30 minutes to an external object storage solution (e.g., Azure Blob Storage, AWS S3) via a Kubernetes cronjob. This process utilizes the `rclone` command-line tool, which enables delta synchronization to S3-compatible datastores.
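
As an illustration of the synchronization step described in this hunk, here is a minimal Python sketch that assembles the `rclone sync` invocation such a cronjob might run. The remote names `minio:backups` and `offsite:kkp-backups` are hypothetical placeholders for entries in an rclone configuration, and the cron expression simply encodes the 30-minute interval mentioned above; none of this is taken from the PR or from the linked example implementation.

```python
import shlex

# Cron expression for "every 30 minutes", matching the interval in the text.
SYNC_SCHEDULE = "*/30 * * * *"


def rclone_sync_command(source: str, destination: str) -> list[str]:
    """Build an `rclone sync` command that makes destination match source.

    `rclone sync` only transfers changed files, which is the delta
    synchronization the text refers to.
    """
    return ["rclone", "sync", source, destination]


# Hypothetical remote names; real ones come from the rclone configuration.
cmd = rclone_sync_command("minio:backups", "offsite:kkp-backups")
print(f"schedule: {SYNC_SCHEDULE}")
print(f"command:  {shlex.join(cmd)}")
```

The sketch only builds and prints the command; running it against real storage would additionally require configured remotes and credentials.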

Member

toschneck Mar 16, 2026

You can reference https://github.com/kubermatic/community-components/tree/master/components/rclone-s3-syncer as an example implementation.

Contributor Author

csengerszabo Mar 17, 2026

Added the link.

Member

toschneck Mar 17, 2026

The link should go in the rclone part, not at the top.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md

+              * An etcd "ring" can tolerate the loss of up to (N-1)/2 nodes and remain healthy. However, if more nodes are lost, the database must be restored from a backup. A snapshot from a single member of the etcd ring is sufficient to restore the entire cluster.
+              * The Public Key Infrastructure (PKI) encompasses the Certificate Authority (CA), certificates, and keys required for Kubernetes authentication. Backing up the PKI is equally critical for a swift recovery.
+              * We recommend backing up etcd snapshots and the PKI every 30 minutes and storing these backups outside the cluster.
+              * A Kubernetes cronjob should handle this process: it runs every 30 minutes, collects the PKI data, captures an etcd snapshot, and can use the `restic` command-line tool to upload the data to the cluster-internal MinIO storage.
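
The (N-1)/2 tolerance rule quoted in this hunk uses integer division, which matters for even member counts. A small Python sketch of the arithmetic follows; the function name is illustrative only and not part of any etcd tooling.

```python
def etcd_fault_tolerance(members: int) -> int:
    """Return how many members an etcd cluster can lose while keeping quorum."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    quorum = members // 2 + 1  # majority required to commit writes
    return members - quorum    # equals (members - 1) // 2


for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum {n // 2 + 1}, "
          f"tolerates {etcd_fault_tolerance(n)} failure(s)")
```

Note that a 4-member cluster tolerates the same single failure as a 3-member one, which is why odd member counts are the usual choice.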

Member

toschneck Mar 16, 2026

https://docs.kubermatic.com/kubeone/main/examples/addons-backup/

Link to the built-in backup solution.

Contributor Author

csengerszabo Mar 17, 2026

I already added this link further down the page, with the other references.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md Outdated

+              #### Kubernetes Objects
+              * While etcd and PKI backups are sufficient for restoring a broken cluster within the same environment, it is often necessary to restore a previous state within an otherwise functional cluster, or to migrate a previous state to an entirely new cluster.
+              * This is where Velero excels. Velero captures a snapshot of all objects within the cluster, enabling targeted state restoration (similar to executing `kubectl get <crd> <crd-name> -o yaml > my-object.yaml`).
+              * We recommend running Velero on a regular schedule, for example every 6 hours.
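
To make the 6-hour cadence concrete, the following Python sketch builds a Velero `Schedule` manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. The resource name `cluster-objects-6h`, the `velero` namespace, and the 168-hour TTL are assumptions chosen for illustration; they are not taken from the KKP Velero chart.

```python
import json


def velero_schedule(name: str, cron: str, ttl: str,
                    namespace: str = "velero") -> dict:
    """Build a Velero Schedule manifest (apiVersion velero.io/v1)."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": cron,          # standard cron expression
            "template": {"ttl": ttl},  # how long each backup is retained
        },
    }


# "0 */6 * * *" fires at minute 0 of every 6th hour (00:00, 06:00, 12:00, 18:00).
manifest = velero_schedule("cluster-objects-6h", "0 */6 * * *", "168h0m0s")
print(json.dumps(manifest, indent=2))
```

A real schedule would normally also restrict namespaces or resources in `spec.template`; those fields are omitted here to keep the sketch minimal.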

Member

toschneck Mar 16, 2026

We ship some defaults, see https://github.com/kubermatic/kubermatic/tree/main/charts/backup/velero

This is a potential documentation link.

Contributor Author

csengerszabo Mar 17, 2026

Added the link there.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md Outdated

+              * Therefore, only the Prometheus database requires backing up. To balance performance and usability, we recommend backing up the database every 6 hours.
+              * Velero, in conjunction with its `restic` integration, can be utilized for this task.
+              * Velero extracts a dump of the Prometheus database and securely syncs it to the cluster-internal MinIO datastore.
+              * This process can be seamlessly integrated into the standard Velero backup cycle.

Member

toschneck Mar 16, 2026

Also link to https://github.com/kubermatic/kubermatic/tree/main/charts/backup/velero and describe how to enable it for MLA; some values need to be set.

Contributor Author

csengerszabo Mar 17, 2026

@toschneck, can you elaborate on what exactly should be added here?

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md


		### User Clusters

		#### etcd

Member

toschneck Mar 16, 2026

Link https://docs.kubermatic.com/kubermatic/main/tutorials-howtos/etcd-backups/

Contributor Author

csengerszabo Mar 17, 2026

I added the link to the first line.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md

+              * The Kubermatic Kubernetes Platform (KKP) provides a fully automated and integrated mechanism with Velero on user clusters to manage these backups, storing them on dedicated cloud storage.
+              * You can learn more about our Integrated User Cluster Backup feature here: [documentation of Integrated User Cluster Backup in KKP](cluster-backup/)
+              #### Data Replication

Member

toschneck Mar 16, 2026

Link again to this example: https://github.com/kubermatic/community-components/tree/master/components/rclone-s3-syncer. You can reference it as an example implementation.

Contributor Author

csengerszabo Mar 17, 2026

I linked this once at the top of this section; there is no need to link it multiple times.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md Outdated

+              | Backup Job | Schedule | TTL |
+              | :--- | :--- | :--- |
+              | KKP master control plane VM backups | Once daily | 3 days |

Member

toschneck Mar 16, 2026

If you have etcd restic snapshots on KubeOne, this is not required.

Contributor Author

csengerszabo Mar 17, 2026

Added the comment.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md

+              | :--- | :--- | :--- |
+              | KKP master control plane VM backups | Once daily | 3 days |
+              | MLA data (KKP master cluster objects + Prometheus data) | Every 6 hours | 168 hours (7 days) |
+              | KKP master etcd and PKI | Every 30 minutes | 24 hours |
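
The schedule and TTL columns in this table together determine how many backups each job keeps at steady state and the worst-case age of the newest backup. A short Python sketch of that arithmetic, using only the values from the three rows shown in this hunk:

```python
from datetime import timedelta

# (interval between runs, time-to-live) per backup job, as listed in the table.
JOBS = {
    "KKP master control plane VM backups": (timedelta(days=1), timedelta(days=3)),
    "MLA data": (timedelta(hours=6), timedelta(hours=168)),
    "KKP master etcd and PKI": (timedelta(minutes=30), timedelta(hours=24)),
}


def retained_backups(interval: timedelta, ttl: timedelta) -> int:
    """Number of backups kept at steady state: one per interval within the TTL."""
    return ttl // interval


for job, (interval, ttl) in JOBS.items():
    # The newest backup is at most one interval old when a failure occurs.
    print(f"{job}: keeps {retained_backups(interval, ttl)} backups, "
          f"newest is at most {interval} old")
```

This assumes every scheduled run succeeds and that expired backups are pruned promptly; failed runs reduce the retained count and widen the gap accordingly.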

Member

toschneck Mar 16, 2026

This must be done per seed as well.

Contributor Author

csengerszabo Mar 17, 2026

Added a new line for that.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md

+              ## Process
+              * To accelerate the recovery process and minimize human error, a comprehensive disaster recovery runbook must be documented.
+              * This runbook should provide explicit, step-by-step instructions detailing the appropriate recovery strategies for various failure scenarios.

Member

toschneck Mar 16, 2026

This is not a step-by-step guide; it is more of an overview.

content/kubermatic/main/tutorials-howtos/kkp-backup/_index.en.md

+              * These tests must be conducted at least annually and should be executed by various team members.
+              * This practice ensures that the documentation remains current and prevents knowledge silos within the team.
+              ## References for KubeOne cluster backup and restore

Member

toschneck Mar 16, 2026

Also add the KKP backup links, plus https://github.com/kubermatic/community-components/tree/master/components/rclone-s3-syncer, which you can reference as an example implementation.

Contributor Author

csengerszabo Mar 17, 2026

KKP Backup is already linked. Also, I linked this once at the top of this section; there is no need to link it multiple times.

Member

toschneck commented Mar 16, 2026

Review

The image is not correct.

csengerszabo and others added 7 commits

March 17, 2026 10:13


          Enhance backup and disaster recovery documentation

a73c557

Added example implementation link for backup strategy and clarified backup job descriptions.

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          move rclone link to its place

aec036b

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Update backup image in KKP backup tutorial

c18c337

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Add files via upload

03da007

Signed-off-by: Csenger Szabo <szabo.csenger@gmail.com>


          Change image source in kkp-backup tutorial

642c0b3

Updated image source for kkp_backup in backup tutorial.

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Fix image filename

0f05613

Signed-off-by: Csenger Szabo <szabo.csenger@gmail.com>


          Update backup image in KKP backup tutorial

992eaf7

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>

Contributor Author

csengerszabo commented Mar 18, 2026

/retest

csengerszabo and others added 4 commits

March 18, 2026 11:44


          Delete content/kubermatic/main/tutorials-howtos/kkp-backup/kkp_backup_edited.png

1152e07

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Delete content/kubermatic/main/tutorials-howtos/kkp-backup/kkp_backup_tuned_matched copy2.png

542632d

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Update backup image in KKP backup tutorial

fbc0ec7

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          new image file with appropriate naming

33450ca

Signed-off-by: Csenger Szabo <szabo.csenger@gmail.com>

Member

mfahlandt commented Mar 19, 2026

/lgtm
/approve

kubermatic-bot assigned mfahlandt

kubermatic-bot added the lgtm label

Contributor

kubermatic-bot commented Mar 19, 2026

LGTM label has been added.

Git tree hash: 7471e541d66565ce850c3b655a71d683d4eb7e96

Contributor

kubermatic-bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mfahlandt
Once this PR has been reviewed and has the lgtm label, please assign dakraus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


          Image change VM Backup (optional)

e3bcbbd

Signed-off-by: Csenger Szabo <szabo.csenger@gmail.com>

kubermatic-bot removed the lgtm label

Contributor

kubermatic-bot commented Mar 23, 2026

New changes are detected. LGTM label has been removed.

csengerszabo added 2 commits

March 23, 2026 15:19


          Update backup strategy image in documentation

35f096b

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>


          Delete content/kubermatic/main/tutorials-howtos/kkp-backup/kkpbackup.png

67f63d7

Signed-off-by: Csenger Szabo <csenger@kubermatic.com>
