
Implement validations to prevent changing TrainingRuntime #2599

@tenzen-y


What you would like to be added?

I would like to implement TrainingRuntime webhook validations so that we can prevent any TrainingRuntime changes.

We should implement immutability validations for the following TrainingRuntime and ClusterTrainingRuntime fields:

Note that controller-tools does not embed the top-level metadata fields (.metadata) in the generated CRD. So, for labels and annotations, we need to implement the validations in webhooks rather than with CRD markers.
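To make the proposal concrete, here is a minimal sketch of the comparison such a webhook could run on update. It is not the Trainer implementation: `RuntimeObject`, `RuntimeSpec`, and `validateImmutable` are hypothetical stand-ins that use only the Go standard library, and the spec fields are invented for illustration. In a real admission webhook the old and new objects would come from the update request.

```go
package main

import (
	"errors"
	"fmt"
	"maps"
	"reflect"
)

// RuntimeSpec is a hypothetical, heavily simplified spec.
// The real TrainingRuntime spec is defined in the Trainer API.
type RuntimeSpec struct {
	Image    string
	NumNodes int
}

// RuntimeObject is a hypothetical stand-in for TrainingRuntime and
// ClusterTrainingRuntime, keeping only the parts discussed in this issue.
type RuntimeObject struct {
	Labels      map[string]string
	Annotations map[string]string
	Spec        RuntimeSpec
}

// validateImmutable compares the stored object with the proposed update and
// returns one error per part that changed, or nil if nothing changed.
func validateImmutable(oldObj, newObj RuntimeObject) error {
	var errs []error
	// maps.Equal treats a nil map and an empty map as equal, so an update
	// that only turns nil labels into an empty map is not rejected.
	if !maps.Equal(oldObj.Labels, newObj.Labels) {
		errs = append(errs, errors.New("metadata.labels: field is immutable"))
	}
	if !maps.Equal(oldObj.Annotations, newObj.Annotations) {
		errs = append(errs, errors.New("metadata.annotations: field is immutable"))
	}
	if !reflect.DeepEqual(oldObj.Spec, newObj.Spec) {
		errs = append(errs, errors.New("spec: field is immutable"))
	}
	// errors.Join returns nil when there are no errors to join.
	return errors.Join(errs...)
}

func main() {
	stored := RuntimeObject{
		Labels: map[string]string{"team": "ml"},
		Spec:   RuntimeSpec{Image: "trainer:v1", NumNodes: 2},
	}

	// Simulate an update request that changes a label and the spec.
	update := stored
	update.Labels = map[string]string{"team": "infra"}
	update.Spec.NumNodes = 4

	if err := validateImmutable(stored, update); err != nil {
		fmt.Println("update rejected:")
		fmt.Println(err)
	}
}
```

Collecting all violations before returning lets the user see every offending field in one rejected request instead of fixing them one at a time.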

Why is this needed?

As we designed in the Trainer v2 proposal (https://github.com/kubeflow/trainer/tree/master/docs/proposals/2170-kubeflow-trainer-v2#the-training-runtime-api):

The TrainingRuntime is immutable, and so to make a change, a new version of the TrainingRuntime must be created and then the user must change the TrainJob to point to the new version. This provides control as to how changes to runtimes propagate to existing training jobs. For example, when training is running for a long time (e.g. 1-2 months).

In the first iteration, we do not have any kind of version control mechanism. So, to avoid disrupting TrainJobs when a TrainingRuntime changes, we should treat TrainingRuntime and ClusterTrainingRuntime as immutable objects.

Love this feature?

Give it a 👍 We prioritize the features with most 👍
