What you would like to be added?
I would like to implement TrainingRuntime webhook validations that reject any change to existing TrainingRuntime and ClusterTrainingRuntime objects.
We should implement immutability validations for the following TrainingRuntime and ClusterTrainingRuntime fields:
- .metadata.labels: validating webhooks for TrainingRuntime and ClusterTrainingRuntime
- .metadata.annotations: validating webhooks for TrainingRuntime and ClusterTrainingRuntime
- .spec: CRD marker
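For .spec, a CRD marker could look roughly like the sketch below. It assumes kubebuilder-style Go types and uses controller-tools' `XValidation` marker to attach a CEL transition rule (`self == oldSelf`) to the generated CRD schema, so the API server rejects any update that changes the spec. The `MLPolicy` field is a hypothetical placeholder, and the real types also embed `metav1.TypeMeta` and `metav1.ObjectMeta`, omitted here to keep the snippet dependency-free. The same marker would go on the ClusterTrainingRuntime type.

```go
package main

import "fmt"

// TrainingRuntimeSpec is a hypothetical, heavily simplified stand-in for the
// real spec type; only the marker placement below matters.
type TrainingRuntimeSpec struct {
	// MLPolicy is a hypothetical placeholder field.
	MLPolicy string `json:"mlPolicy,omitempty"`
}

// TrainingRuntime is a simplified sketch of the CRD Go type (TypeMeta and
// ObjectMeta omitted).
type TrainingRuntime struct {
	// The CEL transition rule below is evaluated by the API server on update:
	// the request is rejected unless the new spec equals the old one.
	// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="TrainingRuntime spec is immutable"
	Spec TrainingRuntimeSpec `json:"spec,omitempty"`
}

func main() {
	rt := TrainingRuntime{Spec: TrainingRuntimeSpec{MLPolicy: "example"}}
	fmt.Println(rt.Spec.MLPolicy)
}
```

Transition rules are only evaluated on updates, so creating a new TrainingRuntime is unaffected.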
Note that controller-tools does not embed the top-level metadata fields (.metadata) in the generated CRD schema. So, for the labels and annotations validations, we need to implement them with webhooks rather than CRD markers.
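The labels and annotations check itself is a plain comparison of the old and new maps. A minimal sketch, assuming it is called from the webhook's update handler (for example, a `ValidateUpdate` method on a controller-runtime `CustomValidator`). To stay self-contained, it uses a hypothetical `ObjectMetadata` struct instead of `metav1.ObjectMeta`; the function name `validateImmutableMetadata` is also illustrative. It needs Go 1.21 or later for `maps.Equal`.

```go
package main

import (
	"errors"
	"fmt"
	"maps"
)

// ObjectMetadata is a hypothetical stand-in for the parts of
// metav1.ObjectMeta that the webhook needs to compare.
type ObjectMetadata struct {
	Labels      map[string]string
	Annotations map[string]string
}

// validateImmutableMetadata returns an error if .metadata.labels or
// .metadata.annotations differ between the old and new object.
// maps.Equal treats nil and empty maps as equal, so dropping an already
// empty map is not reported as a change.
func validateImmutableMetadata(kind string, oldObj, newObj ObjectMetadata) error {
	var errs []error
	if !maps.Equal(oldObj.Labels, newObj.Labels) {
		errs = append(errs, fmt.Errorf("%s .metadata.labels is immutable", kind))
	}
	if !maps.Equal(oldObj.Annotations, newObj.Annotations) {
		errs = append(errs, fmt.Errorf("%s .metadata.annotations is immutable", kind))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	oldRT := ObjectMetadata{Labels: map[string]string{"team": "ml"}}
	newRT := ObjectMetadata{Labels: map[string]string{"team": "research"}}
	fmt.Println(validateImmutableMetadata("TrainingRuntime", oldRT, newRT))
	// prints: TrainingRuntime .metadata.labels is immutable
	fmt.Println(validateImmutableMetadata("ClusterTrainingRuntime", oldRT, oldRT))
	// prints: <nil>
}
```

Returning both violations at once with `errors.Join` lets the user see every rejected field in a single admission response.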
Why is this needed?
As described in the Trainer v2 design (https://github.com/kubeflow/trainer/tree/master/docs/proposals/2170-kubeflow-trainer-v2#the-training-runtime-api):
The TrainingRuntime is immutable, so to make a change, a new version of the TrainingRuntime must be created, and the user must then update the TrainJob to point to the new version. This provides control over how changes to runtimes propagate to existing training jobs, which matters, for example, when training runs for a long time (e.g., 1-2 months).
In the first iteration, we do not have any version control mechanism. So, to avoid TrainJob disruptions caused by changes to a TrainingRuntime, we should treat TrainingRuntime and ClusterTrainingRuntime as immutable objects.
Love this feature?
Give it a 👍 We prioritize the features with the most 👍