SPLAT-2172: AWS dedicated host support #10079
Conversation
@vr4manta: This pull request references SPLAT-2172, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/cc
Force-pushed 24500b6 to 342ab2d
/retest-required
tthvo
left a comment
Cool feature 👍 🚀
I have quite a few comments 😅 Sorry if I'm reviewing a little too soon...
```go
switch *p.HostPlacement.Affinity {
case aws.HostAffinityAnyAvailable:
	if p.HostPlacement.DedicatedHost != nil {
		allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
```
Did you mean hostPlacement.dedicatedHost is forbidden if affinity is AnyAvailable?

```go
allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))
```
@tthvo So this was an interesting one. In openshift/api and openshift/machine-api-operator, it was suggested to error this way for this scenario. I was doing this to keep it consistent.
- https://github.com/openshift/api/blob/0598ae66682433c49adc29b1ff14a201487f3afe/machine/v1beta1/types_awsprovider.go#L406
- openshift/machine-api-operator@f669914#diff-30c686f7a9dbd2aa917e5e410401d720a2a4bf4a11163495f5610c8f4b98edaeR917
That said, I'm happy to take your suggestion if you'd prefer the installer phrase it this way. Just let me know.
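For context, here's a minimal, self-contained sketch of how the two branches of that validation could read side by side. The `hostPlacement` struct and `validateHostPlacement` function are simplified stand-ins for the installer's actual types, and the error strings mimic (not reproduce) the apimachinery `field.Forbidden`/`field.Required` output:

```go
package main

import "fmt"

// hostPlacement is a simplified stand-in for the installer's HostPlacement type.
type hostPlacement struct {
	Affinity      string // "AnyAvailable" or "DedicatedHost"
	DedicatedHost *struct{ ID string }
}

// validateHostPlacement mirrors the validation under discussion: dedicatedHost
// is required for DedicatedHost affinity and forbidden for AnyAvailable.
func validateHostPlacement(p hostPlacement) []string {
	var errs []string
	switch p.Affinity {
	case "AnyAvailable":
		if p.DedicatedHost != nil {
			errs = append(errs, "hostPlacement.dedicatedHost: Forbidden: dedicatedHost must not be set when affinity is AnyAvailable")
		}
	case "DedicatedHost":
		if p.DedicatedHost == nil {
			errs = append(errs, "hostPlacement.dedicatedHost: Required: dedicatedHost is required when affinity is DedicatedHost")
		}
	}
	return errs
}

func main() {
	errs := validateHostPlacement(hostPlacement{Affinity: "AnyAvailable", DedicatedHost: &struct{ ID string }{"h-1"}})
	fmt.Println(errs)
}
```

Each branch then reports the error that matches the user's actual mistake, rather than one combined message.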
```go
if err != nil {
	allErrs = append(allErrs, field.InternalError(placementPath.Child("dedicatedHost"), err))
} else {
	// Check the returned configured hosts to see if the dedicated hosts defined in install-config exists.
```
💡 ❓ I have 2 questions here:

1. Do the zones of dedicated hosts need to match the machine pool zones field, if defined?

   installer/pkg/types/aws/machinepool.go, lines 5 to 9 in 7c9b71a

2. Do the user-input zones for dedicated hosts need to match the actual zones returned from AWS?

```go
// If user specified a zone, validate it matches AWS
if host.Zone != "" && host.Zone != hostDetails.Zone {
	allErrs = append(allErrs, field.Invalid(
		fldPath.Child("hostPlacement", "dedicatedHost").Index(i).Child("zone"),
		host.Zone,
		fmt.Sprintf("specified zone %s does not match actual host zone %s", host.Zone, hostDetails.Zone)))
}
```
> Do the user-input zones for dedicated hosts need to match the actual zones returned from AWS?

On that point, I'm curious whether users should need to specify the zone for the dedicated host at all. It seems we can just look up the dedicated host by ID and then use that to determine the zone (which the PR is already doing, IIUC). We might be able to shed zone from the install config.
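The zone-shedding idea could be sketched roughly as below. The `hostDescriber` interface and `fakeEC2` stub are hypothetical stand-ins for the EC2 `DescribeHosts` call, just to show the shape of resolving the zone from the host ID instead of from install-config:

```go
package main

import "fmt"

// hostDescriber is a hypothetical stand-in for the EC2 client's DescribeHosts call.
type hostDescriber interface {
	describeHostZone(hostID string) (string, error)
}

// fakeEC2 maps host ID -> availability zone, simulating the AWS lookup.
type fakeEC2 map[string]string

func (f fakeEC2) describeHostZone(id string) (string, error) {
	zone, ok := f[id]
	if !ok {
		return "", fmt.Errorf("InvalidHostID.NotFound: %s", id)
	}
	return zone, nil
}

// resolveZone derives the zone for a dedicated host from its ID, so the user
// would not need to repeat it in install-config.
func resolveZone(client hostDescriber, hostID string) (string, error) {
	return client.describeHostZone(hostID)
}

func main() {
	ec2 := fakeEC2{"h-0abc": "us-east-1a"}
	zone, _ := resolveZone(ec2, "h-0abc")
	fmt.Println(zone)
}
```

With that, the match-the-user's-zone validation disappears entirely, since there is no user-supplied zone left to disagree with AWS.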
/hold
Force-pushed 82915c2 to 8ef8679
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-dedicated
@vr4manta: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1c5c6d80-c717-11f0-93e9-d918c346504c-0
/unhold
/retest
/hold
/retest
@patrickdillon this PR is now ready for re-review. MAPA is now used to create nodes on dedicated hosts. I also added a little logic to clean up dynamic dedicated hosts. Thanks!
@vr4manta: This PR was included in a payload test run from openshift/machine-api-provider-aws#174
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/af501260-11b1-11f1-8355-45eb72a0dd0e-0
@vr4manta: This PR was included in a payload test run from openshift/machine-api-provider-aws#174
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8fbbd240-11c0-11f1-9976-41485f7c73c3-0
@vr4manta: This PR was included in a payload test run from openshift/machine-api-provider-aws#174
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/aee21670-1242-11f1-9543-785cee1a4298-0
Force-pushed c873bb3 to 5a397fb
/hold
```go
// Dedicated hosts cannot be configured in defaultMachinePlatform
if platform.DefaultMachinePlatform.HostPlacement != nil {
	defaultPath := fldPath.Child("defaultMachinePlatform").Child("hostPlacement")
	errMsg := "dedicated hosts cannot be configured in defaultMachinePlatform, they must be specified per machine pool"
```
nit: it might be more user-friendly to say that they're only supported on compute machine pools, but NBD
```go
// Dedicated hosts are only supported on worker (compute) pools
if poolName != "" && poolName != types.MachinePoolComputeRoleName {
	placementPath := fldPath.Child("hostPlacement")
	errMsg := fmt.Sprintf("dedicated hosts are only supported on %s pools, not on %s pools", types.MachinePoolComputeRoleName, poolName)
	allErrs = append(allErrs, field.Invalid(placementPath, pool.HostPlacement, errMsg))
	return allErrs
}

// Control plane pools cannot use dedicated hosts
if poolName == "" {
	placementPath := fldPath.Child("hostPlacement")
	errMsg := "dedicated hosts are not supported on control plane pools"
	allErrs = append(allErrs, field.Invalid(placementPath, pool.HostPlacement, errMsg))
	return allErrs
}
```
nit: I would have expected these validations in pkg/types (because they don't require the AWS SDK)
```go
	allErrs = append(allErrs, field.Invalid(dhPath, host, errMsg))
}

// Check for multiple hosts in the same zone
```
Shouldn't this use case be ok? For general compute nodes, we support provisioning more than one compute node per zone (e.g. replica count > zone count)
```go
describeOutput, err := client.DescribeHosts(ctx, &ec2v2.DescribeHostsInput{
	HostIds: []string{id},
})
if err != nil {
	errCode := HandleErrorCode(err)
	if errCode == "InvalidHostID.NotFound" {
		// Host doesn't exist, nothing to do
		return nil
```
This call to DescribeHosts is always made, but permissions.go lists the permission as optional. We need to either make the permission required or gracefully handle the permission error (I think handling the permission error is the way to go).
```go
}

// Release the host
_, err = client.ReleaseHosts(ctx, &ec2v2.ReleaseHostsInput{
```
This probably also needs to be included in the permissions list?
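For reference, the dedicated-host destroy path would presumably need entries along these lines in the permission list. The action names are real EC2 IAM actions; the slice shape is only a sketch, not the installer's actual permissions.go structure:

```go
package main

import "fmt"

// dedicatedHostPermissions sketches the EC2 IAM actions the dedicated-host
// destroy path would need if ReleaseHosts must succeed during cleanup.
var dedicatedHostPermissions = []string{
	"ec2:DescribeHosts", // look up host state before releasing
	"ec2:ReleaseHosts",  // release dynamically allocated hosts
}

func main() {
	fmt.Println(dedicatedHostPermissions)
}
```

Whether these land in the required or optional set depends on how the permission-error handling above shakes out.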
@vr4manta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
SPLAT-2172
Changes
Dependencies
Notes
MAO and CAO changes are needed for this to fully work. For now, this PR adds the ability to generate the needed outputs for bootstrapping.