Skip to content

SPLAT-2172: AWS dedicate host support#10079

Open
vr4manta wants to merge 5 commits intoopenshift:mainfrom
vr4manta:SPLAT-2172
Open

SPLAT-2172: AWS dedicate host support#10079
vr4manta wants to merge 5 commits intoopenshift:mainfrom
vr4manta:SPLAT-2172

Conversation

@vr4manta
Copy link
Contributor

@vr4manta vr4manta commented Nov 13, 2025

SPLAT-2172

Changes

  • Bumped openshift/api
  • Added logic to allow HostPlacement of dedicated hosts

Dependencies

Notes

MAO and CAO changes are needed for it to fully work. For now, this PR is adding the ability to generate the needed outputs for bootstrapping.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 13, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 13, 2025

@vr4manta: This pull request references SPLAT-2172 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

SPLAT-2172

Changes

  • Bumped openshift/api
  • Added logic to allow HostPlacement of dedicated hosts

Dependencies

Notes

MAO and CAO changes are needed for it to fully work. For now, this PR is adding the ability to generate the needed outputs for bootstrapping.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tthvo
Copy link
Member

tthvo commented Nov 14, 2025

/cc

@openshift-ci openshift-ci bot requested a review from tthvo November 14, 2025 00:35
@vr4manta vr4manta force-pushed the SPLAT-2172 branch 3 times, most recently from 24500b6 to 342ab2d Compare November 17, 2025 17:44
@tthvo
Copy link
Member

tthvo commented Nov 18, 2025

/retest-required

Copy link
Member

@tthvo tthvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool feature 👍 🚀

I have quite some comments 😅 Sorry if I review a little too soon...

switch *p.HostPlacement.Affinity {
case aws.HostAffinityAnyAvailable:
if p.HostPlacement.DedicatedHost != nil {
allErrs = append(allErrs, field.Required(fldPath.Child("dedicatedHost"), "dedicatedHost is required when 'affinity' is set to DedicatedHost, and forbidden otherwise"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean hostPlacement.dedicatedHost is forbidden if affinity is AnyAvailable?

 allErrs = append(allErrs, field.Forbidden(fldPath.Child("dedicatedHost"), "dedicatedHost must not be set when affinity is AnyAvailable"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tthvo So this was an interesting one. In openshift/api and openshift/machine-api-operator, it was suggested to error this way for this scenario. I was doing this to keep it consistent.

Now that said, I am happy to make your suggestion if you prefer installer say it this way. Just let me know.

if err != nil {
allErrs = append(allErrs, field.InternalError(placementPath.Child("dedicatedHost"), err))
} else {
// Check the returned configured hosts to see if the dedicated hosts defined in install-config exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 ❓ I got 2 questions here:

  1. Do the zones of dedicated hosts need to match machinepool zone field if defined?

    type MachinePool struct {
    // Zones is list of availability zones that can be used.
    //
    // +optional
    Zones []string `json:"zones,omitempty"`

  2. Do the user-input zones for dedicated hosts need to match with the actual zones returned from AWS?

    // If user specified a zone, validate it matches AWS
    if host.Zone != "" && host.Zone != hostDetails.Zone {
      allErrs = append(allErrs, field.Invalid(
          fldPath.Child("hostPlacement", "dedicatedHost").Index(i).Child("zone"),
          host.Zone,
          fmt.Sprintf("specified zone %s does not match actual host zone %s",
              host.Zone, hostDetails.Zone)))
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the user-input zones for dedicated hosts need to match with the actual zones returned from AWS?

On that point, I'm curious if users should need to specify the zone for the dedicated host? It seems like we can just look up the dedicated host by id and then use that to determine the zone (which the pr is already doing IIUC). Seems like we might be able to shed zone from the install config.

@vr4manta
Copy link
Contributor Author

/hold
Need to rebase and add validation logic to make sure defined dedicated hosts are in zones based on the machinepool config.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2025
@vr4manta vr4manta force-pushed the SPLAT-2172 branch 5 times, most recently from 82915c2 to 8ef8679 Compare November 21, 2025 13:54
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2025
@vr4manta
Copy link
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-dedicated

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 21, 2025

@vr4manta: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-dedicated

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1c5c6d80-c717-11f0-93e9-d918c346504c-0

@vr4manta
Copy link
Contributor Author

/unhold
/retest

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 24, 2025
@vr4manta
Copy link
Contributor Author

/retest

@vr4manta
Copy link
Contributor Author

vr4manta commented Dec 2, 2025

/hold
based on discussions with @patrickdillon and @JoelSpeed , I am adding MAPA support and then will enhance this PR to leverage that code path without being dependent on CAPI.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 2, 2025
@vr4manta
Copy link
Contributor Author

/retest

@vr4manta
Copy link
Contributor Author

@patrickdillon this PR is now ready for re-review. MAPA is now used for the creation of nodes on dedicated host. I also added a little logic to clean up dynamic dedicated hosts. Thanks!

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 24, 2026

@vr4manta: This PR was included in a payload test run from openshift/machine-api-provider-aws#174
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/af501260-11b1-11f1-8355-45eb72a0dd0e-0

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 24, 2026

@vr4manta: This PR was included in a payload test run from openshift/machine-api-provider-aws#174
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8fbbd240-11c0-11f1-9976-41485f7c73c3-0

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 25, 2026

@vr4manta: This PR was included in a payload test run from openshift/machine-api-provider-aws#174
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-dedicated

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/aee21670-1242-11f1-9543-785cee1a4298-0

if err != nil {
allErrs = append(allErrs, field.InternalError(placementPath.Child("dedicatedHost"), err))
} else {
// Check the returned configured hosts to see if the dedicated hosts defined in install-config exists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the user-input zones for dedicated hosts need to match with the actual zones returned from AWS?

On that point, I'm curious if users should need to specify the zone for the dedicated host? It seems like we can just look up the dedicated host by id and then use that to determine the zone (which the pr is already doing IIUC). Seems like we might be able to shed zone from the install config.

@vr4manta vr4manta force-pushed the SPLAT-2172 branch 3 times, most recently from c873bb3 to 5a397fb Compare February 27, 2026 15:50
@vr4manta
Copy link
Contributor Author

/hold
After making changes based on reviews, getting error on checking dedicated hosts against defined zones

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 27, 2026
// Dedicated hosts cannot be configured in defaultMachinePlatform
if platform.DefaultMachinePlatform.HostPlacement != nil {
defaultPath := fldPath.Child("defaultMachinePlatform").Child("hostPlacement")
errMsg := "dedicated hosts cannot be configured in defaultMachinePlatform, they must be specified per machine pool"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be more user friendly to specify that they're only supported on compute machine pools, but NBD

Comment on lines +512 to +526
// Dedicated hosts are only supported on worker (compute) pools
if poolName != "" && poolName != types.MachinePoolComputeRoleName {
placementPath := fldPath.Child("hostPlacement")
errMsg := fmt.Sprintf("dedicated hosts are only supported on %s pools, not on %s pools", types.MachinePoolComputeRoleName, poolName)
allErrs = append(allErrs, field.Invalid(placementPath, pool.HostPlacement, errMsg))
return allErrs
}

// Control plane pools cannot use dedicated hosts
if poolName == "" {
placementPath := fldPath.Child("hostPlacement")
errMsg := "dedicated hosts are not supported on control plane pools"
allErrs = append(allErrs, field.Invalid(placementPath, pool.HostPlacement, errMsg))
return allErrs
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would have expected these validations in pkg/types (because they don't require aws sdk)

allErrs = append(allErrs, field.Invalid(dhPath, host, errMsg))
}

// Check for multiple hosts in the same zone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use case be ok? For general compute nodes, we support provisioning more than one compute node per zone (e.g. replica count > zone count)

Comment on lines +970 to +977
describeOutput, err := client.DescribeHosts(ctx, &ec2v2.DescribeHostsInput{
HostIds: []string{id},
})
if err != nil {
errCode := HandleErrorCode(err)
if errCode == "InvalidHostID.NotFound" {
// Host doesn't exist, nothing to do
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call to DescribeHosts is always made but permissions.go is listing the permission as optional. We need to either make the permission required, or gracefully handle the permission error (i think handling the permission error is the way to go).

}

// Release the host
_, err = client.ReleaseHosts(ctx, &ec2v2.ReleaseHostsInput{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably also needs to be included in the permissions list?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 2, 2026

@vr4manta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-ovn-xpn 9364945 link false /test e2e-gcp-ovn-xpn
ci/prow/e2e-metal-ovn-two-node-fencing 9364945 link false /test e2e-metal-ovn-two-node-fencing
ci/prow/gcp-private 9364945 link false /test gcp-private
ci/prow/e2e-gcp-secureboot 9364945 link false /test e2e-gcp-secureboot
ci/prow/gcp-custom-endpoints-proxy-wif 9364945 link false /test gcp-custom-endpoints-proxy-wif
ci/prow/e2e-metal-assisted 9364945 link false /test e2e-metal-assisted
ci/prow/e2e-metal-ipi-ovn-virtualmedia 9364945 link false /test e2e-metal-ipi-ovn-virtualmedia
ci/prow/e2e-metal-ipi-ovn 9364945 link false /test e2e-metal-ipi-ovn
ci/prow/e2e-gcp-ovn 9364945 link true /test e2e-gcp-ovn
ci/prow/e2e-metal-single-node-live-iso 9364945 link false /test e2e-metal-single-node-live-iso
ci/prow/e2e-gcp-custom-dns 9364945 link false /test e2e-gcp-custom-dns
ci/prow/e2e-azurestack 9364945 link false /test e2e-azurestack
ci/prow/e2e-gcp-default-config 9364945 link false /test e2e-gcp-default-config
ci/prow/e2e-gcp-xpn-dedicated-dns-project 9364945 link false /test e2e-gcp-xpn-dedicated-dns-project
ci/prow/e2e-openstack-ovn 9364945 link true /test e2e-openstack-ovn
ci/prow/e2e-gcp-custom-endpoints 9364945 link false /test e2e-gcp-custom-endpoints
ci/prow/e2e-aws-ovn-dualstack-ipv6-primary-techpreview 37b76cb link false /test e2e-aws-ovn-dualstack-ipv6-primary-techpreview
ci/prow/e2e-aws-ovn-dualstack-ipv4-primary-techpreview 37b76cb link false /test e2e-aws-ovn-dualstack-ipv4-primary-techpreview
ci/prow/e2e-aws-ovn-heterogeneous 4b1d68a link false /test e2e-aws-ovn-heterogeneous
ci/prow/e2e-aws-ovn-imdsv2 4b1d68a link false /test e2e-aws-ovn-imdsv2
ci/prow/e2e-aws-ovn-shared-vpc-custom-security-groups 4b1d68a link false /test e2e-aws-ovn-shared-vpc-custom-security-groups
ci/prow/okd-scos-images 4b1d68a link true /test okd-scos-images

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants