Skip to content

Conversation

a1994sc
Copy link
Contributor

@a1994sc a1994sc commented Jun 21, 2025

Description

This PR adds in the zarf variable ZARF_ARCHITECTURE, this helps with deploying the zarf package into a multi-arch cluster.

Related Issue

Relates to #3817

Checklist before merging

@a1994sc a1994sc requested review from a team as code owners June 21, 2025 21:07
Copy link

netlify bot commented Jun 21, 2025

Deploy Preview for zarf-docs canceled.

Name Link
🔨 Latest commit d155b10
🔍 Latest deploy log https://app.netlify.com/projects/zarf-docs/deploys/68c0a1a6219981000870dbb7

@a1994sc
Copy link
Contributor Author

a1994sc commented Jun 21, 2025

I need to add logic to the injector to use the arch of the package.

Added injector logic

Copy link

codecov bot commented Jul 1, 2025

Codecov Report

❌ Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/pkg/packager/deploy.go 0.00% 4 Missing ⚠️
Files with missing lines Coverage Δ
src/internal/packager/template/template.go 61.29% <100.00%> (+49.33%) ⬆️
src/pkg/cluster/cluster.go 36.52% <100.00%> (+0.83%) ⬆️
src/pkg/cluster/injector.go 64.92% <100.00%> (+0.20%) ⬆️
src/pkg/packager/deploy.go 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@a1994sc a1994sc force-pushed the feature/arch branch 2 times, most recently from c715385 to 238956a Compare July 2, 2025 00:11
@a1994sc a1994sc marked this pull request as draft July 2, 2025 00:12
@a1994sc
Copy link
Contributor Author

a1994sc commented Jul 2, 2025

Hm, so I after reviewing the logs from the last e2e test I see that for the arch value was not being passed along correctly. I updated the logic such that the injector, docker registry, and gitea all get deployed.... however the agent gets deployed fine once, however during part of the make test-e2e-with-cluster ARCH=amd64 the agent pods get into a pending state because the kubernetes.io/arch is set to ""... any input would be super helpful

@a1994sc
Copy link
Contributor Author

a1994sc commented Jul 2, 2025

image

@a1994sc
Copy link
Contributor Author

a1994sc commented Jul 6, 2025

I updated this PR so now ZARF_ARCHITECTURE is only set for the life-cycle of the package deployment.

Edit:

Though, I am getting the following error when doing an end-2-end test:

4s          Warning   FailedCreate        replicaset/source-controller-869c75dd44             Error creating: Internal error occurred: failed calling webhook "agent-pod.zarf.dev": failed to call webhook: Post "https://agent-hook.zarf.svc:443/mutate/pod?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "ca.private.zarf.dev")

@a1994sc a1994sc marked this pull request as ready for review July 7, 2025 16:42
@AustinAbro321
Copy link
Contributor

This seems like it'll be a huge value add for Zarf, thanks for the PR. Once CI is passing, I'll try to test locally and send you an updated eks.yaml which would let us spin up a multi-arch cluster so we can have a proper e2e test.

@a1994sc
Copy link
Contributor Author

a1994sc commented Jul 11, 2025

Yea no problem! This does not fully address having a truly multi-arch support, but is a step in that direction.

(A bit selfishly I would love to see zarf running a multi-arch cluster at home, RPI and intel NUC)

@a1994sc
Copy link
Contributor Author

a1994sc commented Jul 14, 2025

Todo: update the variable to ZARF_PKG_ARCHITECTURE per convo on slack channel

ref: https://kubernetes.slack.com/archives/C03B6BJAUJ3/p1752503299318649?thread_ts=1752502999.838289&cid=C03B6BJAUJ3

Copy link
Contributor

@AustinAbro321 AustinAbro321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small notes, I will make it an action item on my side to get you a new eks.yaml that can properly test a multi-node cluster with Zarf

@AustinAbro321 AustinAbro321 mentioned this pull request Jul 17, 2025
2 tasks
@a1994sc a1994sc requested a review from AustinAbro321 July 17, 2025 23:54
Copy link
Contributor

@AustinAbro321 AustinAbro321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a1994sc a1994sc requested a review from AustinAbro321 July 23, 2025 11:35
@a1994sc
Copy link
Contributor Author

a1994sc commented Aug 7, 2025

@AustinAbro321 is there anything else I can do for this PR?

@AustinAbro321
Copy link
Contributor

@a1994sc Apology for the delay, I realized there was a gap in how our AWS infra was setup that required some changes to allow PRs to make EKS clusters. I have a workflow in .github/workflows/nightly-eks-multi-arch.yml that should run through and do a multi arch test. Could you copy that file, minus line 6 that makes it run on pull requests. Workflow is running now, it's passed before I just made some small changes to the structure of the file.

Comment on lines 266 to 268
if nodeDetails.Status.NodeInfo.Architecture == "" || nodeDetails.Status.NodeInfo.Architecture != architecture {
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if nodeDetails.Status.NodeInfo.Architecture == "" || nodeDetails.Status.NodeInfo.Architecture != architecture {
continue
}
if nodeDetails.Status.NodeInfo.Architecture != "" && nodeDetails.Status.NodeInfo.Architecture != architecture {
continue
}

Suggestion to change the behavior to not continue if architecture is empty. That way on the off chance there's some distro or version that doesn't fill in this value, nothing breaks for existing users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to review, but I think that there was an issue with the injector logic not finding the right node with the suggested config.... but I will review and see if I had some bad logic before

@a1994sc a1994sc requested a review from AustinAbro321 August 15, 2025 20:13
@AustinAbro321
Copy link
Contributor

Confirmed locally the test-upgrade failure is not a flake. If we go from an older zarf init to the zarf init on this branch then it errors.

@a1994sc
Copy link
Contributor Author

a1994sc commented Sep 1, 2025

I got bogged down by org stuff, so I will see about identifying the upgrade issues this week

@a1994sc
Copy link
Contributor Author

a1994sc commented Sep 7, 2025

@AustinAbro321 would expanding state.Merge() and writing that new state to the zarf-state secret be the correct way to set the architecture?

zarf/src/pkg/state/state.go

Lines 269 to 277 in b337191

type MergeOptions struct {
GitServer GitServerInfo
RegistryInfo RegistryInfo
ArtifactServer ArtifactServerInfo
Services []string
}
// Merge merges init options for provided services into the provided state to create a new state struct
func Merge(oldState *State, opts MergeOptions) (*State, error) {

@AustinAbro321
Copy link
Contributor

AustinAbro321 commented Sep 8, 2025

@a1994sc Since state.Merge essentially exists for update-creds we shouldn't make an update there. In that case we can assume the architecture in state will be the same as before since a new package is not being deployed. Only time it'd ever change is if someone in a multi architecture cluster deploys one version of the init package with one architecture then deploys another version of the init packager with a different architecture.

The reason the upgrade test is failing currently is because architecture is only set when state is nil. We need to set architecture outside of the nil check so that is always set even on subsequent init calls. Otherwise we will see this in the podspec which will never match to a node.

  nodeSelector:
    kubernetes.io/arch: ""

@a1994sc
Copy link
Contributor Author

a1994sc commented Sep 8, 2025

Ah ok, well is there an existing way to write the architecture of the package to the state? And have it persist? I have tried:

		if d.s.Architecture == "" {
			d.s.Architecture = pkgLayout.Pkg.Build.Architecture
		}

But that does not persist in the zarf-state secret

edit: add "real" code example

@a1994sc
Copy link
Contributor Author

a1994sc commented Sep 18, 2025

@AustinAbro321 I believe that I fixed the issue with this PR. Can you, or another maintainer, run the tests to verify?

@AustinAbro321
Copy link
Contributor

I'm thinking about the complexity of this PR, and also about a recent issue we got #4168 which would somewhat conflict with this PR as written.

I would recommend taking a simpler approach here and adding a Zarf variable called NODE_SELECTOR to each of our init package charts. Zarf would no longer automatically set the node selector, the user would need to use the --set flag. I think the manual step for a small percentage of users is worth it to lower the complexity in Zarf. The only go changes with this approach would be with the injector

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants