@lloyd-brown lloyd-brown commented Nov 4, 2025

Last week we saw an error (#7761) where using rsync to set up the files required by a cluster in Kubernetes failed with `OCI runtime exec failed: exec failed: unable to start container process: exec: "rsync": executable file not found in $PATH: unknown`, showing that rsync was not available on the Kubernetes pod. After some digging, I found that we install necessary packages (such as rsync) in the background while setting up the cluster. This means that if the installation hangs for whatever reason, the cluster will be marked as ready before the packages are ready. That leads to the error above: we attempt to rsync files into the pod, but the rsync installation has not yet completed.

Now we exec into the pod and check whether rsync exists before running it. If it does not, we delay the rsync by 60 seconds to give the package installation time to catch up. This should help in cases where the package installation is slightly slow.
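For illustration, here is a minimal sketch of the check-then-wait behavior described above. It is not the code in this PR: the function names, the 2-second polling interval, and the choice to poll rather than sleep for one fixed delay are all assumptions, and `kubectl exec` stands in for whatever mechanism the codebase actually uses to run commands in the pod.

```python
import subprocess
import time
from typing import Callable, List


def rsync_check_command(pod: str, namespace: str) -> List[str]:
    """Build a kubectl command that exits 0 only if rsync is on the pod's PATH.

    Hypothetical helper; the pod and namespace are supplied by the caller.
    """
    return [
        'kubectl', 'exec', '-n', namespace, pod, '--',
        'sh', '-c', 'command -v rsync >/dev/null 2>&1',
    ]


def rsync_available(pod: str, namespace: str) -> bool:
    """Run the check once and report whether rsync was found."""
    result = subprocess.run(rsync_check_command(pod, namespace),
                            check=False,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def wait_for_rsync(check: Callable[[], bool],
                   timeout: float = 60.0,
                   interval: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it succeeds or `timeout` seconds have passed.

    Returns immediately, without sleeping, if the first check succeeds.
    `sleep` and `clock` are injectable so the wait can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

A caller would use something like `wait_for_rsync(lambda: rsync_available(pod, namespace))` before starting the transfer, and surface a clear error if it returns `False`. The two manual tests listed below correspond to the two branches: the slow-installation case exercises the loop, and the immediately-ready case returns on the first check.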

I tested this by

  • Injecting a long sleep into the package installation code and verifying that we will wait the allotted time for rsync to come up
  • Making sure we don’t wait in the case rsync is immediately ready.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@lloyd-brown
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@lloyd-brown lloyd-brown force-pushed the lloyd/fix-rsync-not-found branch from 86e64fa to e8ee5c7 November 4, 2025 01:18
@lloyd-brown
Collaborator Author

/smoke-test -k test_docker_storage_mounts --kubernetes

@lloyd-brown lloyd-brown marked this pull request as ready for review November 4, 2025 02:01
Collaborator

@cg505 cg505 left a comment

In the happy path (rsync already installed), this will introduce an additional roundtrip while uploading files. What is the overhead of that?
If it is significant, can we instead execute a bash script that does the waiting on the remote cluster and then execs `"$@"` once rsync is installed?
Also, this doesn't block the PR, but we should understand why we are not already waiting for rsync to be installed using the signal files in /tmp.
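
One possible reading of the wait-then-exec idea, sketched as a helper that wraps the remote command so the waiting happens inside the same exec and the happy path costs no extra roundtrip. The helper name, the 1-second polling step, the exit code 127, and the use of plain `sh` instead of bash are assumptions for illustration; nothing here comes from the codebase.

```python
import shlex
from typing import List


def build_wait_then_exec(binary: str, timeout_s: int,
                         command: List[str]) -> List[str]:
    """Wrap `command` so the shell waits for `binary` before exec'ing it.

    Hypothetical helper. The returned argv runs a small POSIX sh script
    that polls `command -v <binary>` once per second, gives up with exit
    code 127 after roughly `timeout_s` seconds, and otherwise replaces
    itself with the original command via `exec "$@"`.
    """
    script = (
        'i=0; '
        f'while ! command -v {shlex.quote(binary)} >/dev/null 2>&1; do '
        f'if [ "$i" -ge {int(timeout_s)} ]; then '
        'echo "timed out waiting for required binary" >&2; exit 127; fi; '
        'i=$((i+1)); sleep 1; '
        'done; '
        'exec "$@"'
    )
    # The argument after the script becomes $0; the rest become "$@".
    return ['sh', '-c', script, 'sh'] + list(command)
```

When the binary is already installed, the loop body never runs and the script execs the command right away, so the only added cost is one `command -v` lookup on the remote side.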
