@simondeziel simondeziel commented Sep 16, 2025

Instead of waiting a fixed 60 seconds between retries when bringing up networks and storage pools, LXD now starts with shorter 5-second intervals and gradually increases the delay until it reaches the 60-second cap.

Here's what it looks like in the test script:

+ ip link del foo
+ echo '==> Networks will appear ready after the next retry'
==> Networks will appear ready after the next retry
++ lxd waitready --network --storage --timeout 15
ERROR  [2025-09-16T14:42:52Z] Failed mounting storage pool                  err="Failed to create storage pool directory \"/tmp/lxd-test.tmp.yLzr/8QH/storage-pools/lxdtest-8QH-pool\": mkdir /tmp/lxd-test.tmp.yLzr/8QH/storage-pools/lxdtest-8QH-pool: not a directory" pool=lxdtest-8QH-pool
INFO   [2025-09-16T14:42:52Z] Initialized network                           name=lxdt9577 project=default
INFO   [2025-09-16T14:42:52Z] All networks initialized                     
ERROR  [2025-09-16T14:43:02Z] Failed mounting storage pool                  err="Failed to create storage pool directory \"/tmp/lxd-test.tmp.yLzr/8QH/storage-pools/lxdtest-8QH-pool\": mkdir /tmp/lxd-test.tmp.yLzr/8QH/storage-pools/lxdtest-8QH-pool: not a directory" pool=lxdtest-8QH-pool
+ '[' 'Error: Storage pools not ready yet after 15s timeout' = 'Error: Storage pools not ready yet after 15s timeout' ']'

The test's runtime goes from ~125s to ~35s.

@simondeziel simondeziel requested a review from Copilot September 16, 2025 15:19

Copilot AI left a comment

Pull Request Overview

This PR improves the startup behavior of LXD networks and storage pools by implementing a more aggressive retry strategy during initialization. Instead of waiting a fixed 60 seconds between retries, the system now starts with shorter 5-second intervals and gradually increases the delay until reaching the 60-second cap.

Key changes:

  • Implements exponential backoff for network and storage pool initialization retries
  • Reduces test timeouts from 80 seconds to 15 seconds to match the improved retry behavior
  • Significantly improves test runtime from ~125s to ~35s

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File                       Description
lxd/storage.go             Adds exponential backoff logic for storage pool initialization retries
lxd/networks.go            Adds exponential backoff logic for network initialization retries
test/suites/waitready.sh   Updates test timeouts and comments to reflect the new retry behavior

@simondeziel simondeziel force-pushed the faster-initial-waitready branch from 27c6e73 to b33d141 Compare September 16, 2025 15:21

@tomponline tomponline left a comment

Please can you de-dupe that wait logic, perhaps with some sort of shared backoff function?

@simondeziel simondeziel changed the title from "Try to bring up networks and storage pools more aggressively at first" to "Try to bring up networks and storage pools more eagerly at first" on Sep 17, 2025