retry and error parsing #9878

Open
gurudatta-patil wants to merge 2 commits into temporalio:main from gurudatta-patil:fix/retry-and-error-parsing

@gurudatta-patil

What changed?

When the SQL persistence schema version check runs at startup (verifyPersistenceCompatibleVersion), it now retries automatically on transient Unavailable errors instead of failing immediately. The underlying error wrapping was also fixed to use %w so the error type is preserved through the chain and the retry predicate can actually detect it.
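The effect of the %w change can be illustrated with a minimal, standalone sketch. It does not reproduce the PR's code: `UnavailableError`, `isUnavailable`, `wrapWithV`, and `wrapWithW` are hypothetical names, and only the Go standard library is used. It shows why a predicate based on `errors.As` cannot see the original error type when the wrapper formats it with %v.

```go
package main

import (
	"errors"
	"fmt"
)

// UnavailableError is a hypothetical stand-in for a transient
// "Unavailable" error type; it is not the server's real type.
type UnavailableError struct{ Msg string }

func (e *UnavailableError) Error() string { return e.Msg }

// isUnavailable is a hypothetical retry predicate: it reports whether
// an UnavailableError appears anywhere in err's wrap chain.
func isUnavailable(err error) bool {
	var u *UnavailableError
	return errors.As(err, &u)
}

// wrapWithV flattens the cause into text, so its type is lost.
func wrapWithV(err error) error {
	return fmt.Errorf("unable to read schema version: %v", err)
}

// wrapWithW keeps the cause in the chain, so errors.As can find it.
func wrapWithW(err error) error {
	return fmt.Errorf("unable to read schema version: %w", err)
}

func main() {
	cause := &UnavailableError{Msg: "connection refused"}
	fmt.Println(isUnavailable(wrapWithV(cause))) // false
	fmt.Println(isUnavailable(wrapWithW(cause))) // true
}
```

Both wrappers produce the same message text; only the %w version lets a type-based predicate classify the error as retryable.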

Why?

When a node is replaced, the pod restarts before the network is fully ready. The DB might become reachable within seconds, but the one-shot startup check fails hard and puts the server into CrashLoopBackOff, which requires a manual redeploy to recover.

How did you test it?

  • built
  • covered by existing tests

Potential risks

The retry has no expiration (NoInterval), so if the DB is permanently unreachable, the server will block at startup indefinitely rather than crashing.

  • Every failed attempt logs a Warn

#8202

@gurudatta-patil gurudatta-patil requested review from a team as code owners April 8, 2026 22:11