@AutomationDev85
Contributor

Overview

We sporadically encounter "Too Many Requests" (HTTP 429) errors from the Kubernetes API when scaling up nodes in our Kubernetes cluster. Most PodManager functions already retry on various errors, but the create_pod function previously retried only on HTTP 409 (Conflict) errors.
With this change, the retry logic also handles HTTP 429 errors, which makes pod creation more robust during cluster scaling operations.

We welcome your feedback on this change!

Details of change:

  • The create_pod function now retries on both HTTP 409 and HTTP 429 errors.
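
For illustration only, the behavior described above can be sketched roughly as follows. This is not the code from this PR: `ApiError`, `is_retryable`, and `call_with_retry` are hypothetical names, and `ApiError` is a stand-in for the Kubernetes client's exception type, assumed here to expose a `status` attribute.

```python
import time


class ApiError(Exception):
    """Hypothetical stand-in for the Kubernetes client's API exception."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


# Statuses treated as transient: Conflict and Too Many Requests.
RETRYABLE_STATUSES = {409, 429}


def is_retryable(exc):
    """Return True only for API errors whose status is worth retrying."""
    return isinstance(exc, ApiError) and exc.status in RETRYABLE_STATUSES


def call_with_retry(func, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call func, retrying on retryable errors up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Re-raise non-retryable errors and the final failed attempt.
            if not is_retryable(exc) or attempt == max_attempts:
                raise
            sleep(delay_seconds)
```

Passing `sleep` as a parameter is only a convenience so that the waiting can be replaced in tests.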

Contributor

@jscheffl jscheffl left a comment

Looks good in general, but I have a more general question that is probably beyond the scope of a simple fix, though it might be a good further improvement.

@AutomationDev85 AutomationDev85 force-pushed the feature/pod-manager-retry-during-pod-creation branch 2 times, most recently from 227ad0f to 2114ae7 Compare November 14, 2025 13:15
@AutomationDev85
Contributor Author

@jscheffl I switched back to using Tenacity for retries, since retryhttp only handles HTTP exceptions, while the Kubernetes client library can raise various exception types.
The retry logic now extracts the Retry-After value from the headers of 429 exceptions, so that it waits for the proper amount of time when a "Too Many Requests" error is detected.
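
As a rough sketch of the Retry-After handling described above (not the merged implementation), a helper could look like the following. The function name `retry_after_seconds`, the `default` fallback, and the assumption that the headers are available as a plain mapping are all hypothetical. Per the HTTP specification, Retry-After may carry either a number of seconds or an HTTP date, so the sketch handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=1.0, now=None):
    """Return how long to wait based on a Retry-After header, else default."""
    value = None
    # Header names are case-insensitive, so compare in lower case.
    for key, val in (headers or {}).items():
        if key.lower() == "retry-after":
            value = str(val).strip()
            break
    if not value:
        return default
    # Form 1: a non-negative number of seconds.
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date; wait until that point in time.
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A helper like this could be called from a custom wait function in the retry setup, with the fallback value used whenever the header is missing or unparseable.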

@AutomationDev85 AutomationDev85 force-pushed the feature/pod-manager-retry-during-pod-creation branch from ea19e34 to c22e36a Compare November 17, 2025 06:23
@jscheffl jscheffl merged commit 2d30586 into apache:main Nov 17, 2025
94 checks passed
aaron-wolmutt pushed a commit to aaron-wolmutt/airflow that referenced this pull request Nov 20, 2025
…y requests error (apache#58033)

* Retry create pod also on too many requests issue

* Fix unit test

* fix static checks

---------

Co-authored-by: AutomationDev85 <AutomationDev85>
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025
Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues
