KubernetesPodOperator PodManager retries during create pod on too many requests error #58033
jscheffl
left a comment
Looks good in general, but I have a more generic question that is probably beyond the scope of a simple fix, though it might be a good further improvement.
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py
227ad0f to 2114ae7
@jscheffl I switched back to using Tenacity for retries, since retryhttp only handles HTTP exceptions, while the Kubernetes client library can raise various exception types.
providers/cncf/kubernetes/tests/unit/cncf/kubernetes/utils/test_pod_manager.py
ea19e34 to c22e36a
…y requests error (apache#58033) * Retry create pod also on too many requests issue * Fix unit test * fix static checks --------- Co-authored-by: AutomationDev85 <AutomationDev85>
Overview
We are sporadically encountering "Too Many Requests" (HTTP 429) errors from the Kubernetes API when scaling up nodes in our Kubernetes cluster. While most PodManager functions already implement retries for various errors, the create_pod function previously only retried on HTTP 409 (Conflict) errors.
With this change, the retry logic is extended to also handle HTTP 429 errors, improving robustness during cluster scaling operations.
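As a rough illustration of the idea only (not the code in this PR), the sketch below retries a create call when the error status is 409 or 429 and re-raises anything else immediately. It uses a plain standard-library loop instead of Tenacity so that it runs without extra dependencies. `FakeApiException`, `should_retry`, `create_with_retry`, the attempt count, and the backoff values are all hypothetical names and numbers chosen for the example.

```python
import time


class FakeApiException(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status."""

    def __init__(self, status):
        super().__init__(f"API error, status={status}")
        self.status = status


# Statuses treated as transient in this sketch: Conflict and Too Many Requests.
RETRYABLE_STATUSES = {409, 429}


def should_retry(exc):
    """Return True only for API errors with a retryable status."""
    return isinstance(exc, FakeApiException) and exc.status in RETRYABLE_STATUSES


def create_with_retry(create_fn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call create_fn, retrying with exponential backoff on 409/429.

    Any other exception, or a retryable one on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return create_fn()
        except Exception as exc:
            if not should_retry(exc) or attempt == max_attempts:
                raise
            # Delay doubles after each failed attempt: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_create():
        # Simulate an API server that throttles the first two requests.
        calls["n"] += 1
        if calls["n"] < 3:
            raise FakeApiException(429)
        return "pod-created"

    print(create_with_retry(flaky_create), "after", calls["n"], "attempts")
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets a unit test record the backoff delays instead of actually waiting.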
We welcome your feedback on this change!
Details of change: