Description
What happened?
I originally encountered this issue on k3d/k3s, specifically v1.32.6+k3s1, which includes kube-router 2.5.0. When Istio is deployed with istio-cni in ambient mode, it manages two ipsets used for health probe traffic. I experienced intermittent behavior where these ipsets would unexpectedly be missing certain pod IPs, despite Istio's logs indicating they had been added.
After some testing I found that the behavior disappeared once I disabled network policy within k3s (the k3s component that uses kube-router). With the log level raised I was able to see this in the kube-router logs:
I0924 15:40:28.817837 27 policy.go:83] Attempting to attain ipset mutex lock
I0924 15:40:28.817840 27 policy.go:85] Attained ipset mutex lock, continuing...
I0924 15:40:28.818554 27 ipset.go:595] ipset (ipv6? false) restore looks like:
create TMP-HHYMGYLRMJXZ3ET7 hash:ip family inet hashsize 1024 maxelem 65536 comment
flush TMP-HHYMGYLRMJXZ3ET7
create istio-inpod-probes-v4 hash:ip family inet hashsize 1024 maxelem 65536 comment
swap TMP-HHYMGYLRMJXZ3ET7 istio-inpod-probes-v4
flush TMP-HHYMGYLRMJXZ3ET7
create TMP-PTNDKBUZZRYGF25Q hash:ip family inet6 hashsize 1024 maxelem 65536 comment
flush TMP-PTNDKBUZZRYGF25Q
create istio-inpod-probes-v6 hash:ip family inet6 hashsize 1024 maxelem 65536 comment
swap TMP-PTNDKBUZZRYGF25Q istio-inpod-probes-v6
flush TMP-PTNDKBUZZRYGF25Q
destroy TMP-HHYMGYLRMJXZ3ET7
destroy TMP-PTNDKBUZZRYGF25Q
I0924 15:40:28.819189 27 policy.go:183] Restoring IPv4 ipset took 657.136µs
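For context, the restore input above follows ipset's temp-set-plus-swap pattern: build a replacement set (presumably from a snapshot taken earlier with ipset save), then atomically swap it with the live set. Here is a minimal Go sketch of that cycle as I understand it (assumes Linux, the ipset binary on PATH, and root; the set and temp names are illustrative and this is my approximation of the sequence in the log, not kube-router's actual code):

```go
// Sketch of the snapshot -> temp set -> swap cycle from the log above.
// Assumes Linux, the ipset binary on PATH, and root privileges.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	const set = "istio-inpod-probes-v4"
	const tmp = "TMP-EXAMPLE"

	// 1. Snapshot the live set (the "save" half of save/restore).
	snap, err := exec.Command("ipset", "save", set).Output()
	if err != nil {
		panic(err)
	}

	// -- race window: an `ipset add` issued here by another process
	//    (e.g. istio-cni adding a new pod IP) is absent from snap --

	// 2. Replay the snapshot into a temp set, then swap it in atomically.
	var b strings.Builder
	fmt.Fprintf(&b, "create %s hash:ip family inet hashsize 1024 maxelem 65536 comment\n", tmp)
	fmt.Fprintf(&b, "flush %s\n", tmp)
	for _, line := range strings.Split(string(snap), "\n") {
		if strings.HasPrefix(line, "add "+set+" ") {
			fmt.Fprintln(&b, strings.Replace(line, set, tmp, 1))
		}
	}
	fmt.Fprintf(&b, "swap %s %s\n", tmp, set)
	fmt.Fprintf(&b, "destroy %s\n", tmp)

	restore := exec.Command("ipset", "restore")
	restore.Stdin = strings.NewReader(b.String())
	if err := restore.Run(); err != nil {
		panic(err)
	}
	// The live set now matches the snapshot; any IP added during the
	// window above has silently disappeared.
}
```

The swap itself is atomic, but the snapshot is not: anything added to the live set during the marked window is replaced by stale contents.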
As best I can tell, that is the race at play here: kube-router's ipset save/restore cycle races with istio-cni adding an IP to its own sets, and whenever the add lands inside the snapshot-to-swap window, the IP goes missing. That would also explain why the behavior is intermittent rather than consistent.
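To illustrate the suspected interleaving without needing root or ipset, here is a small self-contained Go simulation (entirely hypothetical: ipsets are modeled as plain maps) of the lost update:

```go
// Deterministic simulation of the suspected lost-update race.
// Entirely hypothetical: ipsets are modeled as plain maps.
package main

import "fmt"

func main() {
	live := map[string]bool{"10.42.0.5": true} // live ipset contents

	// kube-router-style sync begins: snapshot the live set.
	snapshot := map[string]bool{}
	for ip := range live {
		snapshot[ip] = true
	}

	// Meanwhile, istio-cni adds a new pod IP to the live set.
	live["10.42.0.9"] = true

	// The sync finishes: the rebuilt (stale) set is swapped in.
	live = snapshot

	fmt.Println(live["10.42.0.9"]) // false: the new IP is gone
}
```

If the interleaving lands the other way around (the add happens after the swap), the IP survives, which matches the intermittent symptom described above.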
What did you expect to happen?
I expected kube-router not to modify ipsets on my node that it does not manage (such as Istio's istio-inpod-probes-v4/v6 sets).
How can we reproduce the behavior you experienced?
An exact reproduction of my issue is in this GitHub gist: https://gist.github.com/mjnagel/d790315dc93553a2853830e22675b71a
This involves:
- Create a k3d cluster with the latest 1.33.x image
- Run istioctl install with the appropriate ambient/CNI values for k3s/k3d
- Apply a strict PeerAuthentication so that probes are blocked if they are proxied through ztunnel
- Create a namespace in ambient mode plus a deployment with many replicas, to reproduce the issue faster (a small watcher sketch follows this list to help spot the missing IPs)
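To observe the symptom quickly while the repro runs, something like the following watcher can diff the pods' IPs against the ipset membership. This is a hypothetical helper, not part of the gist: the namespace name is a placeholder, and it assumes kubectl access plus the ipset binary run where the node's ipsets are visible (in k3d, inside the server container):

```go
// Hypothetical watcher (not part of the gist): flags pod IPs that are
// missing from Istio's probe ipset. Assumes kubectl access and the ipset
// binary run where the node's ipsets are visible (e.g. inside the k3d
// server container).
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// ipsetMembers parses `ipset list <name>` output into a set of entries.
func ipsetMembers(name string) (map[string]bool, error) {
	out, err := exec.Command("ipset", "list", name).Output()
	if err != nil {
		return nil, err
	}
	members := map[string]bool{}
	inMembers := false
	for _, line := range strings.Split(string(out), "\n") {
		if inMembers {
			if f := strings.Fields(line); len(f) > 0 {
				members[f[0]] = true
			}
		} else if strings.HasPrefix(line, "Members:") {
			inMembers = true
		}
	}
	return members, nil
}

func main() {
	const ns = "ambient-test" // placeholder for the repro namespace

	for {
		pods, err := exec.Command("kubectl", "get", "pods", "-n", ns,
			"-o", "jsonpath={.items[*].status.podIP}").Output()
		if err != nil {
			panic(err)
		}
		members, err := ipsetMembers("istio-inpod-probes-v4")
		if err != nil {
			panic(err)
		}
		for _, ip := range strings.Fields(string(pods)) {
			if !members[ip] {
				fmt.Printf("%s missing from istio-inpod-probes-v4\n", ip)
			}
		}
		time.Sleep(2 * time.Second)
	}
}
```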
System Information (please complete the following information)
- Kube-Router Version (kube-router --version): 2.5.0, consumed through k3s v1.32.6+k3s1
- Kube-Router Parameters: unsure - this is consumed through k3s
- Kubernetes Version (kubectl version): v1.32.6
- Cloud Type: local machine
- Kubernetes Deployment Type: k3s
- Kube-Router Deployment Type: part of k3s
- Cluster Size: 1 node
Logs, other output, metrics
See the kube-router log excerpt above under "What happened?".