-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add note about istio-cni security #16749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Keith Mattix II <[email protected]>
Left 2 comments that may be worthwhile pointing out. Otherwise LGTM. |
Signed-off-by: Keith Mattix II <[email protected]>
Signed-off-by: Keith Mattix II <[email protected]>
|
||
#### Ambient Mode | ||
|
||
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions. | |
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. | |
Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if [using node cordons and taints](/docs/ambient/upgrade/helm#cni-node-agent). In rare cases (e.g. on node restart or new node scale-out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. | |
The Istio community is working with [the CNI](https://github.com/containernetworking/cni/pull/1052) and [Kubernetes communities](https://github.com/kubernetes/kubernetes/issues/130594) to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/istio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions. |
Reformatted a little for clarity. I wouldn't have thought a pod restart was required anywhere along the way, but one is mentioned? (In the context of the node taint controller, @ilrudie told me that it was only a problem until the CNI agent claimed the pod)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. please fast-follow by documenting that owned CNI mode feature somewhere other than a relnote!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah unfortunately a pod restart is required due to the nature of the bug 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should we document an env var based feature like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we really not handle all existing pods when the CNI starts? We used to be able to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you thoughts on including this update in 1.27.1 after we address the additional bugs above and have an alternative solution?
This sounds good to me - I wasn't aware/didn't fully understand the reconciliation aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should we document an env var based feature like this?
Somewhere in the ambient documentation relating to the CNI.
Perhaps https://istio.io/latest/docs/ambient/architecture/traffic-redirection/ for now?
We need to separate out the page under the "sidecar" section at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaellio, that makes sense. I thought we did reconcile after installing the binary but if we don't then I can totally see how we'd sometimes wind up in limbo like this.
Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation]
The CNI node agent upgrade section notes that the agent includes mechanisms to prevent pod scheduling issues and steps to prevent unsecured traffic leakage. We may also want to add a brief note covering this edge case.
...and while being upgraded or restarted will prevent new pods from being started on that node until an instance of the agent is available on the node to handle them, in order to prevent unsecured traffic leakage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call; we should definitely update that since restarts are the Wild West
Co-authored-by: Craig Box <[email protected]>
@jaellio When you get a chance, can you let me know how/when to proceed with this |
@keithmattix @jaellio If we run into such an issue, is there a straightforward way to identify the pods where traffic is being bypassed? For example, if we run |
@sridhargaddam it won't be based on config; it would be a lack of iptables rules in the pod netns. @jaellio can correct me if I'm wrong |
It’d be useful to have a command that shows pods marked for the mesh but bypassing it (due to missing iptables rules). Maybe |
Hmm that could work if the person running the command had kubectl debug permission |
I don’t expect this issue to occur in sidecar mode with istio-cni (since the istio-validation container checks that traffic redirection rules are in place). Can someone please confirm? |
Correct, the istio-validation init container prevents this from happening in sidecar mode |
Cool, thanks for confirming, Keith. |
Description
Add note about Istio CNI's security implications in ambient mode
Reviewers