Add note about istio-cni security #16749

keithmattix · 2025-08-04T21:33:49Z

Description

Add note about Istio CNI's security implications in ambient mode


Signed-off-by: Keith Mattix II <[email protected]>

content/en/docs/ops/deployment/security-model/index.md

linsun · 2025-08-05T02:34:06Z

Left 2 comments that may be worthwhile pointing out. Otherwise LGTM.


keithmattix · 2025-08-07T22:47:33Z

@craigbox @linsun Updated the doc; PTAL when you get a chance


craigbox · 2025-08-11T09:55:34Z

content/en/docs/ops/deployment/security-model/index.md


+#### Ambient Mode
+
+In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions.


Suggested change


In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node.

Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even when [node cordons and taints are used](/docs/ambient/upgrade/helm#cni-node-agent). In rare cases (for example, on node restart or new node scale-out), a pod that is labeled for mesh enrollment may start before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until the CNI is running and that pod is restarted.

The Istio community is working with [the CNI](https://github.com/containernetworking/cni/pull/1052) and [Kubernetes communities](https://github.com/kubernetes/kubernetes/issues/130594) to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/istio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions.
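A minimal Go sketch of the race described above, under stated assumptions: the `node` type and its methods are hypothetical, redirection is modeled as a boolean that is set only when a pod's sandbox is created, and the agent is assumed not to reconcile pods that were already running when it became ready (the thread below debates whether that reconciliation exists). It shows why a pod that starts first stays unredirected until it is recreated.

```go
package main

import "fmt"

// node is a hypothetical, simplified model of one Kubernetes node; it is not
// Istio's implementation. It assumes the CNI agent does not reconcile pods
// that were already running when it became ready.
type node struct {
	cniReady   bool            // CNI plugin installed and configured
	redirected map[string]bool // pod name -> traffic redirection applied
}

func newNode() *node {
	return &node{redirected: map[string]bool{}}
}

// startPod models pod sandbox creation, the moment the CNI plugin would run.
// Redirection is only set up if the plugin is already in place.
func (n *node) startPod(name string, meshLabeled bool) {
	n.redirected[name] = meshLabeled && n.cniReady
}

// restartPod models deleting and recreating the pod, which runs the CNI
// plugin again for the new sandbox.
func (n *node) restartPod(name string, meshLabeled bool) {
	n.startPod(name, meshLabeled)
}

func main() {
	n := newNode()
	n.startPod("app-1", true) // scheduled before the CNI agent is ready
	n.cniReady = true         // CNI agent comes up afterwards
	fmt.Println("after CNI ready:", n.redirected["app-1"])
	n.restartPod("app-1", true)
	fmt.Println("after restart:", n.redirected["app-1"])
}
```

In this model the pod's state only changes when the CNI plugin runs for it, so making the agent ready is not enough; the pod has to be recreated.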

Reformatted a little for clarity. I wouldn't have thought a pod restart was required anywhere along the way, but one is mentioned? (In the context of the node taint controller, @ilrudie told me that it was only a problem until the CNI agent claimed the pod.)

p.s. please fast-follow by documenting that owned CNI mode feature somewhere other than a relnote!

Yeah unfortunately a pod restart is required due to the nature of the bug 😓

Where should we document an env var based feature like this?

Would we really not handle all existing pods when the CNI starts? We used to be able to do that.

What are your thoughts on including this update in 1.27.1 after we address the additional bugs above and have an alternative solution?

This sounds good to me - I wasn't aware/didn't fully understand the reconciliation aspect.

Where should we document an env var based feature like this?

Somewhere in the ambient documentation relating to the CNI.
Perhaps https://istio.io/latest/docs/ambient/architecture/traffic-redirection/ for now?

We need to separate out the page under the "sidecar" section at some point.

@jaellio, that makes sense. I thought we did reconcile after installing the binary but if we don't then I can totally see how we'd sometimes wind up in limbo like this.

Thanks for the explanation.

it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation]

The CNI node agent upgrade section notes that the agent includes mechanisms to prevent pod scheduling issues and steps to prevent unsecured traffic leakage. We may also want to add a brief note covering this edge case.

...and while being upgraded or restarted will prevent new pods from being started on that node until an instance of the agent is available on the node to handle them, in order to prevent unsecured traffic leakage.

Good call; we should definitely update that, since restarts are the Wild West.

Co-authored-by: Craig Box <[email protected]>

keithmattix · 2025-08-26T19:13:33Z

@jaellio When you get a chance, can you let me know how and when to proceed with this?

sridhargaddam · 2025-09-18T10:58:39Z

meaning that policies won't be enforced until after the CNI comes up and that pod is restarted.

@keithmattix @jaellio If we run into such an issue, is there a straightforward way to identify the pods where traffic is being bypassed? For example, if we run istioctl zc workloads, would the affected pods show HBONE or TCP?

keithmattix · 2025-09-18T12:56:28Z

@sridhargaddam it won't be based on config; it would be a lack of iptables rules in the pod netns. @jaellio can correct me if I'm wrong

sridhargaddam · 2025-09-19T12:28:53Z


It’d be useful to have a command that shows pods marked for the mesh but bypassing it (due to missing iptables rules). Maybe istioctl analyze... could be extended with an option to catch and report this situation?

keithmattix · 2025-09-19T17:19:04Z

Hmm that could work if the person running the command had kubectl debug permission

sridhargaddam · 2025-09-22T17:37:27Z

it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted.

I don’t expect this issue to occur in sidecar mode with istio-cni (since the istio-validation container checks that traffic redirection rules are in place). Can someone please confirm?

keithmattix · 2025-09-22T19:41:09Z

Correct, the istio-validation init container prevents this from happening in sidecar mode

sridhargaddam · 2025-09-23T06:08:25Z


Cool, thanks for confirming, Keith.

Add note about istio-cni security

3cb4721

Signed-off-by: Keith Mattix II <[email protected]>

keithmattix requested a review from a team as a code owner August 4, 2025 21:33

keithmattix requested a review from jaellio August 4, 2025 21:33

istio-testing added the size/XS label Aug 4, 2025

keithmattix requested a review from howardjohn August 4, 2025 21:33

craigbox reviewed Aug 4, 2025


content/en/docs/ops/deployment/security-model/index.md (outdated, resolved)

linsun reviewed Aug 5, 2025


content/en/docs/ops/deployment/security-model/index.md (outdated, resolved)

Remove references to initcontainer

cd0a1d5

Signed-off-by: Keith Mattix II <[email protected]>

keithmattix requested review from craigbox and linsun August 8, 2025 22:39

Fix lint

0313630

Signed-off-by: Keith Mattix II <[email protected]>

craigbox reviewed Aug 11, 2025


Update content/en/docs/ops/deployment/security-model/index.md

0ccbb9f

Co-authored-by: Craig Box <[email protected]>

