-
Notifications
You must be signed in to change notification settings - Fork 147
feat: implements crl support in zTunnel #1660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Skipping CI for Draft Pull Request. |
|
😊 Welcome @nilekhc! This is either your first contribution to the Istio ztunnel repo, or it's been You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines Thanks for contributing! Courtesy of your friendly welcome wagon. |
|
/test all |
|
I'm a bit confused. It seems there was a design doc that was approved for this (not sure by who, as I was out of office and there are no meeting notes or listed approvers) but the design is completely different than this |
6cfcad3 to
c522f5f
Compare
|
/test all |
Hi @howardjohn, Thanks for the comment. The original design I presented in one of the community meetings received some valuable feedback. Based on that, I pivoted the design to propagate CRL information through a ConfigMap, similar to how the ca-root-cert is propogated. I’ve captured these updates in the same design document, under the Implementation section. This PR simply uses that CRL ConfigMap and validates the request accordingly. |
c522f5f to
ce5886a
Compare
|
/test all |
ce5886a to
2402c4d
Compare
|
/test all |
| socket2::SockRef::from(&s).set_tcp_keepalive(&ka) | ||
| ); | ||
| } | ||
| #[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting an error on macos due to cross compilation. I have to add this to some places to fix it. Please let me know if there is a better way to handle this.
|
/retest |
src/proxy/connection_manager.rs
Outdated
|
|
||
| /// Close all inbound and outbound connections | ||
| /// This is called when certificate revocations are detected | ||
| pub fn close_all_connections(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems really dangerous and bad behavior. How strongly do you feel about keeping this?
Does any other application do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I don't think we should be closing every single connection just because of one revocation. What does Envoy do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I addd this since http connections are long lived and does not check updated crl. Is there a better way to handle this? I know envoy has some timeout we can configure. Not comepletly sure how zTunnel handles it. Happy to tweak this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition is to do nothing. But at the very least, only close ones that are revoked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing nothing feels wrong IMO; the purpose of a CRL is to be able to break a particular trust chain, and you don't want existing connections to continue. Only closing the ones that are revoked is the right solution to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is restart your clients if you want the CRL to take effect quickly the UX we want then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo, zTunnel should terminate that connection selectively and avoid restarting the workload. Since the user provides/updates the CRL file, they’re already aware of potentially revoked certificates, so why ask them to take action twice?
(1) Create or update the CRL, and
(2) Restart the workload pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing advocate aside, I suspect that most folks would accept a limitation where existing connections are not re-evaluated on a CRL refresh. This seems to be a fairly common behavior. I think I would lean towards breaking support for re-evaluating existing connections into a separate PR (assuming we do want to explore that functionality).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate PR seems reasonable to me so we can evaluate the complexity on its own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reverted the outbound related changes and now it only selectively terminates inbound connections. PTAL. Thanks
If you'd like to see the conversation where this was reviewed, it's in the 4/14 wg meeting at https://www.youtube.com/watch?v=welIiuqUvn0&t=118s |
Signed-off-by: nilekh <[email protected]>
Signed-off-by: nilekh <[email protected]>
5509cf6 to
37f0ca0
Compare
11a8b0b to
0f855bf
Compare
|
@howardjohn @ilrudie @keithmattix Could you PTAL when you get a chance? Thanks. |
Signed-off-by: nilekh <[email protected]>
0f855bf to
01540f7
Compare
| serial.len(), | ||
| serial | ||
| .iter() | ||
| .map(|b| format!("{:02x}", b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we converting to string here and on L372? it will be quite expensive, we can just compare the bytes directly?
ilrudie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me this bundles functionality that belongs in something like PolicyWatcher into the ConnectionManager. Please take a look at that, specifically how run works and is invoked and works for an example of how similar functionality was implemented.
| let rbac_ctx = ProxyRbacContext { | ||
| conn: conn.clone(), | ||
| dest_workload: destination_workload, | ||
| }; | ||
| let mut tls_guard = pi | ||
| .connection_manager | ||
| .register_tls_connection(&rbac_ctx, client_cert_serials.clone()); | ||
| debug!("TLS connection registered for revocation tracking"); | ||
|
|
||
| // Get watcher BEFORE serving to listen for revocation-based termination | ||
| let tls_drain_watch = tls_guard.watcher(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serv_connect will already create a connection guard which will end the connection when asked. I don't think we need a second connection guard
| // This allows CRL revocation to terminate the connection immediately | ||
| tokio::select! { | ||
| result = serve => { | ||
| tls_guard.release(); | ||
| result | ||
| } | ||
| _ = tls_drain_watch.wait_for_drain() => { | ||
| debug!("TLS connection terminated due to certificate revocation"); | ||
| tls_guard.release(); | ||
| // Return an error to signal abnormal termination | ||
| // This helps with proper cleanup and error reporting | ||
| Err(std::io::Error::new( | ||
| std::io::ErrorKind::ConnectionAborted, | ||
| "connection terminated due to certificate revocation" | ||
| ).into()) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to above, we already take a guard that closes the connection, I don't think we need
This PR extends the work of CRL support by consuming the ca-crl config map in zTunnel and validates the request accordingly.
Fixes #1678