-
Notifications
You must be signed in to change notification settings - Fork 5.1k
http2: send graceful goaways on close #41284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Anurag Aggarwal <[email protected]>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
/retest |
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
/retest |
Signed-off-by: Anurag Aggarwal <[email protected]>
My understanding is that a graceful goaway is sent at the beginning of the drain sequence via Connection::shutdownNotice() (check code here) That eventually calls this adapter function which in both nghttp2 and oghttp2 sends the goaway frame with the max stream ID as you have done here. Can you explain why this is not sufficient? |
On connection drain the shutdown notice in both nghttp2/oghttp2 adapter sends goaways correctly. If you look at the original issue for this PR #39876, an important part is filters triggering GoAways immediately. This PR solves that part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry! I missed the issue in the description. Thank you for the explanation.
EXPECT_EQ(1, server_stats_store_.counter("http2.goaway_sent").value()); | ||
} | ||
|
||
TEST_P(Http2CodecImplTest, GracefulGoAwayCausesOutboundFlood) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be prudent to add some tests showing how this new feature and the existing http connection manager drain phase interact.
Could you also add a couple integration tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!, I added some integration tests. Please let me know if they are okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY for the added testing!
I think what worries me the most is if it's possible for the drain timer to fire after the graceful shutdown timer or vice versa.
Will that crash the envoy? (e.g. the second timer fires and tries calling a function on a now null object).
Could that potentially cause us to send two final goaways (with the second having a higher stream_id-- which is not allowed according to the spec)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand the code (which is little to be honest) -- the destructor on ConnectionImpl
should destroy the timers -> cancelling any pending callbacks -> preventing null object access.
Actually the behavior hasn't changed much. I reverted the change on calling gracefulGoAway
on onDrainTimeout
. Now only filter or load shedding invoked goaways are changed to graceful. Both goaway paths are separate and non overlapping now. It just changes the manual goaway to be more RFC compliant.
However, I will keep attempting adding test.
1. only call custom gracefulGoAway when triggered from LoadSheddding or Filter 2. Add UTs 3. Add integration tests showing interaction of connection drain and filter invoked Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
While trying to fix loadShedding integration tests, I see that envoy allows new streams to be created when they shouldn't be during the GOAWAY period. See ServerConnectionImpl::onBeginHeaders(int32_t stream_id). This violates RFC 7540. |
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
https://github.com/envoyproxy/envoy/blob/main/envoy/http/codec.h#L599 This comment seems to imply this is working as intended, but I agree the RFC seems to say this shouldn't be allowed. |
Thanks for the reference. I see the behavior with nghttp2 vs oghttp2 different.
I think this makes the behaviour already inconsistent irrespective what we expect according to the comment, orz. |
Yes, Envoy does not enforce GOAWAY semantics as it should. We should fix this and make it configurable with default to RFC correct behavior. |
@kanurag94 please resolve merge conflicts. Will wait for @antoniovleonti LGTM. /wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for the improved testing!
If nghttp2 resets new streams after Envoy sends the initial GOAWAY with max stream ID, then that is a bug in nghttp2. A server should continue to accept new streams that arrive until it sends the final GOAWAY. |
I appreciate the work you've done here to fix the issue in question. Would it be possible to offer two options for the inter-GOAWAY timeout?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this great contribution. I added some comments. But I think the core problems are here:
sendGoAwayAndClose()
will reset all pending streams directly. It's make no sense to delay the final goaway. That's say, thesendGoAwayAndClose()
should not be used if graceful shutdown is necessary. I think it's OK to submit a shutdown notice first for RFC. But it makes no much sense in actual practice.- If we need a graceful shutdown, we should enhance the
sendGoAwayAndClose()
to provide a graceful mode, then we can redirect the call to thestartDrainSequence()
. Then the shutdown notice (initial goaway) will be sent immediately. And the final goway will be sent after the drain time is end.
cc @yanavlasov cc @birenroy
/wait
/** | ||
* Indicate graceful "go away" to the remote. For HTTP/2, this implements RFC-compliant | ||
* graceful shutdown. For other protocols, this falls back to regular goAway(). | ||
*/ | ||
virtual void goAwayGraceful() { goAway(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should reuse the shutdownNotice()
.
|
||
// Timeout for graceful HTTP/2 GOAWAY shutdown sequence. | ||
// When graceful shutdown is enabled, Envoy will send an initial GOAWAY frame with | ||
// last_stream_id set to 2^31-1, wait for this timeout period, then send a final | ||
// GOAWAY frame with the actual highest received stream ID. This allows in-flight | ||
// requests to complete gracefully. If set to 0, graceful shutdown is disabled | ||
// and the standard immediate GOAWAY behavior is used. Defaults to 1000ms (1 second). | ||
google.protobuf.Duration graceful_goaway_timeout = 18; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The connection will be closed directly after the delayed_close_timeout, the that will finally effect the result of the graceful_goaway_timeout
.
Description
With this change envoy will send one GOAWAY frame with a last_stream_id set to 2^31-1, as described in RFC 9113. After a reasonable interval (defaults to 1s), envoy will send a followup GOAWAY frame with last_stream_id set to the highest received stream ID at that point in time.
This interval is configurable using
graceful_goaway_timeout
field.Commit Message: http2: send graceful GoAway on close
Additional Description: According to RFC 9113, envoy should send a goaway with last_stream_id set to 2^31-1 before sending a final goaway with the highest received stream ID.
Risk Level: Low
Testing: Tests added. Manually verified.
Docs Changes: NA
Release Notes: Changelog added
Platform Specific Features:
Runtime guard:
envoy.reloadable_features.http2_graceful_goaway
Fixes #39876
[Optional API Considerations:]