Marking additional std::io::ErrorKind variants as transient (Cloudflare bad TLS packets) #210

@beanow-at-crabnebula

Motivations

We were investigating flaky requests to Cloudflare. The requests already had a generous retry limit, but the failures were classified as Fatal by the default policy, so they were never retried.

As it turns out, one of the errors looked like:

reqwest::Error {
	kind: Request,
	url: Url { ... },
	source: hyper_util::client::legacy::Error(SendRequest, hyper::Error(Io, Custom { kind: InvalidData, error: "received fatal alert: BadRecordMac" }))
}

There are various reports of this error, surfaced as BadRecordMac (rustls) or ERR_SSL_BAD_RECORD_MAC_ALERT (openssl), when using Cloudflare.
Retrying mitigates the issue, but because the default policy classifies the error as Fatal rather than Transient, the request fails without a retry.
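
For anyone reproducing this, the std::io::Error sits a few levels down the source chain of the reqwest::Error shown above. A minimal sketch of how one might dig it out using only the standard library follows. The find_io_error helper and the Wrapper type are hypothetical names used for illustration (Wrapper stands in for the outer reqwest/hyper error layers); neither is part of reqwest-retry.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical helper: walks the `source()` chain and returns the first
/// `std::io::Error` it finds, if any.
fn find_io_error<'a>(err: &'a (dyn Error + 'static)) -> Option<&'a io::Error> {
    let mut current: Option<&'a (dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            return Some(io_err);
        }
        current = e.source();
    }
    None
}

/// Hypothetical stand-in for an outer error layer (e.g. the HTTP client's
/// error type) that wraps an `io::Error` as its source.
#[derive(Debug)]
struct Wrapper(io::Error);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request failed")
    }
}

impl Error for Wrapper {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}
```

Calling find_io_error on such a wrapped error and checking kind() would show InvalidData for the BadRecordMac case, which is the value the default policy's match falls through on.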

Solution

Update classify_io_error to treat std::io::ErrorKind::InvalidData as transient, which covers this error.

fn classify_io_error(error: &std::io::Error) -> Retryable {
    match error.kind() {
-        std::io::ErrorKind::ConnectionReset | std::io::ErrorKind::ConnectionAborted => {
+        std::io::ErrorKind::ConnectionReset | std::io::ErrorKind::ConnectionAborted | std::io::ErrorKind::InvalidData => {
            Retryable::Transient
        }
        _ => Retryable::Fatal,
    }
}
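
To try the proposed behaviour in isolation, here is a self-contained sketch of the patched function. The Retryable enum below is a local stand-in with the two variants used in the diff, defined here only so the snippet compiles on its own; it is an assumption, not the crate's actual definition.

```rust
use std::io;

/// Local stand-in for the retry policy's classification type (assumption:
/// only the two variants that appear in the diff are modelled here).
#[derive(Debug, PartialEq, Eq)]
enum Retryable {
    Transient,
    Fatal,
}

/// Sketch of the proposed classification: `InvalidData` joins the
/// connection-level errors that are worth retrying.
fn classify_io_error(error: &io::Error) -> Retryable {
    match error.kind() {
        io::ErrorKind::ConnectionReset
        | io::ErrorKind::ConnectionAborted
        | io::ErrorKind::InvalidData => Retryable::Transient,
        _ => Retryable::Fatal,
    }
}
```

Note that this treats every InvalidData error as transient, not only the BadRecordMac alert, since the match is on the error kind alone.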

Alternatives

Consider marking additional variants as transient.
I haven't investigated all of them, but some that might be transient from their description:

Additional context

Tested with

  • reqwest-retry 0.7.0
  • reqwest-middleware 0.4.0
  • reqwest 0.12.4 (including rustls-tls-native-roots)
  • hyper 1.3.1

Labels

enhancement