Http.Resilience sometimes retries too quickly (e.g. with Retry-After headers) #7001

@gearset-joe-ls

Description

Hello, using the code below I've observed that:

  • retries can happen up to 11ms before the time given by a Retry-After header;
  • roughly 10% of retries are sent early in absolute terms; and
  • roughly 1% are sent early enough to plausibly cause a rate-limit violation in the real world (taking latency into account).

I can reproduce the effect with the following variations:

  • building in Release or Debug configuration;
  • setting Retry.UseJitter to true or false;
  • using a Retry.DelayGenerator (it's a bit more subjective when dealing with time spans instead of dates, but I believe the effect is still measurable); and
  • using a fake HttpMessageHandler to measure the effect entirely on the client side.

Perhaps there's a mistake in my understanding or in the experiment.

Reproduction Steps

using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Hosting.Server;
using Microsoft.AspNetCore.Hosting.Server.Features;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var retryAfter = DateTimeOffset.UtcNow.ToString("R");
Console.WriteLine("Printing actual time minus 'Retry-After' time...");

// Start responding to local HTTP requests
var address = await RespondToHttpAsync(httpContext =>
{
    var time = DateTimeOffset.UtcNow;
    var difference = (time - DateTimeOffset.Parse(retryAfter)).TotalMilliseconds;
    Console.Write($"{difference:N1}ms\t");
    if (difference <= -1)
    {
        Console.WriteLine();
        Console.WriteLine($"Retry attempted {-difference:N1}ms too soon. " +
                      $"Retry-After: <{retryAfter}>. " +
                      $"Actual: <{time:O}>.");
    }

    var nextRetryAfter = (DateTimeOffset.UtcNow + TimeSpan.FromSeconds(1)).ToString("R");
    httpContext.Response.StatusCode = 429;
    httpContext.Response.Headers.RetryAfter = nextRetryAfter;
    retryAfter = nextRetryAfter;
});

// Create an HTTP client with standard resilience
var httpClient = new ServiceCollection()
    .ConfigureHttpClientDefaults(d => d.AddStandardResilienceHandler(r =>
    {
        // These overrides aren't required to observe the effect
        r.Retry.UseJitter = false;
        r.Retry.MaxRetryAttempts = int.MaxValue;
        r.TotalRequestTimeout.Timeout = TimeSpan.FromDays(1);
    }))
    .BuildServiceProvider()
    .GetRequiredService<IHttpClientFactory>()
    .CreateClient();

// Request and retry
await httpClient.GetAsync(address);
return;

static async Task<string> RespondToHttpAsync(Action<HttpContext> requestHandler)
{
    var builder = WebApplication.CreateBuilder();
    builder.Logging.ClearProviders();
    builder.WebHost.UseUrls("http://[::1]:0");
    var app = builder.Build();
    app.Run(c => Task.Run(() => requestHandler(c)));
    await app.StartAsync();
    return app.Services.GetRequiredService<IServer>().Features.Get<IServerAddressesFeature>()?.Addresses.Single()!;
}

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net9.0</TargetFramework>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
    </PropertyGroup>

    <ItemGroup>
      <PackageReference Include="Microsoft.Extensions.Http.Resilience" Version="9.10.0" />
    </ItemGroup>
    
    <ItemGroup>
        <FrameworkReference Include="Microsoft.AspNetCore.App" />
    </ItemGroup>
    
</Project>
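
For the client-side-only variation mentioned in the description, a fake handler along the following lines could stand in for the local server. This is a minimal sketch, not the handler actually used for the measurements: the class name `FakeRateLimitHandler` and the helper `TruncateToSecond` are hypothetical, and the one-second Retry-After window simply mirrors the reproduction above.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the local server: answers every request with 429 and a
// Retry-After date about one second ahead, and prints how early or late each retry arrives.
public sealed class FakeRateLimitHandler : HttpMessageHandler
{
    private DateTimeOffset? _retryAfter; // date sent with the previous response, if any

    // HTTP-dates carry whole seconds only, so drop the sub-second part
    // (the same effect ToString("R") has in the reproduction).
    public static DateTimeOffset TruncateToSecond(DateTimeOffset value) =>
        value.AddTicks(-(value.Ticks % TimeSpan.TicksPerSecond));

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var now = DateTimeOffset.UtcNow;
        if (_retryAfter is { } expected)
        {
            // Negative values mean the retry arrived before the advertised date.
            Console.Write($"{(now - expected).TotalMilliseconds:N1}ms\t");
        }

        var next = TruncateToSecond(now + TimeSpan.FromSeconds(1));
        _retryAfter = next;

        var response = new HttpResponseMessage(HttpStatusCode.TooManyRequests)
        {
            RequestMessage = request
        };
        response.Headers.RetryAfter = new RetryConditionHeaderValue(next);
        return Task.FromResult(response);
    }
}
```

One way to wire it in would be to call `ConfigurePrimaryHttpMessageHandler(() => new FakeRateLimitHandler())` on the same client builder that receives `AddStandardResilienceHandler`, so that no network or server scheduling is involved in the measurement. In a file with top-level statements, the class has to be declared after those statements.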

Expected behavior

Retries should not be attempted before the time given by a Retry-After header (according to RFC 9110).

Actual behavior

Here's some output from running the code above:

Printing actual time minus 'Retry-After' time...
1,190.6ms       12.9ms  10.4ms  9.4ms   12.6ms  7.4ms   10.1ms  12.3ms  8.4ms   5.6ms   7.7ms   3.3ms   11.5ms  4.0ms   15.8ms  13.9ms  10.7ms  8.7ms   12.1ms  6.8ms   0.8ms   13.0ms  0.8ms   2.7ms   13.0ms  0.2ms   7.3ms   3.1ms   11.2ms  3.9ms   1.5ms   3.7ms       7.1ms   4.6ms   15.2ms  7.7ms   10.9ms  2.1ms   7.1ms   3.0ms   1.7ms   8.1ms   7.1ms   -0.1ms  6.7ms   10.0ms  9.8ms   3.2ms   14.1ms  8.7ms   -0.1ms  3.7ms   4.0ms   5.3ms   8.2ms   0.1ms   8.9ms   3.1ms   15.1ms  1.8ms   4.4ms   1.4ms   10.4ms  5.1ms       15.9ms  11.5ms  4.7ms   1.0ms   1.1ms   2.4ms   -1.4ms  
Retry attempted 1.4ms too soon. Retry-After: <Mon, 03 Nov 2025 11:12:29 GMT>. Actual: <2025-11-03T11:12:28.9986413+00:00>.
-1.2ms  
Retry attempted 1.2ms too soon. Retry-After: <Mon, 03 Nov 2025 11:12:30 GMT>. Actual: <2025-11-03T11:12:29.9987961+00:00>.
6.0ms   12.6ms  10.7ms  7.4ms   8.6ms   -0.3ms  -2.4ms  
Retry attempted 2.4ms too soon. Retry-After: <Mon, 03 Nov 2025 11:12:37 GMT>. Actual: <2025-11-03T11:12:36.9976278+00:00>.
1.3ms   9.2ms   2.5ms   8.8ms   7.7ms   0.7ms   6.7ms   0.1ms   10.1ms  1.6ms   0.5ms   1.3ms   2.7ms   4.2ms   11.7ms  9.0ms   1.6ms   7.5ms   7.1ms   11.7ms  12.4ms  5.8ms   12.7ms  8.4ms   0.8ms   4.2ms   0.3ms   10.7ms  3.0ms   15.5ms  4.2ms   -1.9ms  
Retry attempted 1.9ms too soon. Retry-After: <Mon, 03 Nov 2025 11:13:08 GMT>. Actual: <2025-11-03T11:13:07.9980936+00:00>.
14.2ms  6.5ms   5.0ms   1.8ms   14.4ms  5.8ms   11.7ms  11.6ms  -1.2ms  
Retry attempted 1.2ms too soon. Retry-After: <Mon, 03 Nov 2025 11:13:16 GMT>. Actual: <2025-11-03T11:13:15.9988181+00:00>.
0.1ms   3.1ms   14.8ms  8.4ms   3.5ms   7.7ms   9.6ms   4.8ms   6.0ms   11.1ms  14.5ms  12.8ms  8.3ms   5.3ms   11.2ms  0.0ms   13.3ms  10.0ms  8.0ms   3.0ms   -1.4ms  
Retry attempted 1.4ms too soon. Retry-After: <Mon, 03 Nov 2025 11:13:36 GMT>. Actual: <2025-11-03T11:13:35.9985950+00:00>.
9.4ms   10.8ms  12.3ms  8.0ms   13.7ms  8.4ms   0.9ms   11.2ms  3.9ms   2.6ms   -1.0ms  
Retry attempted 1.0ms too soon. Retry-After: <Mon, 03 Nov 2025 11:13:47 GMT>. Actual: <2025-11-03T11:13:46.9989836+00:00>.
-0.1ms  -1.5ms  
Retry attempted 1.5ms too soon. Retry-After: <Mon, 03 Nov 2025 11:13:48 GMT>. Actual: <2025-11-03T11:13:47.9984860+00:00>.
0.4ms   -3.3ms  
Retry attempted 3.3ms too soon. Retry-After: <Mon, 03 Nov 2025 11:13:49 GMT>. Actual: <2025-11-03T11:13:48.9967205+00:00>.
12.3ms  11.7ms  5.9ms   7.4ms   4.5ms   11.6ms  12.0ms  13.0ms  14.5ms  8.7ms   14.1ms  2.6ms   12.0ms  -0.8ms  -0.4ms  0.1ms   3.8ms   8.2ms   1.9ms   8.9ms   5.8ms   13.3ms  
Process finished with exit code -1.
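
As a sanity check on how the figures above are derived, the first "too soon" line can be recomputed from the two timestamps it prints. The snippet below is only an illustration of that arithmetic, using the same `DateTimeOffset.Parse` subtraction as the reproduction (with an explicit invariant culture, which is an addition here).

```csharp
using System;
using System.Globalization;

// Values copied from the first "Retry attempted ... too soon" line of the output.
var retryAfter = DateTimeOffset.Parse("Mon, 03 Nov 2025 11:12:29 GMT", CultureInfo.InvariantCulture);
var actual = DateTimeOffset.Parse("2025-11-03T11:12:28.9986413+00:00", CultureInfo.InvariantCulture);

// Actual time minus Retry-After time; negative means the retry was early.
var difference = (actual - retryAfter).TotalMilliseconds;
Console.WriteLine(difference.ToString("N1", CultureInfo.InvariantCulture) + "ms"); // -1.4ms
```

The request arrived at 28.9986413 s against an advertised 29 s, which is about 1.36 ms early and is reported as 1.4 ms after rounding.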

Regression?

Don't know.

Known Workarounds

Use a DelegatingHandler to add extra delay.
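
A rough sketch of such a handler follows. It is not taken from any library: `RetryAfterGuardHandler`, its `margin` parameter, and the `RemainingDelay` helper are hypothetical names, and the sketch assumes one handler instance serves a single rate-limited endpoint. It remembers the most recent Retry-After value and holds the next request until that time plus a safety margin has passed, re-checking the clock after each wait in case the timer completes early.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical workaround: delays outgoing requests until the last Retry-After
// time (plus a safety margin) has passed according to the local clock.
public sealed class RetryAfterGuardHandler : DelegatingHandler
{
    private readonly TimeSpan _margin;
    private long _notBeforeUtcTicks; // 0 means no Retry-After has been seen yet

    public RetryAfterGuardHandler(TimeSpan margin) => _margin = margin;

    // How long to keep waiting; never negative.
    public static TimeSpan RemainingDelay(DateTimeOffset notBefore, DateTimeOffset now, TimeSpan margin)
    {
        var remaining = notBefore + margin - now;
        return remaining > TimeSpan.Zero ? remaining : TimeSpan.Zero;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var ticks = Interlocked.Read(ref _notBeforeUtcTicks);
        if (ticks != 0)
        {
            var notBefore = new DateTimeOffset(ticks, TimeSpan.Zero);
            while (true)
            {
                // Recompute after every wait so an early timer completion is absorbed.
                var delay = RemainingDelay(notBefore, DateTimeOffset.UtcNow, _margin);
                if (delay == TimeSpan.Zero) break;
                await Task.Delay(delay, cancellationToken);
            }
        }

        var response = await base.SendAsync(request, cancellationToken);

        // Remember the server's instruction, whether given as a date or as a delta.
        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter?.Date is { } date)
            Interlocked.Exchange(ref _notBeforeUtcTicks, date.UtcTicks);
        else if (retryAfter?.Delta is { } delta)
            Interlocked.Exchange(ref _notBeforeUtcTicks, (DateTimeOffset.UtcNow + delta).UtcTicks);

        return response;
    }
}
```

For the guard to see every retry attempt, it would need to sit closer to the network than the resilience handler, for example by calling `AddHttpMessageHandler(() => new RetryAfterGuardHandler(TimeSpan.FromMilliseconds(50)))` after `AddStandardResilienceHandler` on the same builder. The 50 ms margin is an arbitrary illustrative value, not a recommendation.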

Configuration

.NET SDK version: 9.0.305
MSBuild version: 17.14.21+8929ca9e3
Microsoft.Extensions.Http.Resilience version: 9.10.0
OS version: Windows 10.0.26200

Other information

It looks like the delay itself happens in Polly's code, but I've filed the bug here under the (possibly flawed) assumption that Polly is working as designed, and that the bug is more likely to be in the application.

Metadata

Labels: area-resilience, bug, untriaged
