Support exemplars in WSGI and ASGI histogram #3739
Conversation
Force-pushed from 26d8015 to dd03138
I recently came across this issue in the FastAPI instrumentation and tracked it down to the underlying ASGI instrumentation, though it seems to be present in WSGI and Tornado as well, as mentioned in the issue. Just thought I would chip in here a bit. I see two possible solutions:

Option A: defer closing the span until metrics are recorded. This is what you have started on for WSGI and seems like the more intuitive and clean method. The span will end a bit later with this method, but the difference should be negligible.

Option B: re-create the span context after closing the span and pass it to the metrics:

```python
finally:
    ...
    # Create span context for exemplars
    span_ctx = set_span_in_context(span)
    if self.duration_histogram_old:
        duration_attrs_old = _parse_duration_attrs(
            req_attrs, _StabilityMode.DEFAULT, context=span_ctx,
        )
    ...
```

This is a more minimal and local patch, but less clean. I'd be happy to chip in if you need any assistance with the PR.
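For concreteness, here is a minimal sketch of what Option A could look like, with hypothetical wrapper and instrument names (the real instrumentation code is more involved):

```python
from timeit import default_timer

from opentelemetry import metrics, trace

tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
duration_histogram = meter.create_histogram(
    "http.server.request.duration", unit="s"
)


def call_wsgi_app(wsgi_app, environ, start_response):
    # Hypothetical wrapper: keep the span current until the histogram
    # has been recorded, and only end the span afterwards.
    span = tracer.start_span("HTTP request")
    start = default_timer()
    try:
        with trace.use_span(span, end_on_exit=False):
            try:
                return wsgi_app(environ, start_response)
            finally:
                # The span is still the active span here, so the
                # exemplar reservoir samples its trace and span IDs.
                duration_histogram.record(default_timer() - start)
    finally:
        span.end()
```

The trade-off is the one noted above: the span's end timestamp now also covers the metric recording, which should be negligible.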
Thanks for the help! I properly linked the issue now if there's more discussion here. Like I said in the ticket, I'm not sure if the span context is the issue (unless I have misunderstood something), but let's continue discussing there 🙏 I'd love another set of eyes.
I made a PR felix-hilden#1 with my proposed change into your branch, including some tests as well. The diff is a bit large since it's based on a more recent version of the upstream repo.
Force-pushed from dd03138 to dba7521
Hi @felix-hilden and @Chromatius, thanks for creating this! Are either of you still working on this to go from WIP to Ready? I faced a similar issue with a FastAPI app, in particular the missing exemplars.

otel-collector debug output for reference:

```
Metric #1
Descriptor:
     -> Name: http.server.request.duration
     -> Description: Duration of HTTP server requests.
     -> Unit: s
     -> DataType: ExponentialHistogram
     -> AggregationTemporality: Delta
ExponentialHistogramDataPoints #0
Data point attributes:
     -> url.scheme: Str(http)
     -> network.protocol.version: Str(1.1)
     -> http.request.method: Str(GET)
     -> http.route: Str(/latency/)
     -> http.response.status_code: Int(200)
StartTimestamp: 2025-11-07 23:12:16.852295758 +0000 UTC
Timestamp: 2025-11-07 23:12:26.860803262 +0000 UTC
Count: 5
Sum: 1.119976
Min: 0.152042
Max: 0.302494
Bucket [-1.010889, -1.000000), Count: 0
Bucket (0.151906, 0.153560], Count: 1
Bucket (0.153560, 0.155232], Count: 0
Bucket (0.155232, 0.156923], Count: 0
...
Exemplars:
Exemplar #0
     -> Trace ID: 87672a88b4077d5d8a93e7575a66f456
     -> Span ID: 2570bb41a5115383
     -> Timestamp: 2025-11-07 23:12:16.909593758 +0000 UTC
     -> Value: 0.270580
Exemplar #1
     -> Trace ID: c7410a16f8bfeafa98555181db29f29b
     -> Span ID: 5f597c83aaae16c5
     -> Timestamp: 2025-11-07 23:12:19.284033092 +0000 UTC
     -> Value: 0.302494
Exemplar #2
     -> Trace ID: 0904ec56214e0bf09df9e127aaa8e9fe
     -> Span ID: da49599715c86831
     -> Timestamp: 2025-11-07 23:12:21.503676343 +0000 UTC
     -> Value: 0.152042
```
Awesome! Could you please check the failing unit tests in the CI/CD? Please also do a `tox -e ruff` run.
Reviewed lines:

```python
def _iterate_and_close_with_span(
    iterable: Iterable[T], span: trace.Span
```
There are some more test failures that seem to have to do with missing span attributes. It might be that the span is ending too early.
What about this?:

- Change `_iterate_and_close_with_span` back to `_end_span_after_iterating`, with the `span.end` and `context.detach` call.
- Keep the `span.end` and `context.detach` in the `except` block above. Essentially keep the span ending as before.
- But keep everything to do with `set_span_in_context`, because that is the main fix, similar to "Fix FlaskInstrumentor exemplars generation for http.server.(request.)duration" #3912 (see the sketch below).
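A rough sketch of that shape, with made-up helper and variable names rather than the PR's actual code:

```python
from timeit import default_timer

from opentelemetry import context, metrics, trace
from opentelemetry.trace.propagation import set_span_in_context

meter = metrics.get_meter(__name__)
duration_histogram = meter.create_histogram(
    "http.server.request.duration", unit="s"
)


def finish_request(span: trace.Span, token, start: float, attrs: dict):
    # Hypothetical helper: end the span and detach the context exactly
    # as before, so span timing and lifecycle stay unchanged.
    span.end()
    if token is not None:
        context.detach(token)
    # The main fix: pass the span's context explicitly when recording.
    # A span's SpanContext remains valid after end(), so the exemplar
    # reservoir can still attach the request's trace and span IDs.
    duration_histogram.record(
        default_timer() - start,
        attrs,
        context=set_span_in_context(span),
    )
```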
EDIT: Also please pull! I resolved some Changelog merge conflicts before triggering the last CI/CD run.
Thanks! Will do. I thought I'd leave the changelog conflict for last in case this takes a while still, but let's try not to 😄
Thank you for the pointers too, I'll see today about doing just that now that the test failures are clear 👍 Or if not, proposing something different.
Alright, sorry if I'm being a dummy, but I want to make sure I understood what your suggestion meant: my initial attempt to extend the span was incorrect. It was not only unhelpful but also inappropriate, because we can reconstruct the context after the fact and pass it explicitly into the metrics instead. I've now reverted it!
I confirmed that this fixed the WSGI tests, but for some reason I wasn't able to reproduce the ASGI problem on 3.12 or PyPy locally. Curious if you have any ideas or experience with how flaky these tests are. But let's try to run once more.
Thanks again for all the help!
Not being a dummy! 😃 Yes, your last change is what I meant 👍 👍
New challenge -- thank you for your patience. The maintainers just released version 1.39.0/0.60b0 of OTel Python, so more tests are failing now, but only because of the dependency conflicts. I won't merge main into your branch myself this time to avoid complicating things, so could you please do that and try again? It looks like that's why so many failed when I triggered CI/CD just now for yesterday's 95abee7. Re: flaky tests at other times, the odd one that fails citing a missing SHA etc. just happens sometimes, and it's something we're aware of.
One more request: in addition to `tox -e ruff`, please also run `tox -e lint-instrumentation-asgi` and `tox -e lint-instrumentation-wsgi`. They're all supposed to help with lint/format, but it's not a perfect 1:1.
Done now 👍
Gotta say, I tried to run things locally, but the specific lint runs still ended with the same issue. The test tox envs ran fine at first, but after clearing the `.tox` dir the error popped up.
I've now rebased the branch on the latest main, cleaned up the commit history to make sure there are no weird changes at any point, and force-pushed the branch. This had no effect locally, but maybe I'm missing something :/ Could you approve the run once again?
Okay, sorry, I think it'll be a while... combining all of "hard for me to reproduce", "very specific", "seemingly unrelated", and "deeply technical project I'm new to" is a bit much 😅 I'll try to set things up on a different OS next and see if it makes a difference.
No, good point, it's also hard when you can't trigger the CI/CD yourself. I'll take a look at this and get back to you!
Ok, so I triggered a re-run of the ASGI tests 3 times, and every time it was the same failure in the existing test_user_agent_synthetic_test_detection for instrumentation-asgi 3.12 Ubuntu. I think there are two real, pre-existing issues with the tests that only became apparent after adding more coverage: (1) synthetic_test_detection was getting away with not properly resetting test state, and (2) PyPy needs more time tolerance for the basic_metric_success assertions.
I posted fixes in 242777b and c106c3c on my own fork. I don't seem to have permissions to make a PR from my fork into yours. Could you try cherry-picking or porting the changes into your branch? Or let me know if another way is better!
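If it helps to see the shape of fix (1), it amounts to roughly the following, assuming the repo's usual `TestBase` with in-memory exporters (the class name, user agents, and test body here are illustrative):

```python
from opentelemetry.test.test_base import TestBase


class TestSyntheticDetection(TestBase):
    def test_user_agent_synthetic_test_detection(self):
        for user_agent in ("test-agent/1.0", "synthetic-tester/2.0"):
            with self.subTest(user_agent=user_agent):
                # Re-run setUp for every subtest: the base setUp clears
                # the in-memory span/metric exporters, so data recorded
                # in one subtest cannot leak into the next one's asserts.
                self.setUp()
                ...  # drive a request with this User-Agent and assert
```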
Thank you! Okay nice, let me also trace through the changes:
- 242777b: setUp is needed for subtests, and that's where the base clears the memory exporter too. This I get, but two questions: is there merit in calling that in the `subTest` setup? Or can we not assume everyone setting up subtests wants this? Secondly, there's a third test that does similar things, `test_user_agent_synthetic_bot_detection`. Should that be changed too?
- c106c3c: durations are counted as milliseconds when passed into those asserts, and because we introduced some processing this started suddenly failing? This I'm not so sure about. I guess we use a delta in both directions not because we expect it to be larger, but just for simplicity?
That's weird, but I'll incorporate them here, many thanks again 🙏
"test_user_agent_synthetic_bot_detection". Should that be changed too?
Good point. I think everyone doing "subTest" should get this change.
> because we introduced some processing this started suddenly failing
Yes you're right! Response time is going to take a bit longer now because of the added manual context passing. The test change suggests it's going to add up to another 25 ms. To me this is small enough, but wdyt?
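In other words, fix (2) is just a wider symmetric delta on the millisecond assertion, something like this (the helper name and numbers are illustrative):

```python
import unittest


class DurationToleranceExample(unittest.TestCase):
    def assert_duration_close(self, recorded_ms: float, expected_ms: float):
        # Durations reach these asserts in milliseconds; a symmetric
        # 25 ms delta keeps things simple and leaves headroom for the
        # added context handling and for PyPy's slower runs.
        self.assertAlmostEqual(recorded_ms, expected_ms, delta=25)
```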
Force-pushed from e76548b to dac1b53
Force-pushed from dac1b53 to ab3d715
Description
I'm attempting to fix exemplar support for WSGI and ASGI applications by moving the metric recording to happen within the active span. This will eliminate the need for similar custom middleware metrics, which already emit exemplars out of the box.
Fixes #3031
Type of change
I'm not sure whether this is an omission, or something that was deliberately not developed yet. It has not been explicitly advertised, but I would certainly expect it to behave like this.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.