webgpu: Implement TimerQueries #8880

jc3265 · 2025-06-17T00:43:01Z

BUGS=[407961733]

AndyHovingh · 2025-06-17T14:10:52Z

filament/backend/src/webgpu/WebGPUTimerQueries.h

+class WebGPUTimerQueries : public HwTimerQuery {
+public:
+    WebGPUTimerQueries()
+        : status(std::make_shared<Status>()) {}


I recommend using uniform (list) initialization with curly braces rather than the parentheses syntax wherever you can: it avoids certain narrowing-conversion issues and is a more consistent syntax across the language, e.g.:

WebGPUTimerQuery() : mStatus{ std::make_shared<Status>() } {}
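
To make the narrowing point concrete, here is a minimal standalone sketch. The `Status` and `TimerQueryExample` types and the `truncateWithParens` helper are hypothetical stand-ins for illustration, not code from this PR.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the Status struct in the PR.
struct Status {
    std::atomic<uint64_t> elapsedNanoseconds{ 0 };
};

class TimerQueryExample {
public:
    // Brace initialization in the member initializer list.
    TimerQueryExample() : mStatus{ std::make_shared<Status>() } {}

    bool hasStatus() const { return mStatus != nullptr; }

private:
    std::shared_ptr<Status> mStatus;
};

// Parentheses allow an implicit double -> int conversion that drops the fraction.
inline int truncateWithParens(double value) {
    int result(value);
    return result;
}

// The brace form rejects a narrowing constant at compile time:
//     int narrowed{ 3.7 };  // ill-formed: narrowing conversion from double to int
```

With parentheses, `truncateWithParens(3.7)` quietly yields 3, whereas the braced form in the comment is diagnosed by the compiler.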

AndyHovingh · 2025-06-17T14:12:18Z

filament/backend/src/webgpu/WebGPUTimerQueries.h

+
+    void beginTimeElapsedQuery();
+    void endTimeElapsedQuery();
+    bool getQueryResult(uint64_t* outElapsedTimeNanoseconds);


I think Time in the name is redundant, right? Perhaps uint64_t* outElapsedNanoseconds?

AndyHovingh · 2025-06-17T14:14:33Z

filament/backend/src/webgpu/WebGPUTimerQueries.h

+        std::atomic<uint64_t> previousElapsed{ 0 };
+    };
+
+    std::shared_ptr<Status> status;


Why is the Status struct or the shared_ptr really needed? Could we not just have:

std::atomic<uint64_t> mElapsedNanoseconds{ 0 };
std::atomic<uint64_t> mPreviousElapsed{ 0 };

Or do we need to ensure these 2 get updated together, in which case something like:

struct Status {
    uint64_t elapsedNanoseconds{ 0 };
    uint64_t previousElapsed{ 0 };
};
// and...
std::atomic<Status> mStatus{ {} };
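
As a rough illustration of the first option (two atomics owned directly by the class, with no shared_ptr), a sketch could look like the following. `TimerQueryState` and its `record` method are hypothetical names, and the "previous value" bookkeeping is only an assumption about how the two counters might relate.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch: the counters live directly in the owning object,
// so no Status struct or shared_ptr is involved.
class TimerQueryState {
public:
    void record(uint64_t elapsedNanoseconds) {
        // Each store is atomic on its own, but the pair is NOT updated as one unit.
        mPreviousElapsed.store(mElapsedNanoseconds.load());
        mElapsedNanoseconds.store(elapsedNanoseconds);
    }

    uint64_t elapsedNanoseconds() const { return mElapsedNanoseconds.load(); }
    uint64_t previousElapsed() const { return mPreviousElapsed.load(); }

private:
    std::atomic<uint64_t> mElapsedNanoseconds{ 0 };
    std::atomic<uint64_t> mPreviousElapsed{ 0 };
};
```

If a reader must never observe one counter updated without the other, this form is not enough and the two values would need to be published together.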

AndyHovingh · 2025-06-17T14:20:56Z

filament/backend/src/webgpu/WebGPUTimerQueries.h

+
+    void beginTimeElapsedQuery();
+    void endTimeElapsedQuery();
+    bool getQueryResult(uint64_t* outElapsedTimeNanoseconds);


I recommend [[nodiscard]]
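
For reference, a minimal sketch of what the attribute buys us. This is a hypothetical free-function version of the getter, and the value 42 is just a placeholder.

```cpp
#include <cstdint>

// Hypothetical free-function version of the getter under review.
[[nodiscard]] inline bool getQueryResult(uint64_t* outElapsedNanoseconds) {
    if (outElapsedNanoseconds == nullptr) {
        return false;
    }
    *outElapsedNanoseconds = 42;  // placeholder value for the example
    return true;
}

// Discarding the result is what the attribute is meant to flag:
//     getQueryResult(&value);            // compiler warning: result ignored
//     bool ok = getQueryResult(&value);  // fine: the result is used
```

Since the bool is the only signal that the output parameter was written, silently ignoring it would mean reading a possibly stale value.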

AndyHovingh · 2025-06-17T14:29:06Z

filament/backend/src/webgpu/WebGPUTimerQueries.cpp

+namespace filament::backend {
+
+void WebGPUTimerQueries::beginTimeElapsedQuery() {
+    status->elapsedNanoseconds = 0;


why would we lock status below and not here?

idk if it's needed since we are resetting the value to 0,
but I pretty much just took it from https://github.com/google/filament/blob/main/filament/backend/src/metal/MetalTimerQuery.mm#L29

But Metal's implementation is a bit different than this. It is setting some state and then later interacting with that state in a callback (fence->onSignal(...) {...});). In that callback the state is being locked. That code is being defensive about the scenario that the MetalTimerQueryFence is destroyed before the callback is invoked. However, in this WebGPU implementation we are just implementing everything synchronously... the owning class should not be disappearing mid-function call. Thus, I don't think the Metal example is applicable here. It seems to me this could be implemented something like:

void WebGPUTimerQueries::beginTimeElapsedQuery() {
    status->elapsedNanoseconds = std::chrono::steady_clock::now().time_since_epoch().count();
}

AndyHovingh · 2025-06-17T14:33:28Z

filament/backend/src/webgpu/WebGPUTimerQueries.cpp

+    std::weak_ptr<WebGPUTimerQueries::Status> statusPtr = status;
+
+    if (auto s = statusPtr.lock()) {
+        s->elapsedNanoseconds = std::chrono::steady_clock::now().time_since_epoch().count();


The naming of elapsedNanoseconds becomes confusing at this point, as it is not "elapsed" since a meaningful point in time, but rather a timestamp of when the query began, right? In that case, should this be more like beganNanoseconds?

yes and no, because then at endTimeElapsedQuery we would have something like

elapsed = current - began

so it's just for simplicity and to avoid using an extra variable

Could we not have the same number of variables by just having beganNanoseconds and elapsedNanoseconds?

we currently only have elapsed?

Are those 2 numbers not sufficient to implement your interface?
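
To illustrate the two-variable idea from this thread, here is a standalone sketch in which each variable keeps a single meaning. `ElapsedTimer` and its member names are hypothetical; the explicit `duration_cast` is there because the tick period of `steady_clock` is implementation-defined, so a bare `count()` is not guaranteed to be in nanoseconds.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch: one variable is always a timestamp, the other always a duration.
class ElapsedTimer {
public:
    void begin() {
        mBeganNanoseconds = nowNanoseconds();
        mElapsedNanoseconds = 0;
    }

    void end() {
        mElapsedNanoseconds = nowNanoseconds() - mBeganNanoseconds;
    }

    uint64_t elapsedNanoseconds() const { return mElapsedNanoseconds; }

private:
    static uint64_t nowNanoseconds() {
        using namespace std::chrono;
        // Convert explicitly so the unit does not depend on the clock's tick period.
        return static_cast<uint64_t>(
                duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
    }

    uint64_t mBeganNanoseconds{ 0 };    // timestamp taken in begin()
    uint64_t mElapsedNanoseconds{ 0 };  // duration computed in end()
};
```

The member count matches the current Status struct (two values), but neither one changes meaning between begin and end.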

AndyHovingh · 2025-06-17T14:34:49Z

filament/backend/src/webgpu/WebGPUTimerQueries.cpp

+void WebGPUTimerQueries::endTimeElapsedQuery() {
+    // Capture the timer query status via a weak_ptr because the WGPUTimerQuery could be destroyed
+    // before the block executes.
+    if (status->elapsedNanoseconds != 0) {


again, why is accessing status here fine and not below?

AndyHovingh · 2025-06-17T14:40:51Z

filament/backend/src/webgpu/WebGPUTimerQueries.cpp

+    if (status->elapsedNanoseconds != 0) {
+        std::weak_ptr<WebGPUTimerQueries::Status> statusPtr = status;
+        if (auto s = statusPtr.lock()) {
+            s->previousElapsed = s->elapsedNanoseconds =


The meaning of the variables seems to change. Now elapsedNanoseconds does mean nanoseconds elapsed since the query began, but before it was nanoseconds since epoch (more of a timestamp). And why not include Nanoseconds in the name of previous...? I would also suggest breaking the statement into separate assignments for readability.

AndyHovingh · 2025-06-17T14:44:44Z

filament/backend/src/webgpu/WebGPUTimerQueries.cpp

+    }
+    if (outElapsedTime) {
+        *outElapsedTime = status->previousElapsed;
+        status->previousElapsed = 0;


The naming of this function is misleading, because getX() typically indicates a getter, which is not expected to have side effects or be computationally expensive in any reasonable way. However, this has side effects; thus, I would indicate that in the function name somehow.

It's consistent with the existing Filament implementations:
https://github.com/google/filament/blob/main/filament/backend/src/metal/MetalTimerQuery.mm#L59
I believe we should leave it as is.

I would make the same argument about the Metal implementation name. Why not getQueryResultAndReset?
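
A sketch of what a read-and-reset function with an honest name might look like. `ResettableResult` and `publish` are hypothetical names; treating 0 as "nothing available" mirrors the snippet above but is still an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch of a result that is cleared once it has been read.
class ResettableResult {
public:
    void publish(uint64_t elapsedNanoseconds) {
        mElapsedNanoseconds.store(elapsedNanoseconds);
    }

    // The name states the side effect: a successful read clears the stored value.
    [[nodiscard]] bool getQueryResultAndReset(uint64_t* const outElapsedNanoseconds) {
        if (outElapsedNanoseconds == nullptr) {
            return false;
        }
        // exchange() reads the value and resets it to 0 in one atomic step.
        const uint64_t elapsed = mElapsedNanoseconds.exchange(0);
        if (elapsed == 0) {
            return false;  // nothing has been published since the last read
        }
        *outElapsedNanoseconds = elapsed;
        return true;
    }

private:
    std::atomic<uint64_t> mElapsedNanoseconds{ 0 };
};
```

A second call without a new `publish` returns false, which is exactly the behavior a plain `getX()` name would not lead a caller to expect.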

AndyHovingh · 2025-06-17T14:51:09Z

filament/backend/src/webgpu/WebGPUDriver.cpp

@@ -268,7 +266,29 @@ Handle<HwFence> WebGPUDriver::createFenceS() noexcept {
 }

 Handle<HwTimerQuery> WebGPUDriver::createTimerQueryS() noexcept {
-    return Handle<HwTimerQuery>((Handle<HwTimerQuery>::HandleId) mNextFakeHandle++);
+    return allocAndConstructHandle<WebGPUTimerQueries, HwTimerQuery>();


Why not just follow the same pattern as the other resources? Just allocate the handle here and construct it in the createTimerQueryR function?

following the implementation that the other drivers had

AndyHovingh · 2025-06-17T14:53:47Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+
+void WebGPUDriver::createTimerQueryR(Handle<HwTimerQuery> timerQueryHandle, int) {}
+
+void WebGPUDriver::destroyTimerQuery(Handle<HwTimerQuery> timerQueryHandle) {


I actually think moving this down here makes navigating the functions even more "difficult", because we are not consistent about it. I suggest rearranging them all at once in a PR to keep things consistent and easy to find.

we could also just do it little by little
I don't think that it makes navigation more difficult, especially for Timer Queries

That's why I put double quotes around "difficult", because it is not particularly so, but I am just pointing out that I'm not a fan of the inconsistency.

AndyHovingh · 2025-06-17T14:55:50Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+    }
+}
+
+TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) {


nit: rename elapsedTime to outElapsedTime to indicate it as an output parameter.

elapsedTime matches DriverApi.inc

DECL_DRIVER_API_SYNCHRONOUS_N(backend::TimerQueryResult, getTimerQueryValue, backend::TimerQueryHandle, query, uint64_t*, elapsedTime)

We routinely use slightly different names for added clarity. For example, in the same signature we use timerQueryHandle instead of query from DriverApi.inc.

AndyHovingh · 2025-06-17T14:57:39Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+    }
+}
+
+TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) {


nit: I would make the pointer const (but, not what it points to, as it is an output param), e.g. uint64_t* const outElapsedTime.

AndyHovingh · 2025-06-17T14:58:14Z

filament/backend/src/webgpu/WebGPUTimerQueries.cpp

+    }
+}
+
+bool WebGPUTimerQueries::getQueryResult(uint64_t* outElapsedTime) {


nit: I would make the pointer const (but, not what it points to, as it is an output param), e.g. uint64_t* const outElapsedTime
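
To spell out what the const placement means, here is a small sketch; `writeElapsed` is a hypothetical helper, not part of the PR.

```cpp
#include <cstdint>

// `uint64_t* const`: the pointer itself cannot be reseated, but the pointee stays writable.
inline bool writeElapsed(uint64_t* const outElapsedTime, uint64_t elapsedNanoseconds) {
    if (outElapsedTime == nullptr) {
        return false;
    }
    *outElapsedTime = elapsedNanoseconds;  // OK: writes through the pointer
    // outElapsedTime = nullptr;           // error: the parameter itself is const
    return true;
}
```

A top-level const on a parameter only affects the function definition; it is not part of the function's type, so the declaration in the header can keep or omit it without changing the signature.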

AndyHovingh · 2025-06-17T15:03:24Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+
+TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) {
+    auto* tq = handleCast<WebGPUTimerQueries>(timerQueryHandle);
+    return tq->getQueryResult(elapsedTime) ? TimerQueryResult::AVAILABLE


Why not have the WebGPUTimerQuery's function return a TimerQueryResult instead of a bool?

Also, this ignores the error case. We should probably be returning TimerQueryResult::ERROR to Filament in that case, right (instead of reporting NOT_READY to Filament when an error occurred)?
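
A rough sketch of the three-state return. The enum below is a local stand-in whose shape is assumed to match Filament's backend::TimerQueryResult, and `QueryExample`, `publish` and `markFailed` are hypothetical names.

```cpp
#include <cstdint>

// Local stand-in; assumed to mirror Filament's backend::TimerQueryResult.
enum class TimerQueryResult : int8_t { ERROR = -1, NOT_READY = 0, AVAILABLE = 1 };

// Hypothetical query object that can report failure separately from "not ready yet".
class QueryExample {
public:
    void publish(uint64_t elapsedNanoseconds) {
        mElapsedNanoseconds = elapsedNanoseconds;
        mReady = true;
    }

    void markFailed() { mFailed = true; }

    [[nodiscard]] TimerQueryResult getQueryResult(uint64_t* const outElapsedNanoseconds) const {
        if (mFailed || outElapsedNanoseconds == nullptr) {
            return TimerQueryResult::ERROR;
        }
        if (!mReady) {
            return TimerQueryResult::NOT_READY;
        }
        *outElapsedNanoseconds = mElapsedNanoseconds;
        return TimerQueryResult::AVAILABLE;
    }

private:
    uint64_t mElapsedNanoseconds{ 0 };
    bool mReady{ false };
    bool mFailed{ false };
};
```

With this shape the driver function can forward the result directly instead of mapping a bool onto two of the three states.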

AndyHovingh · 2025-06-17T15:30:57Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+                        mTimerQuery->endTimeElapsedQuery();
+                    }
+                }
+            });
        const wgpu::Instance instance = mAdapter.GetInstance();


why are these lines indented over?

seems like CLion thought that they should be
I can either leave the automatic formatting or manually change it

I just tried reformat with CLion and it moves these lines back to the left where they belong, e.g.:

auto f = mQueue.OnSubmittedWorkDone(...
const wgpu::Instance instance = mAdapter.GetInstance();
auto wStatus = ...
...

AndyHovingh · 2025-06-17T15:36:46Z

filament/backend/src/webgpu/WebGPUDriver.cpp

-    if (firstRender) {
-        auto f = mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::WaitAnyOnly,
-                [=](wgpu::QueueWorkDoneStatus) {});
+    auto f = mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::WaitAnyOnly,


I believe we can avoid having this be blocking on EACH frame by setting the callbackMode here to wgpu::CallbackMode::AllowSpontaneous and updating WebGPUDriver::tick to invoke the instance's ProcessEvents():

void WebGPUDriver::tick(const int /* dummy */) {
    mDevice.Tick();
    mAdapter.GetInstance().ProcessEvents();
}

We still will want this to be blocking on the first frame due to the magenta flashing issue, but subsequently I think we can do it asynchronously.

AndyHovingh

See my suggested change comments, namely on not blocking the driver's commit call.

AndyHovingh · 2025-06-17T16:28:01Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+            [=](wgpu::QueueWorkDoneStatus status) {
+                if (status == wgpu::QueueWorkDoneStatus::Success) {
+                    if (mTimerQuery) {
+                        mTimerQuery->endTimeElapsedQuery();


Per our conversation offline, I think this may not be accurate enough for our purposes, as it depends on when we call ProcessEvents (the driver's tick call). Unfortunately, I think we may need to take a different approach of adding up render/compute pass times.
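
To show the concern in isolation, here is a toy model with a manually advanced clock. `FakeEventQueue` and `measuredElapsed` are hypothetical stand-ins and are NOT the Dawn/WebGPU API; the only assumption carried over is that the work-done callback runs when events are processed, not when the GPU work actually finishes.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event queue whose callbacks only run
// when processEvents() is called.
class FakeEventQueue {
public:
    void onWorkDone(std::function<void()> callback) {
        mPending.push_back(std::move(callback));
    }

    void processEvents() {
        std::vector<std::function<void()>> ready;
        ready.swap(mPending);
        for (auto& callback : ready) {
            callback();
        }
    }

private:
    std::vector<std::function<void()>> mPending;
};

// Simulates one timed frame using a fake clock counted in nanoseconds.
inline uint64_t measuredElapsed(uint64_t gpuWorkNanoseconds, uint64_t tickDelayNanoseconds) {
    uint64_t fakeNow = 0;
    uint64_t ended = 0;
    FakeEventQueue queue;

    const uint64_t began = fakeNow;
    queue.onWorkDone([&] { ended = fakeNow; });  // end timestamp is taken in the callback

    fakeNow += gpuWorkNanoseconds;    // the GPU work finishes here...
    fakeNow += tickDelayNanoseconds;  // ...but the tick that processes events comes later
    queue.processEvents();            // the callback only fires now

    return ended - began;
}
```

In this model the reported time is the GPU work plus however long the tick was delayed, which is why the callback-based measurement can overstate the real cost.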

jc3265 added the internal (Issue/PR does not affect clients) and webgpu (Issues/features for WebGPU backend) labels Jun 17, 2025

jc3265 force-pushed the jc/enableTQueries branch 2 times, most recently from faab06b to 3d77b23 June 17, 2025 00:59

jc3265 marked this pull request as ready for review June 17, 2025 01:02

jc3265 requested review from AndyHovingh, bridgewaterrobbie, kpiascik and jneljneljnel June 17, 2025 01:03