Skip to content

Conversation

jc3265
Copy link
Contributor

@jc3265 jc3265 commented Jun 17, 2025

BUGS=[407961733]

@jc3265 jc3265 added internal Issue/PR does not affect clients webgpu Issues/features for WebGPU backend labels Jun 17, 2025
@jc3265 jc3265 force-pushed the jc/enableTQueries branch 2 times, most recently from faab06b to 3d77b23 Compare June 17, 2025 00:59
@jc3265 jc3265 marked this pull request as ready for review June 17, 2025 01:02
class WebGPUTimerQueries : public HwTimerQuery {
public:
WebGPUTimerQueries()
: status(std::make_shared<Status>()) {}
Copy link
Contributor

@AndyHovingh AndyHovingh Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using unitary/list initializers (curly braces), as opposed to the parentheses syntax whenever you can, as it can avoid certain narrowing conversion issues and is more consistent syntax across the language, e.g.:

WebGPUTimerQuery()
    : mStatus{ std::make_shared<Status>() } {}


void beginTimeElapsedQuery();
void endTimeElapsedQuery();
bool getQueryResult(uint64_t* outElapsedTimeNanoseconds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Time in the name is redundant right? Perhaps uint64_t* outElapsedNanoseconds?

std::atomic<uint64_t> previousElapsed{ 0 };
};

std::shared_ptr<Status> status;
Copy link
Contributor

@AndyHovingh AndyHovingh Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the Status struct or shared_ptr really needed? could we not just have?:

std::atomic<uint64_t> mElapsedNanoseconds{ 0 };
std::atomic<uint64_t> mPreviousElapsed{ 0 };

Or do we need to ensure these 2 get updated together, in which case, something like?:

struct Status {
  uint64_t elapsedNanoseconds{ 0 };
  uint64_t previousElapsed{ 0 };
}
// and...
std::atomic<Status> mStatus{ {} };


void beginTimeElapsedQuery();
void endTimeElapsedQuery();
bool getQueryResult(uint64_t* outElapsedTimeNanoseconds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend [[nodiscard]]

namespace filament::backend {

void WebGPUTimerQueries::beginTimeElapsedQuery() {
status->elapsedNanoseconds = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we lock status below and not here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if its needed since we are resetting the value to 0
but i pretty much just got it from https://github.com/google/filament/blob/main/filament/backend/src/metal/MetalTimerQuery.mm#L29

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Metal's implementation is a bit different than this. It is setting some state and then later interacting with that state in a callback (fence->onSignal(...) {...});). In that callback the state is being locked. That code is being defensive about the scenario that the MetalTimerQueryFence is destroyed before the callback is invoked. However, in this WebGPU implementation we are just implementing everything synchronously... the owning class should not be disappearing mid-function call. Thus, I don't think the Metal example is applicable here. It seems to me this could be implemented something like:

void WebGPUTimerQueries::beginTimeElapsedQuery() {
  status->elapsedNanoseconds = std::chrono::steady_clock::now().time_since_epoch().count();
}

std::weak_ptr<WebGPUTimerQueries::Status> statusPtr = status;

if (auto s = statusPtr.lock()) {
s->elapsedNanoseconds = std::chrono::steady_clock::now().time_since_epoch().count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of elapsedNanonseconds becomes confusing at this point, as it is not "elapsed" since a meaningful point in time, but rather more of a timestamp of when the query began right? In that case, should this be more like beganNanoseconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and no, because then at endTimeElapsedQuery we would have something like

elapsed = current - began

so its just for simplicity and to avoid using an extra variable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not have the same number of variables by just having beganNanoseconds and elapsedNanoseconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we currently only have elapsed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those 2 numbers not sufficient to implement your interface?

void WebGPUTimerQueries::endTimeElapsedQuery() {
// Capture the timer query status via a weak_ptr because the WGPUTimerQuery could be destroyed
// before the block executes.
if (status->elapsedNanoseconds != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, why is accessing status here fine and not below?

if (status->elapsedNanoseconds != 0) {
std::weak_ptr<WebGPUTimerQueries::Status> statusPtr = status;
if (auto s = statusPtr.lock()) {
s->previousElapsed = s->elapsedNanoseconds =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning of variables seem to change meaning. Now elapsedNanoseconds does mean elapsed nanoseconds since beginning the query, but before it was since epoch (more of a timestamp). And why not include Nanoseconds in the name of previous...? I would also suggest breaking the statement into separate assignments for readability.

}
if (outElapsedTime) {
*outElapsedTime = status->previousElapsed;
status->previousElapsed = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of this function is misleading, because getX() typically indicates a getter, which is not expected to have side effects or be computationally expensive in any reasonable way. However, this has side effects; thus, I would indicate that in the function name somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its consistent with the existing Filament implementations
https://github.com/google/filament/blob/main/filament/backend/src/metal/MetalTimerQuery.mm#L59
I believe we should leave it as is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the same argument about the Metal implementation name. Why not getQueryResultAndReset?

@@ -268,7 +266,29 @@ Handle<HwFence> WebGPUDriver::createFenceS() noexcept {
}

Handle<HwTimerQuery> WebGPUDriver::createTimerQueryS() noexcept {
return Handle<HwTimerQuery>((Handle<HwTimerQuery>::HandleId) mNextFakeHandle++);
return allocAndConstructHandle<WebGPUTimerQueries, HwTimerQuery>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just follow the pattern as the other resources? Just allocate the handle here and construct it in the createTimerQueryR function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following the implementation that the other drivers had


void WebGPUDriver::createTimerQueryR(Handle<HwTimerQuery> timerQueryHandle, int) {}

void WebGPUDriver::destroyTimerQuery(Handle<HwTimerQuery> timerQueryHandle) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think moving this down here makes navigating the functions even more "difficult", because we are not consistent about it. I suggest rearranging them all at once in a PR to keep things consistent and easy to find.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also just do it little by little
I dont think that it makes navigation more difficult, specially for Timer Queries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I put double quotes around "difficult", because it is not particularly so, but I am just pointing out that I'm not a fan of the inconsistency.

}
}

TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename elapsedTime to outElapsedTime to indicate it as an output parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elapsedTime matches DriverApi.inc

DECL_DRIVER_API_SYNCHRONOUS_N(backend::TimerQueryResult, getTimerQueryValue, backend::TimerQueryHandle, query, uint64_t*, elapsedTime)```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We routinely use slightly different names for added clarity. For example, in the same signature we use timerQueryHandle instead of query from DriverApi.inc.

}
}

TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would make the pointer const (but, not what it points to, as it is an output param), e.g. uint64_t* const outElapsedTime.

}
}

bool WebGPUTimerQueries::getQueryResult(uint64_t* outElapsedTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would make the pointer const (but, not what it points to, as it is an output param), e.g. uint64_t* const outElapsedTime


TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) {
auto* tq = handleCast<WebGPUTimerQueries>(timerQueryHandle);
return tq->getQueryResult(elapsedTime) ? TimerQueryResult::AVAILABLE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have the WebGPUTimerQuery's function return a TimerQueryResult instead of a bool?

  • this ignores the error case. We should probably be returning TimerQueryResult::ERROR to Filament in that case right (instead of reporting to Filament NOT_READY in the case of an error)?

mTimerQuery->endTimeElapsedQuery();
}
}
});
const wgpu::Instance instance = mAdapter.GetInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these lines indented over?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like Clion thought that they should be
I can either leave the automatic formatting or manually change it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried reformat with CLion and it moves these lines back to the left where they belong, e.g.:

auto f = mQueue.OnSubmittedWorkDone(...
const wgpu::Instance instance = mAdapter.GetInstance();
auto wStatus = ...
...

if (firstRender) {
auto f = mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::WaitAnyOnly,
[=](wgpu::QueueWorkDoneStatus) {});
auto f = mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::WaitAnyOnly,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can avoid having this be blocking on EACH frame by setting the callbackMode here to wgpu::CallbackMode::AllowSpontaneous and updating WebGPUDriver::tick to invoke the instance's ProcessEvents():

void WebGPUDriver::tick(const int /* dummy */) {
    mDevice.Tick();
    mAdapter.GetInstance().ProcessEvents();
}

We still will want this to be blocking on the first frame due to the magenta flashing issue, but subsequently I think we can do it asynchronously.

Copy link
Contributor

@AndyHovingh AndyHovingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my suggested change comments, namely on not blocking the driver's commit call.

@jc3265 jc3265 force-pushed the jc/enableTQueries branch from 3d77b23 to 04ff88f Compare June 17, 2025 15:42
[=](wgpu::QueueWorkDoneStatus status) {
if (status == wgpu::QueueWorkDoneStatus::Success) {
if (mTimerQuery) {
mTimerQuery->endTimeElapsedQuery();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our conversation offline, I think this may not be accurate enough for our purposes, as it depends on when we call ProcessEvents (the driver's tick call). Unfortunately, I think we may need to take a different approach of adding up render/compute pass times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal Issue/PR does not affect clients webgpu Issues/features for WebGPU backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants