-
Notifications
You must be signed in to change notification settings - Fork 2k
webgpu: Implement TimerQueries #8880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
faab06b
to
3d77b23
Compare
class WebGPUTimerQueries : public HwTimerQuery { | ||
public: | ||
WebGPUTimerQueries() | ||
: status(std::make_shared<Status>()) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend using unitary/list initializers (curly braces), as opposed to the parentheses syntax whenever you can, as it can avoid certain narrowing conversion issues and is more consistent syntax across the language, e.g.:
WebGPUTimerQuery()
: mStatus{ std::make_shared<Status>() } {}
|
||
void beginTimeElapsedQuery(); | ||
void endTimeElapsedQuery(); | ||
bool getQueryResult(uint64_t* outElapsedTimeNanoseconds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Time
in the name is redundant right? Perhaps uint64_t* outElapsedNanoseconds
?
std::atomic<uint64_t> previousElapsed{ 0 }; | ||
}; | ||
|
||
std::shared_ptr<Status> status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the Status
struct or shared_ptr
really needed? could we not just have?:
std::atomic<uint64_t> mElapsedNanoseconds{ 0 };
std::atomic<uint64_t> mPreviousElapsed{ 0 };
Or do we need to ensure these 2 get updated together, in which case, something like?:
struct Status {
uint64_t elapsedNanoseconds{ 0 };
uint64_t previousElapsed{ 0 };
}
// and...
std::atomic<Status> mStatus{ {} };
|
||
void beginTimeElapsedQuery(); | ||
void endTimeElapsedQuery(); | ||
bool getQueryResult(uint64_t* outElapsedTimeNanoseconds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend [[nodiscard]]
namespace filament::backend { | ||
|
||
void WebGPUTimerQueries::beginTimeElapsedQuery() { | ||
status->elapsedNanoseconds = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we lock status
below and not here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk if its needed since we are resetting the value to 0
but i pretty much just got it from https://github.com/google/filament/blob/main/filament/backend/src/metal/MetalTimerQuery.mm#L29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Metal's implementation is a bit different than this. It is setting some state and then later interacting with that state in a callback (fence->onSignal(...) {...});
). In that callback the state is being locked. That code is being defensive about the scenario that the MetalTimerQueryFence
is destroyed before the callback is invoked. However, in this WebGPU implementation we are just implementing everything synchronously... the owning class should not be disappearing mid-function call. Thus, I don't think the Metal example is applicable here. It seems to me this could be implemented something like:
void WebGPUTimerQueries::beginTimeElapsedQuery() {
status->elapsedNanoseconds = std::chrono::steady_clock::now().time_since_epoch().count();
}
std::weak_ptr<WebGPUTimerQueries::Status> statusPtr = status; | ||
|
||
if (auto s = statusPtr.lock()) { | ||
s->elapsedNanoseconds = std::chrono::steady_clock::now().time_since_epoch().count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of elapsedNanonseconds
becomes confusing at this point, as it is not "elapsed" since a meaningful point in time, but rather more of a timestamp of when the query began right? In that case, should this be more like beganNanoseconds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and no, because then at endTimeElapsedQuery
we would have something like
elapsed = current - began
so its just for simplicity and to avoid using an extra variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we not have the same number of variables by just having beganNanoseconds
and elapsedNanoseconds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we currently only have elapsed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those 2 numbers not sufficient to implement your interface?
void WebGPUTimerQueries::endTimeElapsedQuery() { | ||
// Capture the timer query status via a weak_ptr because the WGPUTimerQuery could be destroyed | ||
// before the block executes. | ||
if (status->elapsedNanoseconds != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, why is accessing status here fine and not below?
if (status->elapsedNanoseconds != 0) { | ||
std::weak_ptr<WebGPUTimerQueries::Status> statusPtr = status; | ||
if (auto s = statusPtr.lock()) { | ||
s->previousElapsed = s->elapsedNanoseconds = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meaning of variables seem to change meaning. Now elapsedNanoseconds
does mean elapsed nanoseconds since beginning the query, but before it was since epoch (more of a timestamp). And why not include Nanoseconds
in the name of previous...
? I would also suggest breaking the statement into separate assignments for readability.
} | ||
if (outElapsedTime) { | ||
*outElapsedTime = status->previousElapsed; | ||
status->previousElapsed = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of this function is misleading, because getX()
typically indicates a getter, which is not expected to have side effects or be computationally expensive in any reasonable way. However, this has side effects; thus, I would indicate that in the function name somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its consistent with the existing Filament implementations
https://github.com/google/filament/blob/main/filament/backend/src/metal/MetalTimerQuery.mm#L59
I believe we should leave it as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the same argument about the Metal implementation name. Why not getQueryResultAndReset
?
@@ -268,7 +266,29 @@ Handle<HwFence> WebGPUDriver::createFenceS() noexcept { | |||
} | |||
|
|||
Handle<HwTimerQuery> WebGPUDriver::createTimerQueryS() noexcept { | |||
return Handle<HwTimerQuery>((Handle<HwTimerQuery>::HandleId) mNextFakeHandle++); | |||
return allocAndConstructHandle<WebGPUTimerQueries, HwTimerQuery>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just follow the pattern as the other resources? Just allocate the handle here and construct it in the createTimerQueryR function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following the implementation that the other drivers had
|
||
void WebGPUDriver::createTimerQueryR(Handle<HwTimerQuery> timerQueryHandle, int) {} | ||
|
||
void WebGPUDriver::destroyTimerQuery(Handle<HwTimerQuery> timerQueryHandle) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think moving this down here makes navigating the functions even more "difficult", because we are not consistent about it. I suggest rearranging them all at once in a PR to keep things consistent and easy to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also just do it little by little
I dont think that it makes navigation more difficult, specially for Timer Queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I put double quotes around "difficult", because it is not particularly so, but I am just pointing out that I'm not a fan of the inconsistency.
} | ||
} | ||
|
||
TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename elapsedTime
to outElapsedTime
to indicate it as an output parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elapsedTime
matches DriverApi.inc
DECL_DRIVER_API_SYNCHRONOUS_N(backend::TimerQueryResult, getTimerQueryValue, backend::TimerQueryHandle, query, uint64_t*, elapsedTime)```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We routinely use slightly different names for added clarity. For example, in the same signature we use timerQueryHandle
instead of query
from DriverApi.inc.
} | ||
} | ||
|
||
TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would make the pointer const
(but, not what it points to, as it is an output param), e.g. uint64_t* const outElapsedTime
.
} | ||
} | ||
|
||
bool WebGPUTimerQueries::getQueryResult(uint64_t* outElapsedTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would make the pointer const (but, not what it points to, as it is an output param), e.g. uint64_t* const outElapsedTime
|
||
TimerQueryResult WebGPUDriver::getTimerQueryValue(Handle<HwTimerQuery> timerQueryHandle, uint64_t* elapsedTime) { | ||
auto* tq = handleCast<WebGPUTimerQueries>(timerQueryHandle); | ||
return tq->getQueryResult(elapsedTime) ? TimerQueryResult::AVAILABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have the WebGPUTimerQuery
's function return a TimerQueryResult
instead of a bool
?
- this ignores the error case. We should probably be returning
TimerQueryResult::ERROR
to Filament in that case right (instead of reporting to Filament NOT_READY in the case of an error)?
mTimerQuery->endTimeElapsedQuery(); | ||
} | ||
} | ||
}); | ||
const wgpu::Instance instance = mAdapter.GetInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these lines indented over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like Clion thought that they should be
I can either leave the automatic formatting or manually change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried reformat with CLion and it moves these lines back to the left where they belong, e.g.:
auto f = mQueue.OnSubmittedWorkDone(...
const wgpu::Instance instance = mAdapter.GetInstance();
auto wStatus = ...
...
if (firstRender) { | ||
auto f = mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::WaitAnyOnly, | ||
[=](wgpu::QueueWorkDoneStatus) {}); | ||
auto f = mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::WaitAnyOnly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can avoid having this be blocking on EACH frame by setting the callbackMode
here to wgpu::CallbackMode::AllowSpontaneous
and updating WebGPUDriver::tick
to invoke the instance's ProcessEvents()
:
void WebGPUDriver::tick(const int /* dummy */) {
mDevice.Tick();
mAdapter.GetInstance().ProcessEvents();
}
We still will want this to be blocking on the first frame due to the magenta flashing issue, but subsequently I think we can do it asynchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my suggested change comments, namely on not blocking the driver's commit call.
3d77b23
to
04ff88f
Compare
[=](wgpu::QueueWorkDoneStatus status) { | ||
if (status == wgpu::QueueWorkDoneStatus::Success) { | ||
if (mTimerQuery) { | ||
mTimerQuery->endTimeElapsedQuery(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our conversation offline, I think this may not be accurate enough for our purposes, as it depends on when we call ProcessEvents (the driver's tick call). Unfortunately, I think we may need to take a different approach of adding up render/compute pass times.
BUGS=[407961733]