Hello!
I saw the slides from the GTC presentation that mentioned this project, and I have an alternative idea that should™ solve most of the problems regarding cheating - but I have not yet checked whether there are other things, like warmup, that might not be easily compatible.
Anyways, here is my idea:
In order to prevent cheating, it would be best if the entire evaluation and measurement logic resided in a different process. However, latency and jitter are major concerns in this case. I have recently worked with the UMWAIT CPU instruction (or MWAITX for AMD) for extremely low-latency signaling, and it should also work here.
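To make the signaling pattern concrete, here is a minimal sketch in Python, for illustration only: the busy-wait loop stands in for UMONITOR/UMWAIT (the C intrinsics `_umonitor`/`_umwait` from the WAITPKG extension), which cannot be issued from pure Python, and the `measure_wakeup` helper is a name I made up for this sketch.

```python
import time
from multiprocessing import Process, shared_memory

TRIGGER = 0  # byte offset of the trigger flag inside the shared segment

def waiter(name: str) -> None:
    shm = shared_memory.SharedMemory(name=name)
    while shm.buf[TRIGGER] == 0:   # UMWAIT stand-in: spin on the flag
        pass
    shm.close()

def measure_wakeup() -> int:
    """Return the nanoseconds between the trigger write and waiter exit."""
    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[TRIGGER] = 0
    p = Process(target=waiter, args=(shm.name,))
    p.start()
    time.sleep(0.05)               # give the waiter time to reach its loop
    t0 = time.perf_counter_ns()
    shm.buf[TRIGGER] = 1           # the write that wakes the other process
    p.join()                       # note: includes process-exit overhead
    elapsed = time.perf_counter_ns() - t0
    shm.close()
    shm.unlink()
    return elapsed

if __name__ == "__main__":
    print(f"waiter woke and exited after {measure_wakeup()} ns")
```

With real UMWAIT the waiting core parks in a low-power state instead of spinning, while wakeup latency stays in the sub-microsecond range.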
A coordinator process is responsible for creating test inputs and measuring execution time. A child process (possibly further sandboxed) runs the actual GPU kernel. Synchronization happens via UMWAIT on shared memory. A concrete flow could be:
- The coordinator generates a random test input
- The test input is written to the shared memory in a random order, potentially writing some locations twice with updated content (to prevent the test process from UMWAITing on some other location instead of the trigger)
- The coordinator writes to a designated trigger location that the test process UMWAITs on. When this write is done, the time is recorded as the start time. The coordinator then begins UMWAITing on the response trigger location
- The test process directly starts the kernel execution via some callback (everything was set up in the test process beforehand)
- After the kernel execution is done, the output is copied into another shared memory region, and finally a trigger write is performed on the response location
- The coordinator resumes from UMWAIT and immediately records the time again. It then uses a very fast hash (e.g. XXHash) to hash the response memory and records the time once more; this gives a small, well-defined time window in which the result was produced. The output is then copied to another location for analysis
- First, the hash is verified (this prevents the test process from triggering the coordinator wakeup first and then continuing to write data, to save a little bit of time)
- The provided output is compared against the output of a reference implementation.
With the right setup and harness on both ends, this should provide a measurement of execution time with very low latency and jitter (though it would include the transfers to and from the GPU).
What do you think: would something along these lines be a viable way to improve the cheat resistance?