Skip to content

Redefine command-buffer simultaneous-use#1411

Merged
bashbaug merged 5 commits intoKhronosGroup:mainfrom
EwanC:rework_cmd-buf_sim-use
Aug 26, 2025
Merged

Redefine command-buffer simultaneous-use#1411
bashbaug merged 5 commits intoKhronosGroup:mainfrom
EwanC:rework_cmd-buf_sim-use

Conversation

@EwanC
Copy link
Contributor

@EwanC EwanC commented Jul 11, 2025

As discussed in #891 the current definition of simultaneous use is hard for users to reason about.

Instead a better model is one where simultaneous-use isn't a constraint on when a command-buffer can be submitted for execution, but the execution itself. That is, simultaneous-use occurs when a command-buffer is enqueued for execution while any previous submission of the command-buffer is still in-flight, and there any no scheduling dependencies expressed to serialize execution on the in-flight submission.

This means that pipelined submissions of a command-buffer, where there is an in-order queue, barrier, or cl_events to serialize execution, is always valid usage without the optional simultaneous-use device feature.

To avoid the runtime having to incur overheads from monitoring when simultaneous-use occurs so it can throw an error, violating this valid usage for non simultaneous-use command-buffers is UB. Following from this and relation to #1311, the pending state has also been removed from the command-buffer lifecycle, and there is only left the binary states of recording and executable (note that this conflicts with the change proposed in #1382, but it I believe it is straightforward to resolve). This is because a user can use the existing OpenCL mechanisms of host waits and event queries to inspect the state of individual command-buffer enqueues to avoid simultaneous-use, and having this stored as a command-buffer state incurs the runtime overhead of tracking a previous submissions.

The implications for updating a command-buffer is that the error behavior is removed from update, so simultaneous-use is a property of execution rather than enqueue. In fact update is the only scenario where simultaneous use now adds value, as without it there will be race conditions during execution. But with command-buffer update the command-buffer operations can work on different data. As a result i've moved the simultaneous capability from the base extension to mutable-dispatch.

I've outlined the CTS changes that would be required for this change in KhronosGroup/OpenCL-CTS#2473 and drafted them in KhronosGroup/OpenCL-CTS#2477

Ewan Crawford added 2 commits August 4, 2025 14:21
As discussed in KhronosGroup#891
the current definition of [simultaneous
use](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_command_buffers)
is hard for users to reason about.

Instead a better model is one simultaneous-use isn't a contraint of the
command-buffer submission, but it's execution. That is simultaneous-use
occurs when a command-buffer is enqueued for execution while any
previous submission of the command-buffer is still in-flight, and there
any no scheduling dependencies expressed to serialize execution.

This means that pipelined submissions of a command-buffer, where there
is an in-order queue, barrier, or cl_events to serialize execution, is
always valid usage without this optional simultaneous-use device
feature.

To avoid the runtime having to incur overheads from monitoring when
simultaneous-use occurs so it can throw an error, violating this
valid usage for non simultaneous-use command-buffers is UB.

Following from this related to KhronosGroup#1311
the pending state has also been removed from the command-buffer
lifecycle, and there is only the binary states of recording and
executable. This is because a user can use the existing OpenCL
mechanisms of host waits and event queries to inspect the state
of individual command-buffer enqueues to avoid simultaneous-use,
and having this stored as a command-buffer state incurs the runtime
overhead of tracking a previous submissions. The pending count
concept also now makes less sense.

The implications for updating a command-buffer is that the error
behavior is removed from update, so simultaneous-use is a property of
execution rather than enqueue. See CTS test ideas for how this could
work.

I think the CTS changes that are required for this as follows, but I could
create a separate CTS issue to track work once/if this PR merge. Or we
could try prototype the CTS changes to give confidence in the spec
change.
* Remove test for pending state query
* Either rework existing simulataneous use tests so that it tests the new definition,
  or delete them and create new more suitable tests without tech-debt
  from old definition.
* Add tests for pipelined submission of a command-buffer not created
with simultaneous-use. Ideally stressing indriect dependencies as well as direct
ones:
** in-order queue to express depdencies
** out-of-order queue with event dependencies to express depenencies
** barrier to express for dependencies
* Add cl_khr_command_buffer_mutable_dispatch tests for updating and
enqueueing pipelined submissions of a non-simultaneous use command-buffer with depdencies between each
enqueue, and only do a blocking wait at the end.
* Add cl_khr_command_buffer_mutable_dispatch tests for a
simultaneous-use command-buffer, where two invocations are scheduled
such that they can run concurrently, but the second invocation is
updated such that it uses different inputs/outputs to avoid race
conditions in the kernel.
@EwanC EwanC force-pushed the rework_cmd-buf_sim-use branch from 16e2cd3 to e45df2e Compare August 4, 2025 13:40
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Aug 6, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Aug 6, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device simultaneous-use capability
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Aug 7, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device simultaneous-use capability
* Flip setting the simultaneous-use property on command-buffer creation in base
  class to off, and only enable if tests requests it.
* Rewrite mutable dispatch simultaneous test to test updating between
  pipelined enqueues, and updating the new definition of
simultaneous-use
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Aug 8, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device simultaneous-use capability
* Flip setting the simultaneous-use property on command-buffer creation in base
  class to off, and only enable if tests requests it.
* Rewrite mutable dispatch simultaneous test to test updating both
  pipelined enqueues, and updating the new definition of simultaneous-use
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Aug 8, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device imultaneous-use capability
* Remove simultaneous-use command-buffer creation in base
  class to off, and require tests do it themselves if they require it.
* Rewrite mutable dispatch simultaneous test to test updating both
  pipelined enqueues, and updating the new definition of simultaneous-use
@EwanC EwanC requested a review from bashbaug August 8, 2025 12:33
@EwanC
Copy link
Contributor Author

EwanC commented Aug 8, 2025

@bashbaug I've requested you re-review the PR, as since you approved it I've pushed a commit that moves the simultaneous-use capability from the base extension to cl_khr_command_buffer_mutable_dispatch. While updating the CTS tests I realized I couldn't think of a scenario without update where it adds value, as any concurrent execution of the command-buffer would probably be race condition. Whereas with mutable-dispatch you can update your kernel to new inputs/outputs to use that would allow two concurrent executions to operate on different data. We haven't discussed this in the group yet, so I can revert it if we think it's undesirable.

@bashbaug
Copy link
Contributor

bashbaug commented Aug 8, 2025

I've pushed a commit that moves the simultaneous-use capability from the base extension to cl_khr_command_buffer_mutable_dispatch. While updating the CTS tests I realized I couldn't think of a scenario without update where it adds value, as any concurrent execution of the command-buffer would probably be race condition.

Interesting. My first reaction was that this goes a bit too far, but after thinking about it a bit more, I'm starting to agree. Without mutable dispatch, every execution of the command buffer is necessarily reading from and writing to the same locations in memory, which is why "any concurrent execution of the command-buffer would probably be race condition".

Note that Vulkan has the command buffer simultaneous usage flag VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT, but Vulkan also allows modifying the command buffer via vkCmdBindDescriptorSets and vkCmdPushConstants to avoid race conditions.

Let's discuss a bit more in our call on Tuesday before merging this change, just to be sure.

CC @bcalidas @kpet @paulfradgley

@bashbaug
Copy link
Contributor

bashbaug commented Aug 9, 2025

One more thought: "simultaneous use" is going to apply to mutable dispatch, plus mutable memory commands (#1065), plus potentially any other future extensions that allow modifying a command buffer (e.g. #1279). So, practically speaking, we still may want to keep the "simultaneous use" concept and enums in the base-level extension and have the layered extensions refer to them vs. replicating everything into each of the layered extensions.

Even if we do this, I think it's worth an informative note cautioning against concurrent execution of a command buffer without modification due to the possibility of data races, since this wasn't immediately obvious to me.

@bashbaug
Copy link
Contributor

Discussed in the August 12th teleconference:

  • We are leaning towards moving the simultaneous use bit to the mutable dispatch extension as suggested. Note that the other extensions that enable modifying the command buffer layer on top of the mutable dispatch extension so they can reuse the same simultaneous use bit.
  • We are going to explore adding a capability to indicate that the command buffer many be modified while it is in use, separate from the capability to indicate that the command buffer may be re-executed while it is in use (after it has been updated).
  • We added "focused review" to the related CTS tests changes for simultaneous use so they will be ready we decide to merge the spec changes, see CTS PR #2477.

EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Aug 21, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device imultaneous-use capability
* Remove simultaneous-use command-buffer creation in base
  class to off, and require tests do it themselves if they require it.
* Rewrite mutable dispatch simultaneous test to test updating both
  pipelined enqueues, and updating the new definition of simultaneous-use
Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging as discussed in the August 26th teleconference.

@bashbaug bashbaug merged commit 2e33e33 into KhronosGroup:main Aug 26, 2025
2 checks passed
EwanC pushed a commit to EwanC/OpenCL-Headers that referenced this pull request Aug 26, 2025
Use script to regenerate the headers based on changes
from OpenCL-Docs that updated cl.xml

* KhronosGroup/OpenCL-Docs#1411
* KhronosGroup/OpenCL-Docs#1435
* KhronosGroup/OpenCL-Docs#1408
bashbaug pushed a commit to KhronosGroup/OpenCL-Headers that referenced this pull request Aug 26, 2025
Use script to regenerate the headers based on changes
from OpenCL-Docs that updated cl.xml

* KhronosGroup/OpenCL-Docs#1411
* KhronosGroup/OpenCL-Docs#1435
* KhronosGroup/OpenCL-Docs#1408
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Sep 2, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device imultaneous-use capability
* Remove simultaneous-use command-buffer creation in base
  class to off, and require tests do it themselves if they require it.
* Rewrite mutable dispatch simultaneous test to test updating both
  pipelined enqueues, and updating the new definition of simultaneous-use
EwanC pushed a commit to EwanC/OpenCL-CTS that referenced this pull request Sep 8, 2025
Actions test plan from KhronosGroup#2473
to update CTS tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple enqueues without
  blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by testing
  multiple serialized enqueues of the same command-buffer, which doesn't
  now require the device imultaneous-use capability
* Remove simultaneous-use command-buffer creation in base
  class to off, and require tests do it themselves if they require it.
* Rewrite mutable dispatch simultaneous test to test updating both
  pipelined enqueues, and updating the new definition of simultaneous-use
bashbaug pushed a commit to KhronosGroup/OpenCL-CTS that referenced this pull request Oct 14, 2025
Actions test plan from
#2473 to update CTS
tests to reflect changes from cl_khr_command_buffer PR
KhronosGroup/OpenCL-Docs#1411

* Adds new test in`command_buffer_pipelined_enqueue.cpp` for multiple
enqueues without blocking in-between, but serialized execution.
* Removed test for `CL_COMMAND_BUFFER_STATE_PENDING_KHR` state query.
* Remove negative test for `clEnqueueCommandBuffer` pending state error.
* Simplify `cl_khr_command_buffer` tests that stress simultaneous-use by
testing multiple serialized enqueues of the same command-buffer, which
doesn't now require the device imultaneous-use capability
* Remove simultaneous-use command-buffer creation in base class to off,
and require tests do it themselves if they require it.
* Rewrite mutable dispatch simultaneous test to test updating both
pipelined enqueues, and updating the new definition of simultaneous-use

---------

Co-authored-by: Ewan Crawford <ewan@codeplay.com>
@franz franz mentioned this pull request Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cl_khr_command_buffer Relating to the command-buffer family of extension

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clarify the necessity of CL_COMMAND_BUFFER_STATE_KHR Revisit command-buffer simultaneous-use

4 participants