228 changes: 228 additions & 0 deletions proposals/xxxx-experimental-dxil.md
---
title: "XXXX - Experimental DXIL"
params:
authors:
- V-FEXrt: Ashley Coleman
- llvm-beanz: Chris Bieneman
- tex3d: Tex Riddell
sponsors:
- V-FEXrt: Ashley Coleman
status: Under Consideration
---

* Planned Version: SM 6.10

## Introduction

This proposal introduces a method for denoting and tracking experimental DXIL
operations that minimizes churn when an operation is rejected or delayed to a
later DXIL version.

## Motivation

During iterative development of the shader compiler it is beneficial to
implement real lowering into real opcodes to validate that a proposal actually
solves real world use cases. Traditionally this has been done by adding new
opcodes right after the last released opcode in the prior DXIL version.
Comment on lines +25 to +26
Collaborator

There has never been a way of "marking" opcodes as experimental. They are simply added at the end of the most recent released DXIL opcodes. We have an enum that defines the number of released opcodes for an indicator of which set is frozen for which DXIL version.

Suggested change
```diff
-solves real world use cases. Traditionally this has been done by marking all in
-development opcodes as expirmental opcodes of the next dxil version release.
+solves real world use cases. Traditionally this has been done by adding new
+opcodes right after the last released opcode in the prior DXIL version.
```

In some cases this is sufficient, when feature development is unified, opcodes
don't change after being added, and all opcodes in a contiguous block starting
from the prior release are accepted into the next release.
But challenges arise during parallel feature development, from experimental
feature evolution requiring opcode changes, or when a feature and its opcodes
are excluded from the release while the opcodes following them are not.
Excluded opcodes must either be turned into reserved opcodes or a breaking DXIL
change must be synchronized between the compiler, tests, and drivers.
This proposal seeks to implement a systematic method to handle these issues.
Comment on lines +27 to +32
Collaborator

It has actually been problematic for many releases. Each of these issues has been a PITA for many releases:

* opcodes can't be changed without either a breaking change across in-development driver, compiler, and test scenarios, or burning opcodes, turning them into "reserved".
* The intention is that unused opcodes are removed, and the final opcodes for the next shader model are compacted into the block of final opcodes. This creates a breaking change line again, across compiler version, drivers, and any compiled code.
* Independent feature development requires difficult synchronization since all ops must be serialized into one compact list of enumerated ops.
* There's no way to bridge the transition between preview and final versions.

We never had the opportunity to clean up our ops for SM 6.9 before we found out we couldn't change them. This left multiple ranges of "reserved" DXIL ops behind that are now part of DXIL 1.9, 28 ops in total!

Suggested change
```diff
-In most cases this is sufficient as the opcodes are accepted into the next
-release but challenges arise when one or more opcodes are rejected or delayed
-from the release while the opcodes following them are not. In the rejection
-case the opcodes must be burned and in the delayed case the opcodes must be
-manually moved to the next release with special casing code. This proposal
-seaks to implemented a systematic method to handle these issues.
+In some cases this is sufficient, when feature development is unified, opcodes
+don't change after being added, and all opcodes in a contiguous block starting
+from the prior release are accepted into the next release.
+But challenges arise during parallel feature development, from experimental
+feature evolution requiring opcode changes, or when a feature and its opcodes
+are excluded from the release while the opcodes following them are not.
+Excluded opcodes must either be turned into reserved opcodes or a breaking DXIL
+change must be synchronized between the compiler, tests, and drivers.
+This proposal seeks to implement a systematic method to handle these issues.
```


## Goals
This proposal seeks to address the following points:
* Needless churn when experimental ops are delayed or rejected
* Experimental op boundary point is rigid and moves with every SM update
Collaborator

Suggested change
```diff
-* Expermental op boundary point is rigid and moves with every SM update
+* Experimental feature boundaries are rigid and unaffected by SM updates
```

* Long term experiments (and potentially extensions) aren't currently feasible
Collaborator

Suggested change
```diff
-* Long term experiments (and potentially extensions) aren't currently feasible
+* Enable long term experiments and potentially extensions
```

* Focused on the core API system (HLSL intrinsics and DXIL ops)
* Works within the current intrinsics/DXIL op mechanisms
* Minimizes overall changes to the system and IHV processes
* Straightforward transition route from experimental to stable
* IHV drivers can support both experimental and stable versions of an op simultaneously
* simplifies migrations
Comment on lines +43 to +44
Collaborator

Suggested change
```diff
-* IHV drivers can support both experimental and stable versions of an op simultaneously
-* simplifies migrations
+* Soft transitions between versions of experimental ops and final ops simplify migrations
+* IHV drivers can support multiple experimental versions and the final version of a set of ops in the same driver
```


## Non-goals
Future proposals may address the topics below, but this proposal seeks to be a
smaller isolated change. It intends to solve immediate-term challenges
without investing significant engineering effort into a generalized solution.
That said, an attempt is made to avoid proposals that preclude a generalized
solution. Thus this proposal explicitly avoids addressing these issues:
* Full scale generalized extension system
* Process development to enable asynchronous non-colliding development
* Metadata/RDAT/PSV0/Custom lowering are out of scope for this document



## Potential DXIL Op Solutions
Collaborator

Even before we list potential solutions, we need to call out DXC and DXIL's deep dependency on a fully developed system for managing DXIL operations which any final operations would need to use. Any divergence from this system will carry a significant design, implementation, and transition burden. I believe solutions we seriously consider should be limited to ones that minimize these burdens in various ways. In order to do this, we need a general outline of this system so proposed solutions can address how certain parts may be modified to achieve the goals.

In addition, at the HLSL level, we may need some solution for HLSL Intrinsic HLOpCodes. These are different from DXIL OpCodes, and not part of the DXIL contract with the driver, but these HLOpCodes are an enumerated set that also impacts IR tests.

Something like this:

Suggested change
```diff
-## Potential DXIL Op Solutions
+## Existing DXIL Op and HLSL Intrinsic Infrastructure
```
In DXC, there exists a large amount of infrastructure for handling DXIL ops as special types of functions throughout the compiler. From definition to lowering to passes to validation and consumption, any solution that doesn't fit into this system will face significant challenges from development through to the transition of operations from experimental to final official DXIL ops in a shader model, both in the compiler and in drivers consuming the ops.
There is also a high-level intrinsic system which uses its own set of opcodes in the generated enum: `hlsl::IntrinsicOp`. Though these are internal to the DXC compilation pipeline, stability of these opcodes impacts any tests with high-level IR, such as tests for lowering.
This section outlines key areas of this system for clarity in reasoning about solutions.
### DXIL Op Definitions
DXIL Ops are defined in `hctdb.py`, which is used by `hctdb_instrhelp.py` to generate header and cpp code used directly by drivers to consume the operations, as well as generate a variety of other code for the compiler, validator, DXIL spec, etc...
`DxilOperations.h/cpp` implements the core of the system for handling DXIL operations in a DxilModule.
DXIL OpCodes, which are always passed as a literal in the first argument of a DXIL operation call, are a contiguous set of values starting at 0, such that they may be used to directly index a table of opcode definitions at the core of this infrastructure. This OpCode argument in the DXIL Op call is the sole identifier of the operation being called. Function names reflect OpCodeClass and overloads, but this is only a means to prevent collisions between functions used by operations requiring different signatures and attributes.
The contiguous nature of DXIL OpCodes used to index into a table is the first key hurdle in defining experimental ops. If an operation at a particular index is changed in any significant way, the interpretation of IR across that change boundary produces undefined behavior (crash if you're lucky), with no automatic mechanism to guard against this.
### HLSL IntrinsicOp definitions
Intrinsic operations are normally defined in `gen_intrin_main.txt`, which is parsed by `hctdb.py` and used by `hctdb_instrhelp.py` to generate the `hlsl::IntrinsicOp` enum, and a bunch of tables used by custom intrinsic overload handling code in `Sema.cpp`.
There is infrastructure that tracks previously assigned HL op indices by intrinsic name in `hlsl_intrinsic_opcodes.json`. This can be a merge conflict point between any parallel feature development.
While indices are separated between functions and methods, all functions or all methods with the same name will share the same HL opcode. Generally this isn't a problem as the arguments (which would include an object) allow you to differentiate things when handling opcode calls. Recently a `class_prefix` attribute was added to the intrinsic definition syntax for `gen_intrin_main.txt` to prepend a class name, used for `DxHitObject`. This is just an example of how this system can be extended to separate out ops if necessary.
`HLOperationLower.cpp` uses a direct table lookup from the (unsigned) `IntrinsicOp` value to the lowering function and arguments. This creates another merge point for any experimental features (and potentially extensions), which integrate into the same intrinsic table.
There is an extension mechanism defined through the `IDxcLangExtensions` interface on the DXC compiler API object. It allows you to define a separate intrinsic table with predefined lowering strategies to produce extended ops as external function calls outside the recognized DXIL operations. It's meant to enable target extensions (extra intrinsics within certain limited definitional bounds) in HLSL for a custom backend. Modules using extensions wouldn't be accepted by the DXIL validator (unmodified). The way extensions must be defined, used, and interpreted differs significantly from adding built-in HLSL intrinsics and DXIL operations, which means it will introduce significant burdens and limitations to initial op definitions, lowering and compiler interaction, and make the transition to final DXIL operations painful. For these reasons, I don't think we should consider this extension mechanism as part of our solution at this time.
While this document focuses on a solution for DXIL ops, the HL opcodes can lead to difficult conflicts between independent feature development branches as well. Avoiding these requires synchronizing `hlsl_intrinsic_opcodes.json` and pre-allocated lowering table entries in `HLOperationLower.cpp` in a common branch as a very first step whenever adding any new HLSL intrinsics.
### IR Tests
Tests that contain DXIL will have DXIL operation calls passing a literal `i32` OpCode value as the first argument. If these opcodes are to change between experimental and final versions, there should be an easy way to update the tests accordingly. The same goes for any high-level IR and its IntrinsicOp numbers.
There are two places where hard-coded numbers appear in tests: source IR and FileCheck statements for checking output IR.
There isn't any known solution that doesn't involve a change to at least the DXIL OpCodes when transitioning from experimental to final DXIL ops.
That requires either updating these across all tests (potentially with scripted regex replacement - matching could be error-prone) or adding some tool (or tool option) to translate symbolic opcodes to literal numbers as a first step.
### Summary of key elements a solution should address
- DXIL Op property table indexed by OpCode
- HLOperationLower table indexed by IntrinsicOp
- A way to update and deprecate experimental opcodes during development without a new opcode overlapping an old one, leading to undefined behavior in a driver if mismatched IR is used.
- A way for the same driver to accept multiple versions of ops without undefined behavior.
- A way to easily transition tests from experimental ops to final DXIL ops
- Potentially: A way to avoid some of the more difficult HL opcode conflicts between independent feature development branches
- Minimal, or ideally no, changes required to source code interacting with or consuming DXIL ops when transitioning from experimental to final ops.
## Potential DXIL Op Solutions


### Top 1 bit as "is experimental" flag

The top bit of every opcode is a flag indicating whether the opcode is
experimental.

No structural or shape changes to the DXIL occur; the mere fact that the opcode
has the high bit set marks it as experimental. This makes it very easy
for the compiler and drivers to detect experimental opcodes. When an opcode is
transitioned to stable it needs to be assigned a stable number.
This splits the 4-billion opcode space into two 2-billion partitions: one for
stable opcodes, one for experimental opcodes.

This is the simplest proposal with the least invasive set of changes.
Collaborator

I don't think this is as simple as expressed here. You can't just set a high bit without implications throughout the DXIL Op infrastructure. The OpCode is used as a direct lookup index into an array of operation properties.

You haven't specified if the opcode besides the high bit would continue to increment from the last official opcode, or would be a separate numeric space starting at zero, or something else. If it continues to increment from prior operations, it has all the problems we have today, and doesn't really solve anything. The problem we are trying to solve isn't just about identifying whether a DXIL operation is an experimental operation or not, that would be easy without messing with the OpCode.

You still have to solve the problem of remapping the DXIL Op lookups by OpCode, which requires enough changes to put this solution about on par in complexity with a solution that partitions the OpCode and allows for the high part to identify a table to use, allowing for N tables rather than just one more hardcoded table.


Pros:
* Very simple
* Quick to implement
* Could be implemented "by hand" today by hard coding opcodes
Cons:
* Not a solution for extensions
* Transition from experimental to stable isn't just unsetting the bit
* Other stable ops may have already taken that number
Collaborator

This line appears to imply that the experimental opcode space isn't independent, because if it was, stable ops would always be in collision as a matter of design.

Basically, this is a con of any possible solution, because the only way you have no-op experimental->stable mapping in IR is if you do it the way it's done now, and just put the correct final ops in for the next shader model from the start.

That said, I do have a solution that requires no changes to the compiler code that interacts with the ops that ultimately move to final (lowering code etc.), but IR tests still need fixing up somehow.

* complicates the experimental->stable mapping

### Top 8 bits as "opcode partition" value
This is pretty much identical to the 1-bit flag proposal except there are 256
partitions with 16 million opcodes each. The key difference is that it unlocks
extension potential: extension developers such as IHVs could reserve a
partition for their own use without colliding with other opcodes.

| Partition | Use |
|-----------|-----|
| 0 | stable |
| 1 | experimental |
| 2 | extension foo |
| .. | extension .. |
| 255 | extension 255 |

Collaborator

I believe different experimental features should be separated into different groups, just like different extension features would be. This makes it easier to deprecate one (whether because it was accepted or rejected) while carrying others along independently. Otherwise, you have to keep carrying along all the experimental opcodes you ever made, or replace them with reserved ranges. If they are separate, after an experimental feature is deprecated, it can be removed and even re-used for a new experimental feature or even an extension in the future. The set of known experimental IDs for a given shader model may be fixed, but the next shader model is free to remove the deprecated features.

Separate tables also make it easy to isolate additions to ops in different feature tracks without synchronization, which solves a pain point from releases going back a while.

Also note that each partition is independent, dense, and starts at zero, so that it may be used as a simple table lookup index, just as OpCode is now.

You'd have an opcode partition table with entries pointing to opcode tables (or blank for vacated slots). This scheme can actually be implemented relatively easily in DxilOperations.h/cpp.


Pros:
* Fairly simple
* Quick to implement
* Enables a basic opcode extension system
Cons:
* Transition from experimental to stable isn't just clearing the partition
Collaborator

As I mentioned before, I don't think there is any solution possible that could avoid this con.

Even then, "just clearing the partition" is changing the opcode, thus the IR, and still requires test updates to deal with. So, what's the difference between a hypothetical solution that avoids this con and one that has the con?

Are you imagining a hypothetical solution that I'm not?

Collaborator Author

None of the "encode into the opcode" solutions avoid this con, but some of the more "robust" solutions actually can. Specifically, with "Single Specific Experimental Opcode with varargs" there is never a "transition" period: all opcodes for the rest of time are "extension" opcodes and the lowering doesn't change. We would then maintain a list somewhere else that determines which extension sets are experimental and which are not, and add infrastructure around that to ensure experimental sets aren't allowed in non-experimental builds.

But as you said, that solution is a non-starter. Including this as a con was to compare against that, and to call out that a solution chosen with this con will need to resolve the issue in one manner or another.

* other stable ops may have already taken that number
* complicates the experimental->stable mapping

### Top 16 bits as "opcode partition" value
Identical concept as above but with 64k partitions, each with 64k opcodes.
Collaborator

This would be my preferred solution. I already know how it can be implemented relatively easily.

I can't see any reason to need more than 64k core DXIL ops. With the existing system, that would be very unwieldy, since all ops are described in a big table in one .cpp file. In any case, we could always extend these with another table entry if necessary.

This gives you more partition space for the feature separation I suggested earlier.

Also, more space for table/partition ID makes it easier to partition that space into several sub-spaces, which could be a useful expansion in the future (like: Core/Experimental, Official Extensions, IHV Extensions, Private Extensions). This would keep IR and code compatibility between all compiler branches at different organizations with different feature sets, while avoiding synchronization and disruption across organizations for shifting table (partition) IDs.

I also think we could create a registry for extended table usage in the DXIL module encoded in metadata. Basically, some op table indices and corresponding names. The DXIL disassembler could disassemble this to a comment which can easily be CHECKed with FileCheck. It will be disallowed in a release-hashed DXIL module. We could change that to allow official extension tables in the future.

So, I think we should partition this with 16-bits for table (partition) ID and 16-bits for DXIL OpCode.


### Split the opcode in half
Lower 16 bits are the core/stable opcodes; upper 16 bits are the experimental opcodes.

This gives 64k opcodes for stable use. The upper 64k can either be chunked
manually, leaving all numbers available for opcodes, or partitioned as 256
chunks of 256 opcodes with the partition encoded into the opcode itself.

Very similar concept as before, but keeping track of opcodes is complicated.
It also enables a weird situation where two opcodes "could" be encoded into a
single value.
Comment on lines +118 to +119
Collaborator

Are you suggesting that the lower OpCode bits would be left as 0 until a stable opcode is assigned, at which point, you'd assign the low bits and keep the high bits as a reference to the previously used experimental opcode for some time? For how long, and why?

Would the DxilOperation system and validator just be expected to strip off the high bits when interpreting DXIL OpCodes from now on because they may reference some old experimental OpCode we may no longer even have in the compiler? Would we consider non-zero high bits to be allowed in validated DXIL ops for a released shader model?

Would you expect to strip those bits eventually from the opcodes the compiler produces? If so, that's another disruption requiring IR test updates, just like at the point of final OpCode assignment.

I don't see what benefit this has over just copying the final DXIL Ops into a serialized set of final opcodes for the target.


### Introduce dx.opx.opcodeclass for experimental/extended ops
Denotes the experimental status in the actual opcode class name. Potentially
doubles the opcode space depending on implementation; however, it doesn't make
the transition to stable any easier and it complicates the integration with the
current intrinsics system.

Pros:
* Enables fairly robust extension system
* Doesn't consume large portions of the current opcode space
* obvious from reading the DXIL that experimental/extension is being used
Cons:
Collaborator Author

Would cause driver issues, since drivers don't currently need to look at the opcode class

* Transition from experimental to stable isn't just dropping the `x`
Collaborator

This one actually makes the transition significantly harder than the other options. For the other options, you could use a regex search/replace to update opcodes (with some risk of missing some cases). In theory, you could even add a symbolic opcode replacer to avoid having to ever update the tests.

But with this approach, the instruction stream is changed in a way that can't easily be updated with a text search/replace, or mapped through a symbolic opcode replacement utility.

* Other stable ops may have already taken that number
* Complicates the experimental->stable mapping
* Not well integrated into the current system; would require notable dev work
* Unclear how to allocate extension vs experimental ops in the opx space

### Extension/Experimental Feature Opcode
Relaxing the restriction that DXIL opcodes are immediate constants would allow
a call that returns a value representing a special operation. The call creates
the value from a feature ID and a feature-local opcode. Uniquing information
could be stored in the call directly or in metadata.

Collaborator

I don't know exactly how much we would break with this proposal, but I'm pretty sure it would be a lot. This would be a hard adjustment to make. It complicates DXIL op construction, as you'll need to look up or create a special op in the entry block just to provide the OpCode.

DXIL op property lookup by OpCode happens a lot for DXIL operations, and this would complicate and increase overhead for that operation.

```llvm
%feature_id = i32 123
%cool_operation = i32 456
%opcode = i32 dx.create.extensionop(%feature_id, %cool_operation)
%result = i32 dx.op.binary(%opcode, %a, %b)
```

Pros:
* Enables a very robust extension system
* Doesn't consume any of the current opcode space
* Obvious from reading the DXIL that experimental/extension is being used
Cons:
* Transition from experimental to stable is non-trivial ([see below](#stabilizing-with-opcode-subsets))
* Not integrated into the current system, would require notable dev work
* Breaks a pretty fundamental DXIL assumption

### Single Specific Experimental Opcode with varargs
A new opcode class `dx.op.extension` is introduced as a core stable opcode in
which named opcode subsets can be called directly.
Collaborator

This feels like a non-starter, given the inability to use any of the DXIL op infrastructure for experimental ops, then requiring re-implementation for final ops.

Additionally, we need separate function defs in llvm for different attributes, so we'd at least have to have overload-like permutations keyed off attributes, which the DXIL op system also doesn't do.



```llvm
%opcode_set = str "My Cool Experiment"
%opcode = i32 123
%res = i32 dx.op.extension(i32 12345, %opcode_set, %opcode, operands...)
```

The opcode set name and specific opcode are just arbitrary values from other
parts of the compiled shader.

Pros:
* Doesn't consume any of the current opcode space
* Obvious from reading the DXIL that experimental/extension is being used
* Very flexible
* Maintains first args as immediate constant
* All the information is encoded in the call
Cons:
* Transition from experimental to stable is non-trivial ([see below](#stabilizing-with-opcode-subsets))
* Unclear how well the current system will handle varargs
* More complex to implement and integrate
* `dx.op.extension` will need to support any arbitrary overload

#### Stabilizing with opcode subsets
Some proposals in this doc create new opcode sets that reuse existing numbers
nested under a set name or feature ID. These proposals have a more complex route
for transitioning from experimental to stable. There are two potential routes
to be considered.

* Create a new stable opcode from scratch using the normal mechanisms that
currently exist, then migrate lowering paths to use it
* Maintain a notion of experimental and non-experimental opcode subsets, then
update the specific subset to no longer be considered experimental, keeping
all lowering the same

The first option has a larger churn burden but maintains the status quo and keeps
the generated code relatively dense, while the second option is likely the easiest
transition system of any proposal in this document at the cost of code density
and introducing a second way for stable operations to exist in DXIL.

## Potential HLSL Intrinsic Solutions
There are two types of intrinsic solutions that can be imagined: one where an
extension author provides external code that has a custom lowering to an
arbitrary extension DXIL op, and one that is prebaked into the compiler and
conditionally enabled/disabled as appropriate.

As HLSL intrinsics are more flexible and can be reordered/renamed without
burning some finite resource, only the second type is being considered at the
moment. The first type falls under a "general purpose extension system", which
is out of scope for this document.

Intrinsic functions should be handled in a reasonable way. Ideally this means
that an intrinsic is only available if the experimental/extension op is also
available. Likely this means updating gen_intrin_main to mark an intrinsic as
experimental/extension, then generating code that errors if it is used in a
non-experimental/non-extension environment. But that is subject to change based
on the DXIL solution chosen. Once a proposal is selected this section will be
updated to reflect that.

Collaborator

Currently, the way to do this is with a shader model attribute on the intrinsic definition. This could continue to work, but would just require an update to the shader model requirement for experimental feature intrinsics when not being accepted into the next shader model. That's probably not a very big deal to manage, actually.

Another approach would be to add a new attribute you can define on the intrinsic, like `feature=FeatureName`, which would tie the intrinsic to a feature, which could be marked experimental until it ships, at which point it would impact the minimum shader model requirement, just like the shader model attribute.

## Outstanding Questions

* Should DXC have some kind of `--experimental` flag that turns on/off
experimental intrinsics and DXIL ops?
* Related, when/how are experimental ops exposed in the compiler, when are they
errors to use?
* Should the validator warn on experimental op usage?