GSoC 2025: GPU-accelerated raster ops #2658
- Preparation: anything before week 1.
- Week 1: being sick. Am sick this week, likely caught something last week at the (amazing) RustWeek conference. Still made some progress.
- Week 2: cargo-gpu naga transpile vertical slice. A vertical slice of using cargo-gpu to compile rust-gpu shaders and transpile them to WGSL with naga, so they can be passed to wgpu. Based on a game jam game of mine with just 4 shaders, small enough to allow quick iteration and API exploration.
- Week 3: invert_gpu.
- Week 4.
- Week 5: refactoring.
- Week 8: rust-gpu edition 2024.
- Week 9: clippy and miri with cargo-gpu.
- Week 10: prep graster-nodes for shaders.
- Week 11: first WGSL compilation.
- 2 week break: not advancing the week counter, just taking 2 weeks off.
- Week 12. Demo: invert node — the invert node works again. The rest of the nodes are non-functional, as the runtime does not yet support non-texture arguments. (The "Extract Executor" is copied from within the "Upload Texture" node.) Demo: posterize node — we can now pass parameters to shaders, allowing nodes like posterize to work! Though any node using floating point numbers is still broken, since I had to switch them from f64 to f32* and neither Graphite's UI nor the node graph seems to support that properly. (*f64 on GPUs: not only is it typically 64 times slower on consumer-level GPUs, but the WebGPU spec doesn't expose f64 either. The wgpu impl exposes a […])
- Week 13.
Problem
Graphite used to have no hardware acceleration and composed the final image entirely on the CPU, but in the past year the vello renderer was implemented to accelerate image composition on the GPU. However, vello is only well suited for composing vector elements and plain raster images. It does not implement any operations for (post-)processing raster images, like adjusting hue or saturation, blending images, applying a posterization effect, etc. Currently, these "raster ops" are still evaluated on the CPU, and since Graphite runs within the WebAssembly (wasm) environment of the browser, they are limited to a single thread processing one pixel at a time. This significantly slows editing whenever raster-based content is present, in practice to 2-3 fps on a typical desktop system.
Benefit
I want to accelerate the processing of raster ops using the GPU to achieve acceptable levels of performance.
Instead of porting the existing raster ops written in Rust to a shader language, I want to use the rust-gpu shader compiler to compile the existing Rust-based raster ops into shaders. This allows Graphite to continue using the existing CPU path if the WebGPU or WebGL API is not available, without duplicating the code across several languages.
Final Report
graphite_final_report.mp4
The video above compares the previous CPU-based raster image manipulation, shown first, to the new GPU/shader-based path. By moving the image around, the image adjustments are reevaluated constantly, and we can observe how fast these reevaluations are performed. The CPU path is only able to emit a new frame every second, whereas the GPU shaders evaluate the adjustments close to real-time. However, we can observe significant hitching on the GPU path because GPU images are inefficiently reallocated every frame, which would require some further investigation.
The source image is a 720p PNG export of the isometric fountain artwork that has its brightness and contrast adjusted, is posterized, and finally has some color level adjustments applied. The top graph, shown first, uses CPU nodes to evaluate the adjustments; the bottom graph uploads the image to the GPU and evaluates all the adjustments in GPU shaders.
How to make new shader nodes
Note that GPUs have limitations in the kinds of operations they can perform. Your node may only use the primitive types `u32` or `f32`; you must not use `u8`, `u16`, `u64`, `usize` or `f64` anywhere. Enums must be C-like enums (they must not carry values, like `Option` does) and must be `#[repr(u32)]`. Many of the "node::registry" types like `Percentage` or `Angle` have equivalent f32 variants named `PercentageF32` and `AngleF32`; you must use those to avoid f64. Shaders are `#[no_std]`, so you must only use symbols from `core` and replace all use statements from `std` with `core`. Notably, this excludes `Vec`, `Box`, `Arc` and any derivative types using them. If your crate contains nodes that require symbols from `std`, I recommend giving them their own module and excluding the entire module with `#[cfg(feature = "std")]`. This way you don't need to feature gate individual use statements and functions, which is often error prone.

To declare a node as some shader node type, add `shader_node(<node name>)` to the node macro. The shader system is designed to allow many kinds of shader nodes, though only `None` and `PerPixelAdjust` have been implemented so far. `None` marks the node as "not a shader node", but importantly adds the required `std` feature gates to the node implementation; see "Graphite integration" below for how the feature gate works. `PerPixelAdjust` runs the node once per pixel, where it is passed the color from the input images and must return the resulting color. This functions similarly to the `Adjust` trait, though it is not limited to that trait specifically. It differentiates between two kinds of node params: uniform params that are the same for each pixel, and image params marked with `#[gpu_image]` that are passed the `Color` of some input image at that pixel's location.

All uniform params must implement `BufferStruct`, which may be `#[derive(BufferStruct)]` on any struct or enum. Enums additionally require `#[repr(u32)]` and `#[derive(num_enum::FromPrimitive, num_enum::IntoPrimitive)]`. Structs or enums only used within the function, but not passed as parameters, may be but don't need to be a `BufferStruct`. (Small detail: any `BufferStruct::Buffer` must also have an alignment of 4, otherwise you may (not must) get padding issues within the generated `Uniform` struct. Not sure if there's an easy way around that issue.)

Image params take a `Raster<GPU>` image as an input on the node graph, but expect that parameter to accept an instance of `Color` and the function to return `Color`. Typically, you implement this by giving your node function a `T: MyTrait` generic where `MyTrait` is implemented for `Color`, defining the type of all image params as `T` and the return type also as `T`. The node macro will generate a new "gpu" node that mirrors the original node, with the image params replaced by `Raster<GPU>` (and a `&WgpuExecutor` param appended), and it will also codegen the associated shader entry point that loads the colors from the input images, calls the node function once for each pixel, and stores the returned `Color` to the output image. Currently, you may only have exactly one image param in a node function, due to some limitations in the shader runtime. I hope to clean that up soon.
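Putting a few of these rules together, here is a small, hedged sketch of the kind of per-pixel logic a shader node contains. It deliberately avoids Graphite's real macros and types (the `Rgba` stand-in and the function and enum names are made up), so it only illustrates the constraints: C-like `#[repr(u32)]` enums, f32/u32-only math, and no `std`/`alloc` symbols. In an actual node, the color would arrive through a `#[gpu_image]` param and `levels`/`mode` through uniform params carrying the `BufferStruct` and num_enum derives described above.

```rust
// Self-contained sketch of the per-pixel math a shader node boils down to.
// In Graphite this logic would live in a node function annotated with the node
// macro plus `shader_node(PerPixelAdjust)`. Std-only nodes would instead sit in
// a gated module, e.g.:
//
//     #[cfg(feature = "std")]
//     mod io_nodes; // free to use Vec, Arc, std::fs, ...

/// Uniform-param-style enum: C-like variants only, u32 representation.
#[repr(u32)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum PosterizeMode {
    PerChannel = 0,
    Luminance = 1,
}

/// Minimal stand-in for a color value (rgba in 0.0..=1.0, f32 per channel).
#[derive(Clone, Copy)]
pub struct Rgba {
    pub r: f32,
    pub g: f32,
    pub b: f32,
    pub a: f32,
}

/// Quantize a single channel into `levels` discrete steps (core-only math).
fn quantize(value: f32, levels: f32) -> f32 {
    ((value * levels) as u32) as f32 / levels
}

/// The per-pixel body: called once per pixel with that pixel's color.
pub fn posterize(color: Rgba, levels: f32, mode: PosterizeMode) -> Rgba {
    match mode {
        PosterizeMode::PerChannel => Rgba {
            r: quantize(color.r, levels),
            g: quantize(color.g, levels),
            b: quantize(color.b, levels),
            a: color.a,
        },
        PosterizeMode::Luminance => {
            let luma = 0.2126 * color.r + 0.7152 * color.g + 0.0722 * color.b;
            let scale = if luma > 0.0 { quantize(luma, levels) / luma } else { 0.0 };
            Rgba {
                r: color.r * scale,
                g: color.g * scale,
                b: color.b * scale,
                a: color.a,
            }
        }
    }
}
```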
The technical bits
We want to use the rust-gpu shader compiler so we don't duplicate code between shader and CPU nodes. A typical shader compiler like glslc (or shaderc) takes input files written in the C-like GLSL language and turns them into SPIRV, a binary intermediate representation (IR) for shaders. Think of it like Java or C# bytecode that the graphics driver accepts and compiles down to the machine code needed for your graphics card. rust-gpu works quite similarly, except that its input language is ordinary Rust and not some C-like custom language.
Unlike a typical shader compiler, rust-gpu is not a full compiler but merely a "codegen backend" for the rustc compiler. This allows rust-gpu to reuse all the tokenization, parsing, type system, borrow checking etc. of the standard rustc compiler, and thus parse all the constructs of the Rust programming language. At the very end of the rustc compiler pipeline, the Middle IR (MIR) is passed into a codegen backend to generate machine code. This codegen backend is typically LLVM, but it is replaceable by a dynamically loaded library, like the `rustc_codegen_spirv` crate of rust-gpu. Codegen backends are built against an unstable internal interface that can change on a whim. This has a few important implications:
rust-gpu edition 2024
With the release of rustc 1.85.0 the Rust 2024 edition was stabilized, but it also came with significant changes to the codegen backend interface. This proved to be a significant challenge: while edition 2024 released in February, it took us until July to port rust-gpu to newer toolchain versions. This was a huge blocker for Graphite, as they and the entire Rust ecosystem had already moved to edition 2024, which the older compiler could not support. We had to wait for rust-gpu to support edition 2024, as backporting all of the crates to edition 2021 was not an option.
The porting has primarily been done by @eddyb in Rust-GPU/rust-gpu#249 while I was mostly testing the branch against various projects and debugging issues as they popped up. Notably, this update also required changing the "target specs", a set of JSON specification files the rustc compiler requires of every codegen backend.
cargo-gpu
When I first looked at cargo-gpu, it was a command line application to compile a shader crate to SPIRV, the shader IR. Notably, it did not care what toolchain you used in your project and automated the entire process of setting up rust-gpu: it would download rust-gpu and the required toolchain, compile rust-gpu with that toolchain and cache the build, and finally compile the shaders using the selected rust-gpu and toolchain version.
However, it had a few issues that needed to be resolved before we could use it:

- `rustc_backend_spirv` dylibs Rust-GPU/cargo-gpu#69

While integrating these needed changes, I effectively refactored the entire codebase in 12 PRs, adding 3,005 and removing 4,610 lines.
wgsl transpiling with naga
SPIRV is an open standard shader IR for exchanging shaders between various compilers and graphics APIs. It was originally invented for the OpenCL compute API, but has evolved into the shader IR of the Vulkan API, which is the main graphics API on both Linux and Android systems. Even Windows is replacing its DXIL with SPIRV, making it the primary cross-platform shader IR to use. It was even set to be the main shader IR for the WebGPU API, but some parties were strictly against using that open IR standard. And so we got WGSL, a new C-like shading language with a Rust-like syntax that's not anything like Rust itself. And as is typical of the web, it is sent to the browser as source code, so we may see some platform differences between Firefox, Chrome and Safari.

But since rust-gpu emits SPIRV, how do we convert from SPIRV to WGSL? Firefox implements the WebGPU API with their `wgpu` crate, which has a shader transpiler called `naga`. It was primarily built to convert WGSL into the different output shader languages needed for Firefox to run on all platforms: SPIRV for Vulkan on Linux and Windows, MSL on Mac and HLSL on Windows. But it also supports SPIRV input and WGSL output, so we can chain the compilers to go from Rust to SPIRV to WGSL. This isn't anything new: schell's renderling has been doing this quite successfully and has contributed many fixes to the SPIRV input module.

But I didn't just want to set up WGSL transpiling for Graphite; it would be much nicer if rust-gpu could handle that internally. This would also allow us to specialize the SPIRV we're emitting for naga, if necessary. So I got to implementing a `spirv-unknown-naga-wgsl` target, both parts of which need a bit more work before they can be merged:

- `enum SpirvTargetEnv` containing all available targets Rust-GPU/rust-gpu#311
- `spirv-unknown-naga-wgsl` target via naga Rust-GPU/rust-gpu#280

To test this entire stack without having to integrate it directly into Graphite, I chose to reuse a game jam game of mine called Colorbubble. It uses wgpu, hand-written WGSL and can be deployed to the web, just like Graphite should. And with only a single hand-written WGSL file of 75 lines, it is the perfect small project to test the entire tech stack of cargo-gpu and naga-wgsl transpiling. The result can be found on the branch `cargo-gpu`, which can be compiled on stable and deployed to the web. I also built a vertical slice in week 3 to compile Graphite's invert node into a shader, as it's the simplest node without any parameters.
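For reference, the SPIRV-to-WGSL step itself is fairly little code when driven through naga as a library. Here is a hedged sketch of that conversion (error handling and the surrounding build plumbing omitted; it assumes naga is built with the `spv-in` and `wgsl-out` features and is not the exact code used in the project):

```rust
use naga::back::wgsl;
use naga::front::spv;
use naga::valid::{Capabilities, ValidationFlags, Validator};

/// Convert a compiled SPIR-V binary (as produced by rust-gpu) into WGSL text.
fn spirv_to_wgsl(spirv_bytes: &[u8]) -> Result<String, Box<dyn std::error::Error>> {
    // Parse the SPIR-V words into naga's IR module.
    let module = spv::parse_u8_slice(spirv_bytes, &spv::Options::default())?;

    // Validation also produces the ModuleInfo that the WGSL backend needs.
    let info = Validator::new(ValidationFlags::all(), Capabilities::all()).validate(&module)?;

    // Write the module back out as WGSL source text.
    Ok(wgsl::write_string(&module, &info, wgsl::WriterFlags::empty())?)
}
```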
Graphite integration
The proper integration into Graphite was more difficult than initially expected due to rust-gpu's `no_std` requirement. From rustc's perspective, rust-gpu is essentially an embedded target that does not have access to the standard library or a global memory allocator, so it requires every crate to be `no_std` (and not need the `alloc` crate either). This makes symbols like `Vec`, `Box` and `Arc` unavailable, since they require a global allocator. If you still want to use some symbol from std when it is available, you have to mask it out with a feature, typically the `std` feature on libraries. But correctly masking out std symbols is a quite error prone process, and it is easy to accidentally import from `std` instead of `core`.

I was told that Graphite supported a no-std compile without the `std` feature, from a previous but quite different rust-gpu integration attempt. But alas, that support had withered away, with almost nothing being masked out correctly. So I decided to pursue a different approach to no-std compatibility: instead of masking out all the std symbols, we have a separate `shader` crate that is (almost) completely no-std and is re-exported in the std crate. This allows me to move certain no-std symbols from the std crate to the shader crate without breaking the paths of upstream crates. And since the `shader` crate is no-std by default, you will notice immediately if something isn't masked correctly.
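As a tiny, hedged sketch of that layout (the crate and module names here are illustrative, not Graphite's actual ones):

```rust
// my-nodes-shaders/src/lib.rs - the shader crate: no-std by default, so any
// stray `std::` import fails to compile right away.
#![no_std]

pub mod color {
    // shader-safe types and functions only (f32/u32, core-only imports)
}

// my-nodes/src/lib.rs - the std crate re-exports the shader crate's modules,
// so downstream code keeps using `my_nodes::color::...` paths unchanged.
pub use my_nodes_shaders::color;
```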
The original plan was to split out code from the `gcore` crate into a variety of smaller crates, leaving `gcore` with just no-std types. I've extracted `gapplication_io`, `gpath_bool`, `gsvg_renderer`, `gmath_nodes`, `gelement-nodes`, `graster-nodes`, `gbrush`, and `gtext` (never PRed), but it became obvious that this is a monumental task that I could not finish within the given time frame. But with `graster-nodes` extracted, we could start integrating shaders into it without affecting everything surrounding it.
Since `gcore` wasn't going to be no-std anytime soon, we decided on creating the `gcore-shaders` crate that operates as the no-std-only part of `gcore`, and moved over Graphite's `color` and `blending` modules. Most of this was in PR #2925, though it did require multiple fixups for smaller breakages before the shader code was merged. Another difficulty were the types that give node parameters units, such as percentage or angle, since they were f64. In #3095 I created alternative f32 variants and @0HyperCube managed to fix up the UI to support f32 types again. A similar story ensued with `Table<Color>` parameters, since it contains a `Vec`, but since the nodes only ever used the first Color anyway, we decided to switch to a plain `Color` in #3096.
The `graster-nodes` crate has a similar-looking `graster-nodes-shaders` crate, although it functions quite differently. Since the node macros can only emit tokens where they are in the code, we need to compile the entirety of `graster-nodes` as a shader and use careful feature gating to exclude std symbols. The node macro has been modified to wrap the CPU and GPU node implementations in a feature gate on the (hardcoded) `std` feature if a shader node declaration is present. The shaders crate has a build script that compiles `graster-nodes` into a WGSL shader and includes the result as a string. We could put the build script on `graster-nodes` directly; however, building and executing build scripts can't be feature gated. We would then always compile the `cargo-gpu` dependency, even when not building with rust-gpu shaders, and build it a second time when compiling the shaders, since that uses a different target directory.
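A hedged sketch of how such a shaders crate can expose the compiled result (the output file name is illustrative; the build script itself, which drives cargo-gpu and the naga transpile, is omitted here):

```rust
// graster-nodes-shaders/src/lib.rs (illustrative): the build script is assumed
// to have written the transpiled WGSL into OUT_DIR, and we embed it as a
// string constant that the runtime crate re-exports and hands to wgpu.
pub const WGSL_SHADER: &str = include_str!(concat!(env!("OUT_DIR"), "/shader.wgsl"));
```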
A new crate with shader node support could easily be added by copying much from `graster-nodes`: it needs a copy of the `graster-nodes-shaders` crate, an `std` feature that enables the dependency on the shader crate, a re-export of the `WGSL_SHADER` from the shader crate in `lib.rs`, and feature gates for everything std-dependent behind the `std` feature. Node functions can be feature gated by adding a standard `#[cfg(...)]` to gate the entire function, or by adding either a `cfg()` or a `shader_node(None)` to the node macro to keep the function but gate everything the node macro emits. At the moment, you'd also have to copy the `mod fullscreen_vertex`; this should likely be moved to a macro when a second crate becomes necessary.
Runtime
It was specifically requested that this be compatible with WebGL, since WebGPU support still isn't widespread enough, so we're limited to fragment shaders because compute shaders are unavailable in WebGL. I've therefore opted for a pretty standard rasterization pipeline, with a vertex shader emitting a fullscreen triangle and the fragment shader calling the node function for each pixel in the output image. A storage buffer (not a uniform buffer, since that has extra alignment requirements) is bound at binding 0 to pass a `Uniform` struct containing all uniform params, and the input images start at binding 1 counting upwards (or start at 0 if there are no uniform params). The fragment shader simply calls the node function and returns its `Color` as output, with uniform params loading their values from the storage buffer and image params `texelFetch`-ing their colors from their associated image at the `gl_FragCoord` (GLSL) / `@builtin(position)` (WGSL) integer pixel coordinates.

When executed, each GPU operation allocates an output image and draws into it on its own; there is no "global reasoning" or execution system present. This means multiple sequential GPU operations simply execute one after another, which is quite inefficient in terms of not just memory consumption but also memory bandwidth. I proposed building a shader composition system to merge shaders and reduce these bottlenecks, but didn't have enough time to implement it. And the current system is plenty faster than the CPU one anyway.
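To make the binding layout concrete, here is a hedged, rust-gpu-flavoured sketch of roughly what the generated entry points amount to. The `Uniform` contents and the `per_pixel` body are placeholders and the real generated code differs; the parts taken from the description above are the fullscreen-triangle vertex shader, the storage buffer at binding 0, input images from binding 1, and the per-pixel fetch at the fragment coordinate.

```rust
use spirv_std::glam::{IVec2, Vec2, Vec4};
use spirv_std::{spirv, Image};

/// Placeholder uniform struct; the real one is generated from the node's
/// uniform params (and must satisfy the BufferStruct/alignment rules).
pub struct Uniform {
    pub strength: f32,
}

/// Placeholder for the per-pixel node function invoked for each fragment.
fn per_pixel(color: Vec4, strength: f32) -> Vec4 {
    Vec4::new(
        color.x * (1.0 - strength),
        color.y * (1.0 - strength),
        color.z * (1.0 - strength),
        color.w,
    )
}

#[spirv(vertex)]
pub fn fullscreen_vertex(
    #[spirv(vertex_index)] vertex_index: i32,
    #[spirv(position)] out_position: &mut Vec4,
) {
    // One triangle covering the whole viewport: (-1,-1), (3,-1), (-1,3).
    let uv = Vec2::new(((vertex_index << 1) & 2) as f32, (vertex_index & 2) as f32);
    *out_position = Vec4::new(uv.x * 2.0 - 1.0, uv.y * 2.0 - 1.0, 0.0, 1.0);
}

#[spirv(fragment)]
pub fn fragment(
    #[spirv(frag_coord)] frag_coord: Vec4,
    // Uniform params packed into a storage buffer at binding 0.
    #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] uniform: &Uniform,
    // Input images start at binding 1 (or 0 if there are no uniform params).
    #[spirv(descriptor_set = 0, binding = 1)] input: &Image!(2D, type=f32, sampled),
    out_color: &mut Vec4,
) {
    // Fetch this pixel's color at the integer fragment coordinate and run the
    // node function once for it.
    let texel: Vec4 = input.fetch(IVec2::new(frag_coord.x as i32, frag_coord.y as i32));
    *out_color = per_pixel(texel, uniform.strength);
}
```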
Related PRs:

- `BufferStruct` to support bool and enums #3109

Summary
Overall, the project was a success in that the infrastructure to support shader nodes has been set up. Unfortunately, the Graphite integration took quite a bit longer than initially expected, which resulted in only a handful of shader nodes being available as of right now. But with the supporting infrastructure present and most blockers resolved, porting further nodes to shaders should be very straightforward.
All code changes summarized (between 2025-04-15 and 2025-09-01):
Future directions
This is a list of things that could be implemented in the future to further advance the system:

- `WgpuExecutor` input on GPU nodes and connect it automatically

old notes, please ignore
Constraints
Deliverables

- `nightly-2025-05-09` (~1.88) and Rust 2024 edition. Rust-GPU/rust-gpu#249
- `rustc_backend_spirv` dylibs Rust-GPU/cargo-gpu#69