@solidpixel
Contributor

@solidpixel solidpixel commented Aug 13, 2025

Prior to this PR, a layer intercepted every Vulkan function. If the layer did not provide a specialized implementation that actually does something, the call was routed through a dummy pass-through implementation in the layer. This has a runtime cost in the form of additional dispatch indirection.

With this PR the layer now only intercepts functions that either have a required implementation in the layer framework itself, or that have a <user_tag> specialization provided by the specific layer being built. This dispatch optimization is optional and, although on by default, can still be disabled because it is useful to intercept everything for API tracing and support investigations.
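As a rough illustration of the selective interception described above, here is a minimal sketch. It is not the layer framework's actual code: `Entry`, `resolve`, and the `driver*`/`layer*` functions are hypothetical stand-ins, and the real Vulkan entry points take parameters and are looked up via the loader.

```cpp
#include <cstring>

using VoidFn = void (*)();

// Call counters so the route taken by each call is observable.
inline int g_driverDrawCalls = 0;
inline int g_driverCopyCalls = 0;
inline int g_layerDrawCalls = 0;

// Hypothetical driver entry points.
inline void driverDraw() { ++g_driverDrawCalls; }
inline void driverCopy() { ++g_driverCopyCalls; }

// Hypothetical layer hook: does its own work, then forwards to the driver.
inline void layerDraw()
{
    ++g_layerDrawCalls;
    driverDraw();
}

struct Entry
{
    const char* name;
    VoidFn layerFn;   // nullptr when the layer has no specialization
    VoidFn driverFn;  // next function down the chain
};

inline constexpr Entry kEntries[] = {
    {"draw", layerDraw, driverDraw},
    {"copy", nullptr, driverCopy},
};

// Return the layer hook only if one exists; otherwise hand back the driver
// function directly so the call never passes through the layer.
inline VoidFn resolve(const char* name)
{
    for (const Entry& entry : kEntries)
    {
        if (std::strcmp(entry.name, name) == 0)
        {
            return entry.layerFn ? entry.layerFn : entry.driverFn;
        }
    }
    return nullptr;
}
```

In this sketch `resolve("copy")` returns `driverCopy` itself, so the extra hop only exists for `"draw"`. An intercept-everything mode for tracing would instead return a pass-through wrapper for `"copy"`.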

This PR also turns the existing config options for tracing and logging into CMake options that can be set at configure time.

Fixes #14.

@solidpixel
Contributor Author

Still needs testing, and would like some performance numbers from a Mali timeline capture to see how much difference this really makes.

@solidpixel
Contributor Author

solidpixel commented Aug 18, 2025

This isn't working reliably. Currently we're testing function pointer equality, but there is no guarantee that func<default_tag> and func<user_tag> end up sharing a single function in the binary, even if they are functionally identical. They might, especially if using LTO, but they are technically different functions.
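A minimal sketch of the problem, assuming the default is detected by comparing against the address of the `default_tag` instantiation. `func`, `FuncPtr`, and `isDefault` are illustrative names, not the framework's code.

```cpp
struct default_tag {};
struct user_tag {};

// One template body, so both instantiations behave identically.
template <class T>
int func(int x)
{
    return x + 1;
}

using FuncPtr = int (*)(int);

// Hypothetical check in the style described above: treat a pointer as
// "the default" if it equals the address of the default_tag instantiation.
inline bool isDefault(FuncPtr candidate)
{
    return candidate == &func<default_tag>;
}
```

`isDefault(&func<default_tag>)` is always true. `isDefault(&func<user_tag>)` is the unreliable case: the language treats the two instantiations as distinct functions, so it is normally false, and it only becomes true if the toolchain happens to fold the identical bodies together.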

@solidpixel solidpixel marked this pull request as draft August 18, 2025 13:14
@solidpixel
Contributor Author

solidpixel commented Aug 18, 2025

Here is a standalone PoC showing how to do this without altering the layer-provided code at all. We can use the has_demo<user_tag> concept to test whether a layer implementation of "demo" exists, and store a bool rather than a function pointer in the dispatch table.

#include <cstdio>
#include <type_traits>

using FunctionPtr = void (*)();
struct default_tag {};
struct user_tag {};

// Force delete the match-all implementation
template<class T>
void demo() = delete;

// Default implementation
template<> void demo<default_tag>()
{
    printf("default_tag\n");
}

#if 0
// User implementation
template<> void demo<user_tag>()
{
    printf("user_tag\n");
}
#endif

// Helper to detect if specialization <T> exists
template<class T>
concept has_demo = requires { demo<T>(); };

// Function to return the suitable function pointer
consteval FunctionPtr getPointer(void)
{
    // Wrap in a templated lambda so that the discarded "if constexpr" branch
    // is never instantiated when demo<T> doesn't exist.
    return [] <typename T> {
        if constexpr(has_demo<T>)
        {
            return demo<T>;
        }

        return demo<default_tag>;
    }.operator()<user_tag>();
}

int main(void)
{
    auto* function = getPointer();
    function();
    return 0;
}

@solidpixel
Contributor Author

solidpixel commented Aug 18, 2025

This approach works based on testing. I initially thought it needed a very new compiler, but it doesn't.

Clang++ 18 accepts a concept whose requires-expression names the function without calling it or passing parameters (concept has_demo = requires { demo<T>; };), whereas GCC does not.

Both compilers accept a requires-expression that makes a call rather than just naming the function (concept has_demo = requires(int a) { demo<T>(a); };), but this means that we need to update the script to emit the correct Vulkan API call parameters into each concept.

@solidpixel
Contributor Author

Android test seems to show that this reduces the added CPU overhead of the timeline layer by ~25%.

@solidpixel solidpixel marked this pull request as ready for review August 18, 2025 20:27
@solidpixel solidpixel merged commit 99bfb64 into main Aug 18, 2025
6 checks passed
@solidpixel solidpixel deleted the optdisp branch August 18, 2025 20:53

Development

Successfully merging this pull request may close these issues.

Framework: Default intercepts should directly invoke the underlying driver function

2 participants