

@ngc92 (Contributor) commented May 1, 2025

Adds a cuDNN implementation that supports GQA (grouped-query attention).
For better compatibility, our RoPE kernel needs to be GQA-aware, too. Then we can simply swap the order of RoPE and repkv and reuse most of the code between the cuDNN and non-cuDNN code paths.
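Why the swap is safe can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the PR's CUDA kernels. The nested-list layout `x[t][h][d]`, the adjacent-pair rotation convention, `theta = 10000`, and the helper names `rope`, `repkv`, and `max_order_diff` are all assumptions made for illustration. The RoPE angle depends only on the position and the pair index, never on which head it is applied to. So rotating the KV heads and then repeating them gives the same result as repeating first and rotating every copy.

```python
import math
import random

# Hypothetical layout (assumption): a tensor is a nested list x[t][h][d]
# with T positions, n_heads heads and an even head_dim hd.


def rope(x, theta=10000.0):
    """Rotate adjacent pairs (v[2i], v[2i+1]) of every head at position t
    by angle t * theta**(-2i/hd). The angle depends on t and i only,
    never on the head index."""
    out = []
    for t, heads in enumerate(x):
        new_heads = []
        for v in heads:
            hd = len(v)
            w = list(v)
            for i in range(hd // 2):
                angle = t * theta ** (-2.0 * i / hd)
                c, s = math.cos(angle), math.sin(angle)
                a, b = v[2 * i], v[2 * i + 1]
                w[2 * i] = a * c - b * s
                w[2 * i + 1] = a * s + b * c
            new_heads.append(w)
        out.append(new_heads)
    return out


def repkv(k, rep):
    """Expand n_kv_heads heads to n_kv_heads * rep heads by giving query
    head h a copy of KV head h // rep (GQA head sharing)."""
    return [[list(heads[h // rep]) for h in range(len(heads) * rep)] for heads in k]


def max_order_diff(T=5, n_kv_heads=2, rep=4, hd=8, seed=0):
    """Largest elementwise gap between rope(repkv(k)) and repkv(rope(k))."""
    rng = random.Random(seed)
    k = [[[rng.uniform(-1, 1) for _ in range(hd)] for _ in range(n_kv_heads)]
         for _ in range(T)]
    a = rope(repkv(k, rep))  # repeat first: RoPE rotates n_kv_heads * rep heads
    b = repkv(rope(k), rep)  # GQA-aware order: RoPE rotates only n_kv_heads heads
    return max(abs(p - q)
               for ta, tb in zip(a, b)
               for ha, hb in zip(ta, tb)
               for p, q in zip(ha, hb))
```

Calling `max_order_diff()` returns `0.0`: both orders perform the same floating-point operations on the same values. The GQA-aware order rotates only `n_kv_heads` heads instead of `n_kv_heads * rep`. It also keeps RoPE independent of repkv, which presumably lets the cuDNN path (where cuDNN handles GQA itself) skip repkv while the non-cuDNN path runs RoPE followed by repkv.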

@ngc92 ngc92 changed the base branch from master to llama3 May 1, 2025 12:40
@ngc92 ngc92 force-pushed the ngc92/llama3-cudnn branch 10 times, most recently from e616e00 to c157bc7 May 2, 2025 21:52
@ngc92 ngc92 force-pushed the ngc92/llama3-cudnn branch from c157bc7 to cbe9007 June 26, 2025 16:32
@ngc92 ngc92 force-pushed the ngc92/llama3-cudnn branch 4 times, most recently from 7556f7a to eceab69 June 26, 2025 17:42
@ngc92 ngc92 force-pushed the ngc92/llama3-cudnn branch from eceab69 to 90a409c July 22, 2025 17:21
@ngc92 ngc92 force-pushed the ngc92/llama3-cudnn branch from 90a409c to 9f05d7b July 24, 2025 01:23