justinchuby
Collaborator

Output the present key and value from the Attention op, because the past key and value are provided as inputs.

Replaces #2632

Signed-off-by: Justin Chu <[email protected]>
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR improves GQA (Grouped Query Attention) fusion by modifying the attention operation to output present key-value pairs in addition to the attention result. The change ensures that both past and present key-value states are properly handled in the fused operation.

  • Modified the pattern function to return present key and value tensors alongside the attention output
  • Updated the rewrite function to specify 3 outputs for the attention operation
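
As a rough illustration of why an attention call that receives past key/value also needs to emit present key/value, here is a minimal pure-Python sketch of single-head attention with a cache. It is not the code from this PR: the function name `attention_with_cache`, the list-of-rows layout, and the absence of masking and grouping are all assumptions made for brevity.

```python
import math


def attention_with_cache(q, k, v, past_key, past_value):
    """Single-head attention over lists of row vectors (illustrative only).

    Returns three results, mirroring the three outputs described above:
    (attention output, present key, present value).
    """
    # Present state = cached rows followed by the rows for the new tokens.
    present_key = past_key + k
    present_value = past_value + v

    scale = 1.0 / math.sqrt(len(q[0]))
    value_dim = len(present_value[0])
    output = []
    for q_row in q:
        # Scaled dot-product scores against every cached and new key.
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k_row))
            for k_row in present_key
        ]
        # Numerically stable softmax over the scores.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value rows.
        output.append([
            sum(w * v_row[j] for w, v_row in zip(weights, present_value))
            for j in range(value_dim)
        ])
    return output, present_key, present_value


# One cached token plus one new token; a zero query weights both equally.
out, present_k, present_v = attention_with_cache(
    q=[[0.0, 0.0]],
    k=[[0.0, 1.0]],
    v=[[3.0, 4.0]],
    past_key=[[1.0, 0.0]],
    past_value=[[1.0, 2.0]],
)
```

The caller would feed `present_k` and `present_v` back in as the past state on the next decoding step, which is why a fused op that drops them would break the cache.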

@justinchuby justinchuby added this to the 0.5.4 milestone Oct 15, 2025

codecov bot commented Oct 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.38%. Comparing base (811937c) to head (3fb7223).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2634   +/-   ##
=======================================
  Coverage   70.38%   70.38%           
=======================================
  Files         222      222           
  Lines       26288    26288           
  Branches     2629     2629           
=======================================
  Hits        18503    18503           
  Misses       6865     6865           
  Partials      920      920           

