Conversation

Contributor

@lisaliu1 lisaliu1 commented Oct 1, 2025

This PR completes the following work:

  • Clean up the BaseAttentionCache APIs
  • Remove cache-related functions and attributes from LlmInferenceExecRequest
  • Add APIs in the decoder to organize the cache across different requests and beams
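To illustrate the intended ownership split, here is a minimal, hypothetical sketch of a decoder that manages paged-cache bookkeeping for all beams of a request, rather than the request object doing so itself. The class names loosely echo the PR description (`BaseAttentionCache`, a decoder replacing `LlmInferenceExecRequest` cache duties), but the fields and methods (`acquire_pages`, `fork_pages`, `release_pages`, `Decoder`) are illustrative assumptions, not the actual shark-ai implementation.

```python
from collections import defaultdict


class BaseAttentionCache:
    """Hypothetical paged attention cache with refcounted pages."""

    def __init__(self, total_pages: int):
        self.free_pages = list(range(total_pages))
        self.refcounts = defaultdict(int)

    def acquire_pages(self, count: int) -> list[int]:
        if count > len(self.free_pages):
            raise RuntimeError("cache exhausted")
        pages = [self.free_pages.pop() for _ in range(count)]
        for p in pages:
            self.refcounts[p] = 1
        return pages

    def fork_pages(self, pages: list[int]) -> list[int]:
        # Beams share their parent's pages copy-on-write style:
        # forking only bumps refcounts, no data is copied here.
        for p in pages:
            self.refcounts[p] += 1
        return list(pages)

    def release_pages(self, pages: list[int]) -> None:
        for p in pages:
            self.refcounts[p] -= 1
            if self.refcounts[p] == 0:
                del self.refcounts[p]
                self.free_pages.append(p)


class Decoder:
    """Sketch: the decoder, not the inference request, owns cache
    bookkeeping across the beams of one request (per the PR summary)."""

    def __init__(self, cache: BaseAttentionCache, num_beams: int, prompt_pages: int):
        self.cache = cache
        root = cache.acquire_pages(prompt_pages)
        # Each beam starts out sharing the prompt's cache pages.
        self.beam_pages = [cache.fork_pages(root) for _ in range(num_beams)]
        # The decoder keeps only the per-beam forks alive.
        cache.release_pages(root)

    def finish(self) -> None:
        for pages in self.beam_pages:
            self.cache.release_pages(pages)
        self.beam_pages = []
```

Centralizing this in the decoder means a request object never has to track which beams still reference which pages; releasing a beam is a local refcount operation, and the cache reclaims pages only once every beam sharing them has finished.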


github-actions bot commented Oct 1, 2025

Coverage report

This PR does not seem to contain any modification to coverable code.

@lisaliu1 lisaliu1 marked this pull request as ready for review October 1, 2025 18:32