@skyCreateXian skyCreateXian commented Jan 21, 2025

Add support for prefix caching in PD (prefill/decode disaggregation)

  1. Send and receive only the KV cache and hidden states that are not covered by the prefix-cache hit, instead of transmitting the full set
  2. Fix the request errors that occur when prefix caching is enabled

@skyCreateXian skyCreateXian left a comment

After a prefix-cache hit, the matched leading tokens are marked as context, so only the part that still needs to be computed is transmitted.
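The idea can be illustrated with a minimal sketch. This is not the code from this PR and does not use any vLLM API: `KVPayload`, `build_payload`, and `merge_on_receiver` are hypothetical names, the per-token KV entries are plain Python lists, and it assumes the receiving side already holds the KV for the shared prefix. The hidden states could presumably be sliced the same way.

```python
from dataclasses import dataclass
from typing import List

# One per-token KV entry; a plain list stands in for a real tensor slice.
TokenKV = List[float]


@dataclass
class KVPayload:
    """Hypothetical message sent from the prefill side to the decode side."""
    num_context_tokens: int   # tokens covered by the prefix hit (not sent)
    kv_slices: List[TokenKV]  # KV entries for the newly computed tokens only


def build_payload(kv_per_token: List[TokenKV], num_cached_tokens: int) -> KVPayload:
    """Keep only the KV entries that come after the prefix-cache hit."""
    if not 0 <= num_cached_tokens <= len(kv_per_token):
        raise ValueError("num_cached_tokens is out of range")
    return KVPayload(
        num_context_tokens=num_cached_tokens,
        kv_slices=kv_per_token[num_cached_tokens:],
    )


def merge_on_receiver(local_prefix_kv: List[TokenKV], payload: KVPayload) -> List[TokenKV]:
    """Rebuild the full KV sequence from the local prefix plus the received tail.

    Assumption: the receiver already has KV for the first
    `num_context_tokens` tokens.
    """
    if len(local_prefix_kv) < payload.num_context_tokens:
        raise ValueError("receiver is missing part of the cached prefix")
    return local_prefix_kv[: payload.num_context_tokens] + payload.kv_slices
```

With a 6-token request and a 4-token prefix hit, `build_payload` would carry only the last 2 entries, and `merge_on_receiver` would restore all 6 from the receiver's own prefix plus the payload.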

@ShangmingCai
Collaborator

> After a prefix-cache hit, the matched leading tokens are marked as context, so only the part that still needs to be computed is transmitted.

The main branch of this repo synchronizes with all updates from the upstream vLLM repo, so we recommend that you submit this PR to the vLLM community instead.
