[Feature]: PD separation supports prefix caching #12257 #1

skyCreateXian · 2025-01-21T11:09:12Z

Add prefix caching support to PD (prefill/decode) separation

Send and receive only the required portion of the KV cache and hidden states instead of the full amount
Fix the request errors that occur when prefix caching is enabled

skyCreateXian

After a prefix cache hit, the already-cached leading part of the prompt is marked as context, so only the part that still needs to be computed is transmitted
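To make the idea above concrete, here is a minimal sketch of how the transmitted range could be derived from a prefix hit. It is not the code from this PR and it does not use any vllm API: `TransferPlan`, `plan_kv_transfer`, `select_kv_to_send`, and the default `block_size` of 16 are all hypothetical names and values chosen for illustration, and the sketch assumes that a prefix hit is only usable in whole cache blocks.

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class TransferPlan:
    """Hypothetical description of which prompt positions get transmitted."""
    num_context_tokens: int  # leading tokens treated as context (not sent)
    start: int               # first token position whose KV is sent
    end: int                 # one past the last token position sent


def plan_kv_transfer(num_prompt_tokens: int,
                     num_cached_tokens: int,
                     block_size: int = 16) -> TransferPlan:
    """Decide which token range still has to be transmitted.

    Assumption: cached tokens are only reusable in full blocks, so the
    hit length is rounded down to a multiple of block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 0 <= num_cached_tokens <= num_prompt_tokens:
        raise ValueError("num_cached_tokens must be in [0, num_prompt_tokens]")
    context = (num_cached_tokens // block_size) * block_size
    return TransferPlan(num_context_tokens=context,
                        start=context,
                        end=num_prompt_tokens)


def select_kv_to_send(kv_per_token: Sequence[T], plan: TransferPlan) -> List[T]:
    """Keep only the per-token entries that fall outside the cached context."""
    return list(kv_per_token[plan.start:plan.end])


if __name__ == "__main__":
    # 100 prompt tokens, 40 of which hit the prefix cache.
    plan = plan_kv_transfer(num_prompt_tokens=100, num_cached_tokens=40)
    payload = select_kv_to_send(list(range(100)), plan)
    print(plan, len(payload))
```

With these assumed numbers, the 40 cached tokens round down to 32 context tokens (two full blocks), so only positions 32 through 99 would be sent rather than all 100.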

ShangmingCai · 2025-02-06T08:44:45Z

After a prefix cache hit, the already-cached leading part of the prompt is marked as context, so only the part that still needs to be computed is transmitted

The main branch of this repo will synchronize all updates of the upstream vllm repo, so we recommend that you submit and contribute this PR to the vllm community.

[Feature]: PD separation supports prefix caching vllm-project#12257

96b6993

ShangmingCai closed this Feb 6, 2025
