
[FEATURE]: Layerwise KV Cache Transfer for Disaggregated TensorRT-LLM Serving #4387

@shpgy-shpgy

Description


Feature request

Enable layerwise KV cache transfer in TensorRT-LLM for disaggregated serving.

Describe the problem you're encountering

For long contexts, the TTFT of TensorRT-LLM is markedly higher than that of vLLM with layer-wise KV transfer.
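The TTFT gap comes from when the KV cache starts moving: with layer-wise transfer, each layer's KV blocks are pushed to the decode worker as soon as that layer's prefill finishes, so the transfer overlaps with the remaining prefill compute and only the last layer's transfer stays on the critical path. The asyncio sketch below only illustrates that scheduling idea under toy timings; it is not TensorRT-LLM or vLLM code, and every name and constant in it (`compute_layer_kv`, `send_kv`, `NUM_LAYERS`, the per-layer timings) is hypothetical.

```python
import asyncio
import time

NUM_LAYERS = 32
COMPUTE_MS = 5.0   # pretend per-layer prefill compute time
TRANSFER_MS = 4.0  # pretend per-layer KV transfer time


async def compute_layer_kv(layer: int) -> bytes:
    """Stand-in for running one transformer layer's prefill."""
    await asyncio.sleep(COMPUTE_MS / 1000)
    return bytes(16)  # placeholder for this layer's KV blocks


async def send_kv(layer: int, kv: bytes) -> None:
    """Stand-in for pushing one layer's KV blocks to the decode worker."""
    await asyncio.sleep(TRANSFER_MS / 1000)


async def blockwise() -> float:
    """Baseline: start transferring only after the whole prefill is done."""
    start = time.perf_counter()
    kvs = [await compute_layer_kv(i) for i in range(NUM_LAYERS)]
    for i, kv in enumerate(kvs):
        await send_kv(i, kv)          # transfers serialized after prefill
    return time.perf_counter() - start


async def layerwise() -> float:
    """Layer-wise: transfer each layer while later layers still compute."""
    start = time.perf_counter()
    pending = []
    for i in range(NUM_LAYERS):
        kv = await compute_layer_kv(i)
        pending.append(asyncio.create_task(send_kv(i, kv)))
    await asyncio.gather(*pending)    # only the tail transfer is exposed
    return time.perf_counter() - start


async def main() -> None:
    bw = await blockwise()
    lw = await layerwise()
    print(f"block-wise : KV ready for decode after {bw * 1000:6.1f} ms")
    print(f"layer-wise : KV ready for decode after {lw * 1000:6.1f} ms")


if __name__ == "__main__":
    asyncio.run(main())
```

Under these toy numbers the block-wise schedule pays compute plus transfer for every layer, while the layer-wise schedule pays the compute plus a single trailing transfer, which is where the long-context TTFT gap this request targets comes from.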

Describe alternatives you've tried

I have tried Dynamo's vLLM backend with LMCache support, but vLLM's overall performance is significantly worse than TensorRT-LLM's, so it offers no practical benefit.

We've come across a similar issue from the past, and it looks like there's already a viable solution: #2436


Labels: enhancement (New feature or request)
