
[FEATURE]: Layerwise KV Cache Transfer for Disaggregated TensorRT-LLM Serving #4387

@shpgy-shpgy

Description


Feature request

Enable layerwise KV cache transfer in TensorRT-LLM for disaggregated serving.

Describe the problem you're encountering

For long contexts, the TTFT of TensorRT-LLM is markedly higher than that of vLLM with layer-wise KV transfer.
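The TTFT gap comes from when the KV cache starts moving: with layer-wise transfer, each layer's KV blocks are pushed to the decode worker as soon as that layer's prefill finishes, so the transfer overlaps with the remaining prefill compute and only the last layer's transfer stays on the critical path. The asyncio sketch below only illustrates that scheduling idea under toy timings; it is not TensorRT-LLM or vLLM code, and every name and constant in it (`compute_layer_kv`, `send_kv`, `NUM_LAYERS`, the per-layer timings) is hypothetical.

```python
import asyncio
import time

NUM_LAYERS = 32
COMPUTE_MS = 5.0   # pretend per-layer prefill compute time
TRANSFER_MS = 4.0  # pretend per-layer KV transfer time


async def compute_layer_kv(layer: int) -> bytes:
    """Stand-in for running one transformer layer's prefill."""
    await asyncio.sleep(COMPUTE_MS / 1000)
    return bytes(16)  # placeholder for this layer's KV blocks


async def send_kv(layer: int, kv: bytes) -> None:
    """Stand-in for pushing one layer's KV blocks to the decode worker."""
    await asyncio.sleep(TRANSFER_MS / 1000)


async def blockwise() -> float:
    """Baseline: start transferring only after the whole prefill is done."""
    start = time.perf_counter()
    kvs = [await compute_layer_kv(i) for i in range(NUM_LAYERS)]
    for i, kv in enumerate(kvs):
        await send_kv(i, kv)          # transfers serialized after prefill
    return time.perf_counter() - start


async def layerwise() -> float:
    """Layer-wise: transfer each layer while later layers still compute."""
    start = time.perf_counter()
    pending = []
    for i in range(NUM_LAYERS):
        kv = await compute_layer_kv(i)
        pending.append(asyncio.create_task(send_kv(i, kv)))
    await asyncio.gather(*pending)    # only the tail transfer is exposed
    return time.perf_counter() - start


async def main() -> None:
    bw = await blockwise()
    lw = await layerwise()
    print(f"block-wise : KV ready for decode after {bw * 1000:6.1f} ms")
    print(f"layer-wise : KV ready for decode after {lw * 1000:6.1f} ms")


if __name__ == "__main__":
    asyncio.run(main())
```

Under these toy numbers the block-wise schedule pays compute plus transfer for every layer, while the layer-wise schedule pays the compute plus a single trailing transfer, which is where the long-context TTFT gap this request targets comes from.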

Describe alternatives you've tried

I have tried Dynamo's vLLM backend with LMCache support, but vLLM's overall performance is significantly worse than TensorRT-LLM's, so it offers no practical benefit.

We've come across a similar issue from the past, and it looks like there's already a viable solution: #2436


Labels: enhancement (New feature or request)
