Feature request
Enable layer-wise KV cache transfer in TensorRT-LLM for disaggregated serving.
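To make the request concrete, below is a minimal sketch of the intended overlap (the layer objects and the send_kv_blocks / send_kv_blocks_async helpers are hypothetical placeholders, not existing TensorRT-LLM APIs): instead of shipping the full KV cache after prefill finishes, each layer's KV blocks are streamed to the decode worker while later layers are still computing, which is what hides transfer latency behind prefill compute for long contexts.

```python
# Hypothetical sketch, not real TensorRT-LLM APIs.

# Current (blocking) pattern: the decode worker waits for the whole prefill
# plus one monolithic KV transfer before it can emit the first token.
def prefill_then_transfer(layers, request, send_kv_blocks):
    kv = [layer.forward(request) for layer in layers]   # full prefill
    send_kv_blocks(kv)                                  # one large transfer at the end

# Requested (layer-wise) pattern: as soon as layer i's KV blocks are ready,
# start streaming them to the decode worker while layer i+1 still computes.
def prefill_with_layerwise_transfer(layers, request, send_kv_blocks_async):
    in_flight = []
    for layer in layers:
        kv_i = layer.forward(request)                    # compute this layer's KV
        in_flight.append(send_kv_blocks_async(kv_i))     # transfer overlaps the next layer
    for handle in in_flight:
        handle.wait()                                    # drain any remaining transfers
```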
Describe the problem you're encountering
For long contexts, the TTFT of TensorRT-LLM in disaggregated serving is markedly higher than that of vLLM with layer-wise KV cache transfer.
Describe alternatives you've tried
We have tried Dynamo's vLLM backend with LMCache support, but vLLM's overall performance is significantly inferior to TensorRT-LLM's, so this offers no practical benefit.
We've come across a similar issue from the past, and it looks like there's already a viable solution: #2436