[NIXL] Support P tensor-parallel-size > D tensor-parallel-size #27274
Conversation
cc @GuanLuo let me know if this PR meets the expected set of features you aimed to get with your work. Thank you!
class DummyModelRunnerOutput(ModelRunnerOutput):
    def __init__(
        self,
        finished_sending: set[str] | None = None,
ignore file, to be rebased once #26734 lands
| """ | ||
| Get the count of requests expected to complete send/receive operations | ||
| via this connector. | ||
| via this connector. This method is used to initialize the |
ignore, to be rebased once #26734 lands
    tp_ratio,
)

### (Optional) Register local agent memory regions. MLA is not split.
gist of the PR
# on notification so that dst worker can wait before freeing blocks.
# Cap to 1 when P TP > D TP: only a single rank will read from remote.
tp_ratio = max(1, self.kv_topo.tp_ratio_from_engine_id(dst_engine_id))
This is to have P wait for only 1 request instead of -tp_ratio.
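A minimal sketch of the sign convention this cap appears to rely on, inferred from the diff above; tp_ratio_from_sizes is a hypothetical stand-in for kv_topo.tp_ratio_from_engine_id, not the actual helper:

```python
def tp_ratio_from_sizes(prefill_tp: int, decode_tp: int) -> int:
    # Assumed convention: positive when D TP >= P TP (how many D ranks read
    # from each P rank), negative with magnitude P_TP // D_TP otherwise.
    if decode_tp >= prefill_tp:
        return decode_tp // prefill_tp
    return -(prefill_tp // decode_tp)

# P caps the number of "read done" notifications it waits for per request:
print(max(1, tp_ratio_from_sizes(8, 2)))  # P TP 8 > D TP 2 -> waits for 1
print(max(1, tp_ratio_from_sizes(2, 8)))  # D TP 8 > P TP 2 -> waits for 4
```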
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
KV cache helper for store.
ignore file, to be rebased once #26734 lands
vllm/executor/executor_base.py (outdated)
import asyncio
import time
from abc import ABC, abstractmethod
ignore file, to be rebased once #26734 lands
include_finished_set=vllm_config.parallel_config.data_parallel_size > 1,
log_stats=self.log_stats,
block_size=scheduler_block_size,
)
ignore, to be rebased once #26734 lands
class KVConnectorOutput:
    # [req_ids]
    finished_sending: set[str] | None = None
    finished_recving: set[str] | None = None
ignore, to be rebased once #26734 lands
This pull request has merge conflicts that must be resolved before it can be merged.
PR's now ready for review!
cc @xuechendi for xpu
@zhenwei-intel, please help to review, thx
self.num_layers = 0

# nixl_prepped_dlist_handle.
self.src_xfer_side_handle: int = 0
Dropped the default self.src_xfer_side_handle in favor of self.src_xfer_handles_by_block_size[self.block_size].
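A minimal sketch of the resulting bookkeeping, assuming a simplified container class (not the connector's actual fields):

```python
class XferHandleBook:
    """Illustrative only: one NIXL-prepped descriptor-list handle per local
    block size, replacing the single default src_xfer_side_handle."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.src_xfer_handles_by_block_size: dict[int, int] = {}

    def default_src_handle(self) -> int:
        # The old "default" handle is now just the entry for the local block size.
        return self.src_xfer_handles_by_block_size[self.block_size]
```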
if self.use_mla and tp_ratio < 0:
    # ..but we still need to notify the other remote ranks that we
    # have the blocks we need so they can update the request state.
Important MLA logic.
self.device_id: int = 0
# Current rank may pull from multiple remote TP workers.
self.kv_caches_base_addr: defaultdict[EngineId, dict[int, list[int]]] = (
It would be helpful to add a comment explaining the leveled dict, e.g.:
# EngineId, dict[int, list[int]] -> engine_id, tp_rank, base_addr_for_layer
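For illustration, a minimal sketch of the leveled dict with the suggested comment; EngineId is simplified to a string alias and the addresses are made up:

```python
from collections import defaultdict

EngineId = str  # simplified alias; vLLM uses its own engine identifier

# engine_id -> remote tp_rank -> per-layer KV cache base addresses.
# The inner rank key is needed because, with P TP > D TP, the current D rank
# may pull from multiple remote TP workers of the same engine.
kv_caches_base_addr: defaultdict[EngineId, dict[int, list[int]]] = defaultdict(dict)

# Example population during handshake (two layers per rank, fake addresses):
kv_caches_base_addr["prefill-0"][0] = [0x7F00_0000, 0x7F10_0000]
kv_caches_base_addr["prefill-0"][1] = [0x8F00_0000, 0x8F10_0000]
```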
PR is verified with the heter_block_size test, and it looks good.
Overview
This PR addresses the case where the prefill (P) tensor-parallel size is greater than the decode (D) tensor-parallel size.
I think it helps to differentiate two main cases:
MLA
For MLA models, the workflow is easier: each D worker reads from a single P worker (reads are fanned out so that not all D workers read from the same remote), as the MLA cache is duplicated across P workers. Some P workers will not be read from at all.
Mind that this also holds for DP/EP deployments, where the TP size on D will often be 1!
See PR #23917, which also serves as a good use case. As explained in that PR, the number of requests to "expect" is indeed the number of remote instances reading from P.
The main issue with implementing this in NIXL is that each P worker tracks requests as they come in (_reqs_to_send, _reqs_to_process), and those structures are only cleared properly when a read is detected (otherwise timeouts would be raised on P). To address that, MLA D ranks execute only one transfer, but notify all affected remotes that the read is completed (sending multiple NIXL notifs).
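A rough sketch of that flow under the assumptions above; the helper names and the notify callback are hypothetical, not the actual NixlConnector API:

```python
def mapped_remote_ranks(d_rank: int, p_tp_size: int, d_tp_size: int) -> list[int]:
    # With P TP > D TP, each D rank is responsible for a contiguous span of
    # p_tp_size // d_tp_size P ranks.
    span = p_tp_size // d_tp_size
    return list(range(d_rank * span, (d_rank + 1) * span))

def read_and_notify(d_rank: int, p_tp_size: int, d_tp_size: int,
                    req_id: str, notify) -> None:
    remotes = mapped_remote_ranks(d_rank, p_tp_size, d_tp_size)
    # Fan out reads across the span; the MLA cache is replicated, so any
    # P rank in the span holds the full KV for the request.
    read_rank = remotes[d_rank % len(remotes)]
    # ... issue a single NIXL read from read_rank here ...
    for p_rank in remotes:
        # Every mapped P rank gets a notification so it can clear
        # _reqs_to_send/_reqs_to_process and avoid spurious timeouts.
        notify(p_rank, req_id)
```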
cc @njhill @markmc
Dense
For dense models, every D worker reads from n P workers to re-compose its own KV cache, where n is referred to as tp_ratio in the code. This is possible because the number of heads on each P worker is H/n, where H is the number of heads on a D worker, so you can efficiently read into D's cache using the HND layout. That is, in memory, you're just laying out flat ND tensors of H/n heads, n times.
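To make the layout argument concrete, here is a small torch sketch; the shapes and the flat-copy pattern are illustrative, not the actual transfer code:

```python
import torch

n = 2                  # tp_ratio: number of P workers per D worker
H, N, Dh = 8, 16, 128  # D-side heads per worker, block tokens, head dim

# With HND layout a D-side KV block is [num_heads, block_size, head_dim],
# contiguous in memory, so the heads owned by each P worker form one
# contiguous slab that a single flat read per remote can fill.
d_block = torch.empty(H, N, Dh)
p_blocks = [torch.randn(H // n, N, Dh) for _ in range(n)]  # one slab per P worker

for i, p_block in enumerate(p_blocks):
    chunk = p_block.numel()
    d_block.view(-1)[i * chunk:(i + 1) * chunk] = p_block.reshape(-1)

# The re-composed block is exactly the concatenation along the head dim.
assert torch.equal(d_block, torch.cat(p_blocks, dim=0))
```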
Side note: the current design is flexible and allows for dynamic discovery of remotes with different TP sizes. This is not a feature that is currently supported, but it helps to keep in mind when considering implementation choices; it's more of an optional route I'd like to keep open.
Changes
The main change this PR introduces is allowing a D worker to read from multiple P workers.
Practical edits this PR makes to do so:
- src_xfer_side_chunked_handles: local regions need to be split differently based on how many remotes we want to read from. This is prepared during handshake, once.
- [engine_id][rank_no] indexing to accommodate the above.
- get_target_remote -> get_target_remotes for the same reason, plus a bunch of for loops over its result (see the sketch after this list).
- tp_ratio extension to indicate a remote P size greater than D's.
- _pop_done_transfers
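A hedged sketch of the get_target_remote -> get_target_remotes shape change; RemoteInfo and the rank math are illustrative assumptions, not vLLM's actual types:

```python
from dataclasses import dataclass

@dataclass
class RemoteInfo:
    engine_id: str
    tp_rank: int

def get_target_remotes(engine_id: str, d_rank: int, tp_ratio: int) -> list[RemoteInfo]:
    # Assumed convention: tp_ratio < 0 encodes P TP > D TP, so this D rank
    # maps to |tp_ratio| remote P ranks instead of exactly one.
    if tp_ratio >= 0:
        return [RemoteInfo(engine_id, d_rank // max(1, tp_ratio))]
    span = -tp_ratio
    return [RemoteInfo(engine_id, r) for r in range(d_rank * span, (d_rank + 1) * span)]

# Callers that used to handle a single remote now loop over the result:
for remote in get_target_remotes("prefill-0", d_rank=1, tp_ratio=-2):
    ...  # issue one chunked read per remote P rank
```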
How to test
And check out tp_config_sweep_accuracy with config:
TODO
Coming soon to this PR:
- [ ] On MLA with DP/EP, avoid having all workers read from the same remote (deferring).
It does NOT support the replicated KV heads scenario, tp_size > num_heads. This is definitely doable, but I believe demand is weak at the moment, so we can postpone it.