Conversation


@zhenwenqi2024 zhenwenqi2024 commented Dec 7, 2025

What this PR does / why we need it?

Refactor npu_modelrunner so that it stays close to gpu_modelrunner.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Signed-off-by: zhenwenqi2024 <[email protected]>

github-actions bot commented Dec 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant refactoring to align the vllm-ascend codebase with the upstream vLLM v1 API, particularly in the worker components. Key changes include refactoring InputBatch and BlockTable to inherit from their upstream counterparts, which reduces code duplication and improves maintainability. Additionally, it centralizes the management of Ascend-specific resources like mc2_mask, cos, and sin using global variables, moving away from passing them as parameters. The speculative decoding method deepseek_mtp has been generalized to mtp. Overall, these changes improve code structure and alignment with the main vLLM repository. My review has identified one high-severity issue related to an incorrect type hint that should be addressed.



_mc2_tokens_capacity: Optional[int] = None
_reserved_mc2_mask: Optional[int] = None

Severity: high

The type hint for _reserved_mc2_mask is Optional[int], but it is assigned a torch.Tensor of dtype=torch.bool in the set_mc2_mask function. This should be corrected to Optional[torch.Tensor] to accurately reflect the variable's type and prevent potential bugs.

Suggested change
_reserved_mc2_mask: Optional[int] = None
_reserved_mc2_mask: Optional[torch.Tensor] = None


github-actions bot commented Dec 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

zhenwenqi2024 and others added 4 commits December 7, 2025 19:33
cleancode

Signed-off-by: zhenwenqi2024 <[email protected]>
Signed-off-by: zhenwenqi2024 <[email protected]>
Signed-off-by: zhenwenqi2024 <[email protected]>

github-actions bot commented Dec 8, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

zhenwenqi2024 and others added 2 commits December 8, 2025 11:29
@realliujiaxu

Typo in title and content: reactor/reafactor -> refactor

@zhenwenqi2024 zhenwenqi2024 changed the title [Feature] model_runner reactor [Feature] model_runner refactor Dec 8, 2025
Signed-off-by: zhenwenqi2024 <[email protected]>
Signed-off-by: zhenwenqi2024 <[email protected]>