[Feature] model_runner refactor #4764
base: main
Conversation
Signed-off-by: zhenwenqi2024 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a significant refactoring to align the vllm-ascend codebase with the upstream vLLM v1 API, particularly in the worker components. Key changes include refactoring InputBatch and BlockTable to inherit from their upstream counterparts, which reduces code duplication and improves maintainability. Additionally, it centralizes the management of Ascend-specific resources like mc2_mask, cos, and sin using global variables, moving away from passing them as parameters. The speculative decoding method deepseek_mtp has been generalized to mtp. Overall, these changes improve code structure and alignment with the main vLLM repository. My review has identified one high-severity issue related to an incorrect type hint that should be addressed.
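The inheritance pattern the review describes, refactoring Ascend classes to extend their upstream vLLM counterparts rather than duplicate them, can be sketched as follows. This is an illustrative sketch only; the class and attribute names below are hypothetical stand-ins, not the actual vllm-ascend or vLLM code.

```python
# Sketch of the refactoring pattern from the review: an Ascend-specific
# class inherits from its upstream counterpart instead of reimplementing it.
# All names here are hypothetical, not the real vLLM/vllm-ascend API.

class InputBatch:  # stand-in for the upstream vLLM class
    def __init__(self, max_num_reqs: int):
        self.max_num_reqs = max_num_reqs
        self.req_ids: list[str] = []

class AscendInputBatch(InputBatch):
    """Adds only the Ascend-specific state on top of the upstream batch."""
    def __init__(self, max_num_reqs: int):
        super().__init__(max_num_reqs)
        # hypothetical extra per-request buffer needed only on NPU
        self.npu_only_buffer = [0] * max_num_reqs

batch = AscendInputBatch(8)
print(batch.max_num_reqs)  # 8
```

Because the subclass reuses the upstream constructor and fields, upstream fixes flow in automatically, which is the maintainability gain the review calls out.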
_mc2_tokens_capacity: Optional[int] = None
_reserved_mc2_mask: Optional[int] = None
The type hint for _reserved_mc2_mask is Optional[int], but it is assigned a torch.Tensor of dtype=torch.bool in the set_mc2_mask function. This should be corrected to Optional[torch.Tensor] to accurately reflect the variable's type and prevent potential bugs.
Suggested change:
- _reserved_mc2_mask: Optional[int] = None
+ _reserved_mc2_mask: Optional[torch.Tensor] = None
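A minimal sketch of the corrected pattern: a module-level cache whose annotation matches what the setter actually stores. The names are modeled on the review comment but hypothetical; a `list` of booleans stands in for the `torch.Tensor` (`dtype=torch.bool`) so the sketch runs without torch, and in the real code the annotation would be `Optional[torch.Tensor]`.

```python
from typing import Optional

# Hypothetical module-level cache, modeled on the review comment.
# In the real code this holds a torch.Tensor of dtype=torch.bool, so its
# annotation must name the tensor type, not int. A list stands in here.
_reserved_mc2_mask: Optional[list] = None

def set_mc2_mask(capacity: int) -> None:
    """Reserve the mask once at the given token capacity (no-op after that)."""
    global _reserved_mc2_mask
    if _reserved_mc2_mask is None:
        _reserved_mc2_mask = [False] * capacity

def get_mc2_mask() -> Optional[list]:
    return _reserved_mc2_mask

set_mc2_mask(4)
print(get_mc2_mask())  # [False, False, False, False]
```

Keeping the annotation in sync with the stored value lets static checkers catch misuse, which is exactly the class of bug the reviewer flags.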
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Typo in title and content: reactor/reafactor -> refactor.
What this PR does / why we need it?
Refactors the NPU model runner to align more closely with vLLM's GPU model runner.
Does this PR introduce any user-facing change?
No.
How was this patch tested?