[V1][Spec Decode][Perf] Add fused Triton kernel to reduce overhead in EAGLE spec decoding #4
Co-authored-by: Aaron Pham <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Aaron Pham <[email protected]> Co-authored-by: Russell Bryant <[email protected]>
…-project#17826) Signed-off-by: Jerry Zhang <[email protected]>
) Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: mgoin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Nick Hill <[email protected]>
…ct#17945) Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]>
…m-project#18154) Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
…llm-project#18013) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Andy Xie <[email protected]>
Signed-off-by: inkcherry <[email protected]>
…llm-project#18178) Signed-off-by: Mengqing Cao <[email protected]>
Signed-off-by: David Xia <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: omahs <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: yangxia <[email protected]>
…vllm-project#18161) Signed-off-by: Thomas Parnell <[email protected]> Co-authored-by: Lucas Wilkinson <[email protected]>
… in AMD Pipeline (vllm-project#18106) Signed-off-by: Alexei V. Ivanov <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
…8190) Signed-off-by: Sebastian Schönnenbeck <[email protected]>
…Error to ValueError (vllm-project#18181) Signed-off-by: Abatom <[email protected]>
… unquantizedMethod to reenable LLama4 BF16 (vllm-project#18205) Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Leo Tian <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Parallel PR to vllm-project#18221