[Renderer] Introduce Renderer for processing chat messages (using RendererConfig)
#30198
Conversation
Code Review
This pull request introduces a `Renderer` abstraction to encapsulate chat template processing and tokenization logic. This is a significant and positive architectural refactoring, moving model-specific rendering logic out of the core engine and into dedicated renderer classes. The changes are extensive, touching many parts of the codebase to replace direct tokenizer usage with the new renderer interface. The implementation appears mostly correct and consistent. However, I found a bug in `MistralToolParser` where an instance variable is never initialized, which will lead to an `AttributeError`.
| "the tokenizer!" | ||
| ) | ||
|
|
||
| self.prev_tool_call_arr: list[dict[str, Any]] |
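For illustration, here is a minimal standalone sketch (hypothetical class name, not the actual vLLM parser) of why an annotated-but-unassigned attribute triggers the bug: a bare PEP 526 annotation in `__init__` creates no attribute at runtime, so the first read raises.

```python
from typing import Any

class ToolParserSketch:
    def __init__(self) -> None:
        # This is only a type annotation; it does NOT assign anything,
        # so no attribute is created on the instance.
        self.prev_tool_call_arr: list[dict[str, Any]]

parser = ToolParserSketch()
try:
    parser.prev_tool_call_arr
except AttributeError as exc:
    print(exc)  # 'ToolParserSketch' object has no attribute 'prev_tool_call_arr'

# The fix is to initialize the attribute:
#     self.prev_tool_call_arr: list[dict[str, Any]] = []
```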
return None

- renderer = self._get_renderer(ctx.tokenizer)
+ renderer = self._get_renderer(self.renderer.tokenizer)
Note that `vllm.renderers.Renderer` (`self.renderer`) is currently for chat only, and is not to be confused with `vllm.entrypoints.renderer.CompletionRenderer` (the result of `self._get_renderer`). The two implementations will be merged in a later PR.
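A minimal sketch of the distinction, using simplified stand-in classes rather than the actual vLLM signatures:

```python
class ChatRenderer:
    """Stand-in for vllm.renderers.Renderer (chat-only)."""
    def __init__(self, tokenizer: object) -> None:
        self.tokenizer = tokenizer  # the shared tokenizer is exposed here

class CompletionRenderer:
    """Stand-in for vllm.entrypoints.renderer.CompletionRenderer."""
    def __init__(self, tokenizer: object) -> None:
        self.tokenizer = tokenizer

def get_renderer(tokenizer: object) -> CompletionRenderer:
    """Stand-in for self._get_renderer."""
    return CompletionRenderer(tokenizer)

chat_renderer = ChatRenderer(tokenizer=object())
# The completion path derives its renderer from the chat renderer's
# tokenizer instead of a per-request ctx.tokenizer, so both paths
# tokenize with the same instance:
completion_renderer = get_renderer(chat_renderer.tokenizer)
assert completion_renderer.tokenizer is chat_renderer.tokenizer
```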
self.request_logger = request_logger
self.return_tokens_as_token_ids = return_tokens_as_token_ids
- self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)
This has been moved to the Mistral renderer.
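For context, a hedged sketch of the single-worker tokenizer-executor pattern the removed line implemented; the class and method names below are illustrative, not the actual Mistral renderer code:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class MistralRendererSketch:
    def __init__(self, tokenizer) -> None:
        self.tokenizer = tokenizer
        # A single worker serializes access to the tokenizer while keeping
        # blocking encode() calls off the asyncio event loop.
        self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)

    async def tokenize(self, text: str) -> list[int]:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self._tokenizer_executor, self.tokenizer.encode, text
        )
```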
created_time: int = field(default_factory=lambda: int(time.time()))
lora_request: LoRARequest | None = None

# Shared across most requests
Prefer using `self.renderer` to simplify the code.
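A hedged before/after sketch of the suggestion; the names are illustrative, not vLLM's actual API:

```python
# Before: each handler re-resolves a renderer from request-local state.
#     renderer = self._get_renderer(ctx.tokenizer)
# After: reuse the renderer the serving class already holds, since it is
# shared across most requests:
#     renderer = self.renderer
```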
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: DarkLight1337 <[email protected]>
Force-pushed from 0db05b1 to 47c1f05
Superseded by #30200
Purpose

- Introduce `vllm.renderers.RendererLike` to process chat messages into engine inputs.
- Add `RendererRegistry`, which lazily registers renderers to avoid circular import problems (see the sketch after this list), under `vllm.renderers`.
- Rename `tokenizer_mode` to `renderer_mode`, and use a specific tokenizer implementation for each renderer, deprecating `TokenizerRegistry` in the process.

Towards #22880 and #23873
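A hedged sketch of how lazy registration can break an import cycle; the class name and registered path below are illustrative, not the actual `vllm.renderers` API:

```python
import importlib

class LazyRendererRegistry:
    def __init__(self) -> None:
        # Map renderer_mode -> "module.path:ClassName". Storing strings
        # instead of classes defers the import until first use, which is
        # what avoids the circular import.
        self._registry: dict[str, str] = {}

    def register(self, mode: str, qualname: str) -> None:
        self._registry[mode] = qualname

    def load(self, mode: str) -> type:
        module_name, _, class_name = self._registry[mode].partition(":")
        module = importlib.import_module(module_name)
        return getattr(module, class_name)

registry = LazyRendererRegistry()
# Registration does not import the renderer module yet; the import only
# happens when registry.load("mistral") is first called.
registry.register("mistral", "vllm.renderers.mistral:MistralRenderer")  # illustrative path
```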
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.