
Conversation

@DarkLight1337 (Member) commented Dec 7, 2025

Purpose

  • Prototype an interface, vllm.renderers.RendererLike, to process chat messages into engine inputs (see the sketch below).
  • Introduce RendererRegistry, which registers renderers lazily to avoid circular-import problems.
  • Move implementation-specific chat utils to the corresponding renderer in vllm.renderers.
  • TODO: Migrate tokenizer_mode to renderer_mode and use a specific tokenizer implementation for each renderer, deprecating TokenizerRegistry in the process.
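
A minimal sketch of how such an interface and lazy registry might fit together. Beyond the names RendererLike and RendererRegistry, everything here (method names, registry shape, the registered module path) is illustrative, not the actual API in this PR:

```python
# Hypothetical sketch: the real RendererLike/RendererRegistry may differ.
import importlib
from typing import Any, Protocol


class RendererLike(Protocol):
    """Processes chat messages into engine inputs (method name illustrative)."""

    def render_chat(self, messages: list[dict[str, Any]]) -> dict[str, Any]: ...


class RendererRegistry:
    """Stores (module, attribute) strings and imports them on first use,
    so registering a renderer never triggers a circular import."""

    def __init__(self) -> None:
        self._lazy: dict[str, tuple[str, str]] = {}

    def register(self, name: str, module: str, attr: str) -> None:
        # Only strings are stored here; no import happens at registration time.
        self._lazy[name] = (module, attr)

    def resolve(self, name: str) -> type[RendererLike]:
        # The import is deferred until a renderer is actually requested.
        module, attr = self._lazy[name]
        return getattr(importlib.import_module(module), attr)


registry = RendererRegistry()
registry.register("mistral", "vllm.renderers.mistral", "MistralRenderer")
```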

Towards #22880 and #23873

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@DarkLight1337 changed the title from "[Renderer] Introduce Renderer" to "[Renderer] Introduce Renderer for processing chat messages" Dec 7, 2025
@mergify bot added the frontend, multi-modality, performance, gpt-oss, and structured-output labels Dec 7, 2025
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a Renderer abstraction to encapsulate chat template processing and tokenization logic. This is a significant and positive architectural refactoring, moving model-specific rendering logic out of the core engine and into dedicated renderer classes. The changes are extensive, touching many parts of the codebase to replace direct tokenizer usage with the new renderer interface. The implementation appears mostly correct and consistent. However, I found a bug in the MistralToolParser where an instance variable is not initialized, which will lead to an AttributeError.

"the tokenizer!"
)

self.prev_tool_call_arr: list[dict[str, Any]]
@gemini-code-assist (bot, Contributor) commented (severity: high)

The instance variable prev_tool_call_arr is declared as a type hint but not initialized in the __init__ method. This will cause an AttributeError when it's accessed before being set. Please initialize it in __init__, for example: self.prev_tool_call_arr = [].
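
For illustration, a minimal sketch of the suggested fix; the class body is abbreviated to the one relevant line:

```python
from typing import Any


class MistralToolParser:  # abbreviated sketch; only the relevant line shown
    def __init__(self) -> None:
        # Initialize the attribute instead of only annotating it, so that
        # later reads of self.prev_tool_call_arr cannot raise AttributeError.
        self.prev_tool_call_arr: list[dict[str, Any]] = []
```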

return None

- renderer = self._get_renderer(ctx.tokenizer)
+ renderer = self._get_renderer(self.renderer.tokenizer)
@DarkLight1337 (Member, Author) commented Dec 7, 2025

Note that vllm.renderers.Renderer (self.renderer) currently handles chat only; it is not to be confused with the renderer in vllm.entrypoints.renderer, whose CompletionRenderer is what self._get_renderer returns. The two implementations will be merged in a later PR.


self.request_logger = request_logger
self.return_tokens_as_token_ids = return_tokens_as_token_ids
self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)
@DarkLight1337 (Member, Author) commented

This has been moved to the Mistral renderer.
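
For context, a single-worker executor like the one moved here is a common way to keep blocking tokenizer calls off the asyncio event loop. A rough sketch of the pattern, with a hypothetical class name and method, assuming the tokenizer exposes an encode() method:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class MistralRendererSketch:  # hypothetical name; illustrates the pattern only
    def __init__(self, tokenizer) -> None:
        self.tokenizer = tokenizer
        # A single worker serializes tokenizer calls, which matters for
        # tokenizers that are not thread-safe.
        self._tokenizer_executor = ThreadPoolExecutor(max_workers=1)

    async def tokenize(self, text: str) -> list[int]:
        loop = asyncio.get_running_loop()
        # Offload the blocking encode so the event loop stays responsive.
        return await loop.run_in_executor(
            self._tokenizer_executor, self.tokenizer.encode, text
        )
```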

created_time: int = field(default_factory=lambda: int(time.time()))
lora_request: LoRARequest | None = None

# Shared across most requests
@DarkLight1337 (Member, Author) commented

Prefer using self.renderer to simplify the code.

@mergify bot commented Dec 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Dec 7, 2025
Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 changed the title from "[Renderer] Introduce Renderer for processing chat messages" to "[Renderer] Introduce Renderer for processing chat messages (using RendererConfig)" Dec 7, 2025
@DarkLight1337 changed the title from "[Renderer] Introduce Renderer for processing chat messages (using RendererConfig)" to "[Renderer] Introduce Renderer for processing chat messages (using ModelConfig)" Dec 7, 2025
@DarkLight1337 changed the title from "[Renderer] Introduce Renderer for processing chat messages (using ModelConfig)" to "[Renderer] Introduce Renderer for processing chat messages (using RendererConfig)" Dec 7, 2025
@github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Dec 7, 2025
@DarkLight1337 (Member, Author) commented

Superseded by #30200


Labels

frontend, gpt-oss, multi-modality, needs-rebase, performance, structured-output, tool-calling, v1

Projects

Status: Done
