
Conversation

@0xrushi 0xrushi commented Oct 26, 2025

Purpose

Resolves #27509

  • Introduces the new model nvidia/omnivinci.
  • Adds support for Hugging Face configs and tokenizers located in subfolders (see the sketch below).
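
For context, the subfolder loading mirrors the subfolder keyword that Hugging Face transformers already exposes; a minimal sketch of the intended usage, assuming the llm/ repo layout described in the review below (trust_remote_code and auth arguments omitted for brevity):

from transformers import AutoConfig, AutoTokenizer

# nvidia/omnivinci keeps its language-model artifacts under an llm/
# subfolder rather than at the repo root, so the config and tokenizer
# must both be resolved with subfolder="llm".
config = AutoConfig.from_pretrained("nvidia/omnivinci", subfolder="llm")
tokenizer = AutoTokenizer.from_pretrained("nvidia/omnivinci", subfolder="llm")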

Test Plan

python -m vllm.entrypoints.openai.api_server --model nvidia/omnivinci --quantization bitsandbytes --dtype float16 --gpu-memory-utilization 0.8 --max-model-len 8192
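
Once the server is up, it can be exercised with a standard OpenAI-compatible completions request; a minimal sketch using requests (host, port, and endpoint are vLLM's defaults):

import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # default vllm serve address
    json={
        "model": "nvidia/omnivinci",
        "prompt": "Explain the difference between BFS and DFS in one paragraph.",
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["text"])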

Alternatively, with the offline LLM API:

from vllm import LLM, SamplingParams

def main() -> None:
    sampling_params = SamplingParams(
        temperature=0.7,
        top_p=0.9,
        max_tokens=256,
    )

    prompts = [
        "Explain the difference between BFS and DFS in one paragraph.",
        "Write a short poem about autumn in two lines.",
    ]

    # Engine configuration mirrors the serve command above.
    llm = LLM(
        model="nvidia/omnivinci",
        quantization="bitsandbytes",
        dtype="float16",
        max_model_len=8192,
        gpu_memory_utilization=0.8,
        trust_remote_code=False,
    )

    outputs = llm.generate(prompts, sampling_params)

    for i, out in enumerate(outputs):
        print(f"\n=== Prompt {i} ===")
        print(prompts[i])
        print("\n--- Output ---")
        print(out.outputs[0].text.strip())


if __name__ == "__main__":
    main()

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@0xrushi 0xrushi requested a review from 22quinn as a code owner October 26, 2025 03:03
Signed-off-by: 0xrushi <[email protected]>
@mergify mergify bot added the new-model (Requests to new models) label Oct 26, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for the nvidia/omnivinci model and adds functionality to load Hugging Face configurations, tokenizers, and model weights from a subfolder, which is a valuable enhancement. The changes are well-structured and include corresponding tests. However, I've identified two critical issues. First, the logic for loading subfolder configurations does not correctly handle offline mode, which will cause failures for users in environments without internet access. Second, the subfolder weight loading mechanism appears to be implemented only for the BitsAndBytesModelLoader, meaning the new omnivinci model will fail to load in the default non-quantized use case. Addressing these issues is crucial for the feature to be robust and work as expected for all users.

Comment on lines 131 to 139

    sub_cfg = AutoConfig.from_pretrained(
        model,
        trust_remote_code=trust_remote_code,
        revision=revision,
        code_revision=code_revision,
        token=_get_hf_token(),
        subfolder="llm",
        **{k: v for k, v in kwargs.items() if k != "local_files_only"},
    )

critical

The local_files_only keyword argument is explicitly removed when calling AutoConfig.from_pretrained for the subfolder configuration. This will cause issues in offline mode, as from_pretrained will attempt to connect to the Hugging Face Hub even if HF_HUB_OFFLINE is set. This could lead to failures for users operating in environments without internet access. The kwargs should be passed directly to respect the offline setting.

Suggested change

-    sub_cfg = AutoConfig.from_pretrained(
-        model,
-        trust_remote_code=trust_remote_code,
-        revision=revision,
-        code_revision=code_revision,
-        token=_get_hf_token(),
-        subfolder="llm",
-        **{k: v for k, v in kwargs.items() if k != "local_files_only"},
-    )
+    sub_cfg = AutoConfig.from_pretrained(
+        model,
+        trust_remote_code=trust_remote_code,
+        revision=revision,
+        code_revision=code_revision,
+        token=_get_hf_token(),
+        subfolder="llm",
+        **kwargs,
+    )
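
To make the failure mode concrete: callers can propagate offline mode by passing local_files_only=True through kwargs, and filtering that key out defeats the setting. A minimal illustration (assumes the model is already in the local cache; trust_remote_code omitted for brevity):

import os
from transformers import AutoConfig

os.environ["HF_HUB_OFFLINE"] = "1"

# Resolves from the local cache because local_files_only is honored;
# stripping it from kwargs would make this call attempt a Hub lookup
# and fail in an air-gapped environment.
cfg = AutoConfig.from_pretrained(
    "nvidia/omnivinci",
    subfolder="llm",
    local_files_only=True,
)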

Comment on lines +18 to +23

        # direct the default loader to read weights from the llm/ subfolder
        self.allow_patterns_overrides = [
            "llm/*.safetensors",
            "llm/consolidated*.safetensors",
            "llm/*.pt",
        ]

critical

The mechanism to load weights from a subfolder using allow_patterns_overrides seems to be implemented only in BitsAndBytesModelLoader. This means the OmniVinciForCausalLM model will likely fail to load its weights when it is not used with bitsandbytes quantization, as the default model loader is not aware of this attribute or of the logic to search the llm/ subfolder. This leaves the model support partial and broken in the common non-quantized use case. To fix this, the subfolder weight-loading logic should be added to the default model loader as well. The comment "direct the default loader" is also currently misleading.
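
A minimal sketch of the kind of fix being suggested, assuming the default loader gains a pattern-resolution step (the helper name and signature here are illustrative, not vLLM's actual API): consult the model's allow_patterns_overrides attribute before falling back to the loader's built-in glob patterns.

def resolve_allow_patterns(model, default_patterns):
    # Prefer a model-specific override (e.g. ["llm/*.safetensors", ...])
    # so subfolder layouts also work in the default, non-quantized path.
    overrides = getattr(model, "allow_patterns_overrides", None)
    return list(overrides) if overrides is not None else list(default_patterns)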

@0xrushi 0xrushi changed the title feat: Add Omnivinci model + subfolder HF config/tokenizer support feat: [DRAFT] Add Omnivinci model + subfolder HF config/tokenizer support Oct 27, 2025
@0xrushi 0xrushi changed the title feat: [DRAFT] Add Omnivinci model + subfolder HF config/tokenizer support feat: [DRAFT Ignore for now] Add Omnivinci model + subfolder HF config/tokenizer support Oct 27, 2025

Labels

new-model (Requests to new models)


Development

Successfully merging this pull request may close these issues.

[Feature]: Support nvidia/omnivinci

1 participant