feat: [DRAFT Ignore for now] Add Omnivinci model + subfolder HF config/tokenizer support #27520
Conversation
Signed-off-by: 0xrushi <[email protected]>
Code Review
This pull request introduces support for the nvidia/omnivinci model and adds the ability to load Hugging Face configurations, tokenizers, and model weights from a subfolder, which is a valuable enhancement. The changes are well-structured and include corresponding tests. However, I've identified two critical issues. First, the subfolder-configuration loading logic does not correctly handle offline mode, which will cause failures for users in environments without internet access. Second, the subfolder weight loading mechanism appears to be implemented only for the `BitsAndBytesModelLoader`, so the new omnivinci model will fail to load in the default non-quantized case. Addressing these issues is crucial for the feature to work reliably for all users.
```python
sub_cfg = AutoConfig.from_pretrained(
    model,
    trust_remote_code=trust_remote_code,
    revision=revision,
    code_revision=code_revision,
    token=_get_hf_token(),
    subfolder="llm",
    **{k: v for k, v in kwargs.items() if k != "local_files_only"},
)
```
The `local_files_only` keyword argument is explicitly removed when calling `AutoConfig.from_pretrained` for the subfolder configuration. This breaks offline mode: `from_pretrained` will attempt to connect to the Hugging Face Hub even if `HF_HUB_OFFLINE` is set, causing failures for users in environments without internet access. The kwargs should be passed through unchanged so the offline setting is respected.
Suggested change:

```diff
     sub_cfg = AutoConfig.from_pretrained(
         model,
         trust_remote_code=trust_remote_code,
         revision=revision,
         code_revision=code_revision,
         token=_get_hf_token(),
         subfolder="llm",
-        **{k: v for k, v in kwargs.items() if k != "local_files_only"},
+        **kwargs,
     )
```
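To make the failure mode concrete, here is a minimal standalone sketch (plain dicts, no `transformers` import) of why filtering the kwargs drops the offline flag while forwarding them keeps it:

```python
# HF_HUB_OFFLINE causes the transformers machinery to inject
# local_files_only=True into the kwargs. These kwargs stand in for that.
kwargs = {"local_files_only": True, "cache_dir": "/tmp/hf"}

# Pattern used in the diff: strip "local_files_only" before forwarding.
# The offline flag is silently discarded, so a Hub request is attempted.
filtered = {k: v for k, v in kwargs.items() if k != "local_files_only"}
assert "local_files_only" not in filtered

# Suggested fix: forward the kwargs unchanged; the offline flag survives.
forwarded = dict(kwargs)
assert forwarded["local_files_only"] is True
print(filtered, forwarded)
```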
```python
# direct the default loader to read weights from the llm/ subfolder
self.allow_patterns_overrides = [
    "llm/*.safetensors",
    "llm/consolidated*.safetensors",
    "llm/*.pt",
]
```
The mechanism for loading weights from a subfolder via `allow_patterns_overrides` appears to be implemented only in `BitsAndBytesModelLoader`. As a result, `OmniVinciForCausalLM` will likely fail to load its weights without bitsandbytes quantization, since the default model loader neither reads this attribute nor knows to search the `llm/` subfolder. This leaves the model support partial and broken in the common non-quantized case. To fix this, the subfolder weight loading logic should be added to the default model loader as well. The comment `# direct the default loader to read weights from the llm/ subfolder` is also misleading, since the default loader does not currently honor the attribute.
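A hypothetical sketch of the missing piece (the helper name `select_weight_files` and the stub class are assumptions for illustration, not vLLM's actual API): a default loader could consult the model's `allow_patterns_overrides` attribute when selecting weight files, falling back to its usual top-level patterns when the attribute is absent.

```python
import fnmatch

# Illustrative defaults; vLLM's real default loader uses its own pattern list.
DEFAULT_PATTERNS = ["*.safetensors", "*.pt"]

def select_weight_files(files, model, default_patterns=DEFAULT_PATTERNS):
    # Prefer the model's override patterns when present (as the
    # BitsAndBytes loader does in this PR); otherwise use the defaults.
    patterns = getattr(model, "allow_patterns_overrides", None) or default_patterns
    return [f for f in files if any(fnmatch.fnmatch(f, p) for p in patterns)]

class OmniVinciStub:
    # Mirrors the patterns set in the diff above.
    allow_patterns_overrides = [
        "llm/*.safetensors",
        "llm/consolidated*.safetensors",
        "llm/*.pt",
    ]

files = ["llm/model-00001.safetensors", "vision/encoder.safetensors", "model.safetensors"]
# With the override, only the llm/ weight file is selected.
print(select_weight_files(files, OmniVinciStub()))
# Without an override (plain object), the defaults match every listed file,
# because fnmatch's "*" also matches "/".
print(select_weight_files(files, object()))
```

This keeps the override opt-in per model, so existing models without the attribute load exactly as before.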
Purpose
Resolves #27509
Test Plan
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.