
Conversation

@ConnorLi96 (Contributor) commented on Oct 6, 2025

Motivation

  1. LoRA initialization code expects config.num_hidden_layers and config.hidden_size at the top level of the model config.
  2. For Gemma3ForConditionalGeneration, these attributes are nested under config.text_config rather than at the top level (see the sketch after this list).
  3. This caused: AssertionError: LoRA buffer shape torch.Size([4096, 32]) does not match weight shape torch.Size([1728, 32]).
  4. Vision/projector modules were also being incorrectly included in LoRA initialization.
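A minimal, hypothetical sketch of the config shape described above; the attribute values are placeholders and the object is a stand-in, not the real Gemma3 config:

from types import SimpleNamespace

# Hypothetical stand-in for a Gemma3ForConditionalGeneration config: the
# language-model dimensions live only under text_config, with no top-level copies.
config = SimpleNamespace(
    text_config=SimpleNamespace(num_hidden_layers=32, hidden_size=4096),  # placeholder values
    vision_config=SimpleNamespace(hidden_size=1152),                      # placeholder values
)

# LoRA initialization that assumes a flat text-model config breaks here:
try:
    num_layers = config.num_hidden_layers
except AttributeError:
    # This is the gap the fix closes by copying the nested attributes upward.
    num_layers = config.text_config.num_hidden_layers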

Modifications

  1. gemma3_mm.py - Add LoRA support with config attribute exposure
  2. mllama4.py - Add LoRA support (follows same pattern)
  3. lora_manager.py - Remove generic vision skipping logic

Accuracy Tests

N/A

Benchmarking and Profiling

N/A

Checklist


Summary of Changes

Hello @ConnorLi96, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves issues preventing LoRA (Low-Rank Adaptation) from functioning correctly with multimodal models, specifically Gemma3ForConditionalGeneration. It ensures that LoRA can properly initialize by making necessary configuration attributes accessible at the top level and by explicitly excluding non-language model components from LoRA application, thereby preventing buffer shape mismatches.

Highlights

  • LoRA Compatibility for Multimodal Models: Exposed num_hidden_layers and hidden_size from text_config to the top-level config in Gemma3ForConditionalGeneration to resolve LoRA initialization failures in multimodal models.
  • Enhanced Vision Component Skipping for LoRA: Updated should_skip_lora_for_vision_model to identify and skip a wider range of vision and audio components (e.g., vision_model, vision_tower, multi_modal_projector) when applying LoRA, so that LoRA is applied only to language-model layers (a sketch of this name-based check follows below).
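As a rough illustration of that name-based check (the real logic in lora_manager.py may differ in detail; the audio_tower prefix here is an assumption, not confirmed by the PR):

# Illustrative sketch only -- not the exact should_skip_lora_for_vision_model from lora_manager.py.
_NON_LANGUAGE_PREFIXES = (
    "vision_model",
    "vision_tower",
    "multi_modal_projector",
    "audio_tower",  # assumed name for an audio component; not confirmed by this PR
)

def should_skip_lora_for_vision_model(module_name: str) -> bool:
    """Return True for vision/audio/projector submodules that LoRA should leave untouched."""
    return module_name.startswith(_NON_LANGUAGE_PREFIXES)

# Only language-model layers would then receive LoRA adapters:
print(should_skip_lora_for_vision_model("vision_tower.encoder.layers.0.self_attn.q_proj"))   # True
print(should_skip_lora_for_vision_model("language_model.model.layers.0.self_attn.q_proj"))   # False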


@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to fix LoRA support for multimodal models, specifically Gemma3. The changes involve patching the model configuration in gemma3_mm.py to be compatible with the LoRA manager's expectations and generalizing the logic in lora_manager.py to skip vision components.

My review identifies a potential bug in the updated skipping logic in lora_manager.py and suggests a more robust implementation. Additionally, while the config patching in gemma3_mm.py provides a functional fix, I've noted it as a short-term solution and recommend a more scalable, architectural improvement in the LoRAManager for better long-term maintainability.
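For context, the longer-term generalization being suggested might look like the following sketch, where the LoRA manager resolves language-model attributes itself instead of requiring each multimodal model to patch its config (hypothetical helper, not code from this PR):

# Hypothetical helper sketching the suggested generalization; not part of this PR.
def get_text_attr(config, name: str):
    """Resolve a language-model attribute from a (possibly multimodal) HF config."""
    if hasattr(config, name):
        return getattr(config, name)
    text_config = getattr(config, "text_config", None)
    if text_config is not None and hasattr(text_config, name):
        return getattr(text_config, name)
    raise AttributeError(f"config has no '{name}' at the top level or under text_config")

# The LoRA manager could then size its structures without per-model patches, e.g.:
#   num_layers = get_text_attr(base_hf_config, "num_hidden_layers")
#   hidden_size = get_text_attr(base_hf_config, "hidden_size")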

@zhyncs added the run-ci label on Oct 6, 2025
@zhyncs self-assigned this on Oct 6, 2025
@Fridge003 (Collaborator) left a comment


LGTM

@lifuhuang (Collaborator) commented:

Thank you @ConnorLi96 for the contribution!
Let's discuss here: https://sgl-fru7574.slack.com/archives/C09JDPAP3FA/p1759738289851509

@ConnorLi96 changed the title from "fix lora gemma 3 support in text" to "Fix LoRA support for multimodal models (VLMs) by implementing a consistent pattern for skipping vision components" on Oct 6, 2025
if not hasattr(config, "num_hidden_layers"):
config.num_hidden_layers = config.text_config.num_hidden_layers
if not hasattr(config, "hidden_size"):
config.hidden_size = config.text_config.hidden_size
@ConnorLi96 (Author) commented:

Why only Gemma3 needs config attribute exposure:

  • Gemma3ForConditionalGeneration uses a config structure where language model attributes (num_hidden_layers, hidden_size) are exclusively in config.text_config, with no top-level copies
  • Other VLMs like Phi4ForConditionalGeneration and Llama4ForConditionalGeneration either:
    • Already have these attributes at the top level, OR
    • Have different config inheritance that makes them accessible
  • LoRA's LoRAMemoryPool.__init__ directly accesses base_hf_config.num_hidden_layers (line 59 in mem_pool.py), which fails for Gemma3's nested-only structure; a simplified sketch of that access pattern follows below.
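A simplified, hypothetical view of the access pattern in question; the real LoRAMemoryPool.__init__ in mem_pool.py is more involved, and the buffer-shape detail here is only illustrative:

# Simplified sketch of the failing access pattern; not the actual mem_pool.py code.
class LoRAMemoryPoolSketch:
    def __init__(self, base_hf_config, max_lora_rank: int):
        # Direct top-level access: raises AttributeError for a config whose layer
        # count and hidden size live only under text_config, which is why the
        # snippet above copies those attributes to the top level for Gemma3.
        self.num_layers = base_hf_config.num_hidden_layers
        self.hidden_size = base_hf_config.hidden_size
        self.lora_buffer_shape = (self.hidden_size, max_lora_rank)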

@zhyncs merged commit afc35cc into sgl-project:main on Oct 7, 2025
57 of 64 checks passed
PrinsYin pushed a commit to PrinsYin/sglang that referenced this pull request Oct 7, 2025