
Conversation

@ConnorLi96 (Contributor) commented on Oct 6, 2025

Motivation

  1. LoRA initialization code expects config.num_hidden_layers and config.hidden_size at the top level of the model config.
  2. For Gemma3ForConditionalGeneration, these attributes are nested under config.text_config rather than at the top level (see the sketch after this list).
  3. This caused: AssertionError: LoRA buffer shape torch.Size([4096, 32]) does not match weight shape torch.Size([1728, 32]).
  4. Vision/projector modules were also being incorrectly included in LoRA initialization.
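A minimal, hypothetical sketch of the config shape described above; the attribute values are placeholders and the object is a stand-in, not the real Gemma3 config:

from types import SimpleNamespace

# Hypothetical stand-in for a Gemma3ForConditionalGeneration config: the
# language-model dimensions live only under text_config, with no top-level copies.
config = SimpleNamespace(
    text_config=SimpleNamespace(num_hidden_layers=32, hidden_size=4096),  # placeholder values
    vision_config=SimpleNamespace(hidden_size=1152),                      # placeholder values
)

# LoRA initialization that assumes a flat text-model config breaks here:
try:
    num_layers = config.num_hidden_layers
except AttributeError:
    # This is the gap the fix closes by copying the nested attributes upward.
    num_layers = config.text_config.num_hidden_layers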

Modifications

  1. gemma3_mm.py - Add LoRA support with config attribute exposure
  2. mllama4.py - Add LoRA support (follows same pattern)
  3. lora_manager.py - Remove generic vision skipping logic

Accuracy Tests

N/A

Benchmarking and Profiling

N/A

Checklist


Summary of Changes

Hello @ConnorLi96, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves issues preventing LoRA (Low-Rank Adaptation) from functioning correctly with multimodal models, specifically Gemma3ForConditionalGeneration. It ensures that LoRA can properly initialize by making necessary configuration attributes accessible at the top level and by explicitly excluding non-language model components from LoRA application, thereby preventing buffer shape mismatches.

Highlights

  • LoRA Compatibility for Multimodal Models: Exposed num_hidden_layers and hidden_size from text_config to the top-level config in Gemma3ForConditionalGeneration to resolve LoRA initialization failures in multimodal models.
  • Enhanced Vision Component Skipping for LoRA: Updated should_skip_lora_for_vision_model to identify and skip a wider range of vision and audio components (e.g., vision_model, vision_tower, multi_modal_projector) when applying LoRA, so that LoRA is applied only to language-model layers (a sketch of this name-based check follows below).
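As a rough illustration of that name-based check (the real logic in lora_manager.py may differ in detail; the audio_tower prefix here is an assumption, not confirmed by the PR):

# Illustrative sketch only -- not the exact should_skip_lora_for_vision_model from lora_manager.py.
_NON_LANGUAGE_PREFIXES = (
    "vision_model",
    "vision_tower",
    "multi_modal_projector",
    "audio_tower",  # assumed name for an audio component; not confirmed by this PR
)

def should_skip_lora_for_vision_model(module_name: str) -> bool:
    """Return True for vision/audio/projector submodules that LoRA should leave untouched."""
    return module_name.startswith(_NON_LANGUAGE_PREFIXES)

# Only language-model layers would then receive LoRA adapters:
print(should_skip_lora_for_vision_model("vision_tower.encoder.layers.0.self_attn.q_proj"))   # True
print(should_skip_lora_for_vision_model("language_model.model.layers.0.self_attn.q_proj"))   # False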


@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to fix LoRA support for multimodal models, specifically Gemma3. The changes involve patching the model configuration in gemma3_mm.py to be compatible with the LoRA manager's expectations and generalizing the logic in lora_manager.py to skip vision components.

My review identifies a potential bug in the updated skipping logic in lora_manager.py and suggests a more robust implementation. Additionally, while the config patching in gemma3_mm.py provides a functional fix, I've noted it as a short-term solution and recommend a more scalable, architectural improvement in the LoRAManager for better long-term maintainability.
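For context, the longer-term generalization being suggested might look like the following sketch, where the LoRA manager resolves language-model attributes itself instead of requiring each multimodal model to patch its config (hypothetical helper, not code from this PR):

# Hypothetical helper sketching the suggested generalization; not part of this PR.
def get_text_attr(config, name: str):
    """Resolve a language-model attribute from a (possibly multimodal) HF config."""
    if hasattr(config, name):
        return getattr(config, name)
    text_config = getattr(config, "text_config", None)
    if text_config is not None and hasattr(text_config, name):
        return getattr(text_config, name)
    raise AttributeError(f"config has no '{name}' at the top level or under text_config")

# The LoRA manager could then size its structures without per-model patches, e.g.:
#   num_layers = get_text_attr(base_hf_config, "num_hidden_layers")
#   hidden_size = get_text_attr(base_hf_config, "hidden_size")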

@zhyncs added the run-ci label on Oct 6, 2025
@zhyncs self-assigned this on Oct 6, 2025
@Fridge003 (Collaborator) left a comment


LGTM

@lifuhuang (Collaborator) commented:

Thank you @ConnorLi96 for the contribution!
Let's discuss here: https://sgl-fru7574.slack.com/archives/C09JDPAP3FA/p1759738289851509

@ConnorLi96 changed the title from "fix lora gemma 3 support in text" to "Fix LoRA support for multimodal models (VLMs) by implementing a consistent pattern for skipping vision components" on Oct 6, 2025
if not hasattr(config, "num_hidden_layers"):
config.num_hidden_layers = config.text_config.num_hidden_layers
if not hasattr(config, "hidden_size"):
config.hidden_size = config.text_config.hidden_size
@ConnorLi96 (Author) commented:

Why only Gemma3 needs config attribute exposure:

  • Gemma3ForConditionalGeneration uses a config structure where language model attributes (num_hidden_layers, hidden_size) are exclusively in config.text_config, with no top-level copies
  • Other VLMs like Phi4ForConditionalGeneration and Llama4ForConditionalGeneration either:
    • Already have these attributes at the top level, OR
    • Have different config inheritance that makes them accessible
  • LoRA's LoRAMemoryPool.__init__ directly accesses base_hf_config.num_hidden_layers (line 59 in mem_pool.py), which fails for Gemma3's nested-only structure; a simplified sketch of that access pattern follows below.
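A simplified, hypothetical view of the access pattern in question; the real LoRAMemoryPool.__init__ in mem_pool.py is more involved, and the buffer-shape detail here is only illustrative:

# Simplified sketch of the failing access pattern; not the actual mem_pool.py code.
class LoRAMemoryPoolSketch:
    def __init__(self, base_hf_config, max_lora_rank: int):
        # Direct top-level access: raises AttributeError for a config whose layer
        # count and hidden size live only under text_config, which is why the
        # snippet above copies those attributes to the top level for Gemma3.
        self.num_layers = base_hf_config.num_hidden_layers
        self.hidden_size = base_hf_config.hidden_size
        self.lora_buffer_shape = (self.hidden_size, max_lora_rank)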

@zhyncs merged commit afc35cc into sgl-project:main on Oct 7, 2025
57 of 64 checks passed
PrinsYin pushed a commit to PrinsYin/sglang that referenced this pull request Oct 7, 2025