Replies: 1 comment
-
Maybe the MaxText codebase can be useful for porting the model. Gemma 3n is still a feature request in the MaxText repository, but you can check the Gemma 3 vision encoder and text encoder fusion code, which may help: https://github.com/AI-Hypercomputer/maxtext/blob/7070e8eecbea8951c8e5281219ce797c8df1441f/MaxText/layers/gemma3.py#L15-L16
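At a high level, that fusion amounts to projecting the vision encoder's output to the text model's width and writing it into the positions of the image placeholder tokens. A minimal sketch of that idea (the placeholder id and function name are illustrative, not the MaxText implementation):

```python
# Illustrative sketch: scatter vision embeddings into the placeholder-token
# positions of the text embedding sequence. Not taken from MaxText.
import jax.numpy as jnp

IMAGE_PLACEHOLDER_ID = 262144  # hypothetical id for the image placeholder token

def fuse_vision_and_text(token_ids, text_embeds, vision_embeds):
    """token_ids: [T] int32, text_embeds: [T, D], vision_embeds: [N, D].

    Assumes the prompt contains exactly N placeholder tokens, in order.
    """
    is_image = token_ids == IMAGE_PLACEHOLDER_ID                    # [T] bool
    # Running index of each position within the image-token subsequence.
    image_slot = jnp.cumsum(is_image.astype(jnp.int32)) - 1         # [T]
    safe_slot = jnp.clip(image_slot, 0, vision_embeds.shape[0] - 1)
    gathered = vision_embeds[safe_slot]                              # [T, D]
    return jnp.where(is_image[:, None], gathered, text_embeds)      # [T, D]
```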
-
Hi all,
I’m porting Gemma 3n to Flax NNX and could use some guidance on the vision + text path.
What I’ve done so far:
• Adapted the text-only Linen version from the Gemma repo to NNX (a representative example of the translation pattern is sketched after this list).
• Implemented MobileNet v5 for vision, taking cues from Transformers and vlm-mlx.
• Loaded text checkpoints from the Gemma 3n repo and vision checkpoints from the Transformers version.
• I can run a forward pass through the model; text-only generation looks good.
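To give a sense of the translation pattern I used for the first bullet, here is a minimal, illustrative example (a Gemma-style RMSNorm, not the actual ported module):

```python
# Illustrative only: Linen params created via self.param(...) become explicit
# nnx.Param attributes created in __init__.
import jax
import jax.numpy as jnp
from flax import nnx

class RMSNorm(nnx.Module):
    def __init__(self, dim: int, *, eps: float = 1e-6):
        self.scale = nnx.Param(jnp.zeros((dim,)))  # zero-init scale, Gemma-style
        self.eps = eps

    def __call__(self, x):
        var = jnp.mean(jnp.square(x), axis=-1, keepdims=True)
        normed = x * jax.lax.rsqrt(var + self.eps)
        return normed * (1.0 + self.scale.value)
```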
Where I’m stuck:
• Vision + text generation isn’t working. I’m likely mishandling the mask, specifically _create_sliding_mask_for_gemma_3n. I’m not confident I’ve reproduced the intended behavior.
Could anyone share high‑level hints or pointers to the intended masking/fusion behavior for the multimodal path (especially how the sliding mask should treat vision tokens vs. text)? Any insight from Gemma 3n devs or folks who have done a similar port would be super helpful. Happy to share code snippets if useful.
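For concreteness, here is roughly what my current mask does (a minimal sketch; treating tokens of the same image as bidirectional is my assumption, not something I have confirmed against the reference implementation):

```python
# Sketch of my current masking logic: text tokens are causal + sliding window,
# tokens belonging to the same image attend to each other bidirectionally.
import jax.numpy as jnp

def sliding_mask(positions, image_segment_ids, window: int):
    """positions: [T] token positions; image_segment_ids: [T] int32,
    0 for text tokens, k > 0 for tokens of the k-th image."""
    q = positions[:, None]                                 # [T, 1]
    k = positions[None, :]                                 # [1, T]
    text_mask = (k <= q) & ((q - k) < window)              # causal + window

    same_image = (image_segment_ids[:, None] == image_segment_ids[None, :]) \
                 & (image_segment_ids[:, None] > 0)
    return text_mask | same_image                          # [T, T], True = attend
```

Is the intended behavior closer to this, or should image tokens also be constrained by the sliding window?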
Thanks!
Jao