[bug] Possible Conversion issue with Hybrid Mamba 2

# 🐞 Describe the Bug

`tests/models/test_checkpoint.py::test_huggingface_model[hybrid_mamba_2]` fails intermittently with the following output:

```
Native Huggingface
>>>> [Native Huggingface] Excessive diff for tensor logits:
  * RMS diff absolute = 0.3538866937160492 > 0.0005
  * RMS diff scaled = 0.9999999403953552 > 0.003 (scale=0.3538867235183716, unregularized=0.3538866937160492)
  * Max diff absolute = 1.5726423263549805 > 0.005
  * Max diff scaled = 4.443914413452148 > 0.015 (scale=0.3538867235183716, unregularized=0.3538866937160492)
  Test samples:   0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00
  Ref samples:    2.0669e-01 -1.6513e-01  2.7235e-02  4.8058e-01 -4.1691e-01  3.4132e-01  5.5470e-01 -3.1563e-01  1.6231e-01  4.1969e-01
----------------------------------------------- Captured stderr call ------------------------------------------------
Some weights of AprielHybridSSMForCausalLM were not initialized from the model checkpoint at /tmp/fast_llm_tests/models/hybrid_mamba_2/convert_model/apriel_hybrid_ssm_from_distributed and are newly initialized: ['model.layers.1.mixer.A_log', 'model.layers.1.mixer.D', 'model.layers.1.mixer.conv1d.weight', 'model.layers.1.mixer.dt_in_proj.weight', 'model.layers.1.mixer.dt_proj.bias', 'model.layers.1.mixer.dt_proj.weight', 'model.layers.1.mixer.in_proj.weight', 'model.layers.1.mixer.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

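It looks like the SSM layer is not converted properly: the loader reports every `model.layers.1.mixer.*` weight as newly initialized, and the sampled test logits are all zero. For reasons not yet understood this does not always happen, and most of the time the test passes. More investigation is needed.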
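
One possible way to narrow this down is to compare the parameter names the Hugging Face model expects against the names actually present in the converted checkpoint, and group the differences by module. The sketch below is only an illustration, not part of the test suite: `diff_checkpoint_keys` and `group_by_module` are hypothetical helpers, and the `model.embed_tokens.weight` entry is a made-up placeholder. The mixer names are taken from the stderr output above.

```python
def diff_checkpoint_keys(expected_keys, checkpoint_keys):
    """Return parameter names missing from, or unexpected in, a checkpoint."""
    expected = set(expected_keys)
    found = set(checkpoint_keys)
    return {
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
    }


def group_by_module(keys, depth=4):
    """Group parameter names by their first `depth` dotted components."""
    groups = {}
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        groups.setdefault(prefix, []).append(key)
    return groups


# Names reported as newly initialized in the log above.
mixer_keys = [
    "model.layers.1.mixer.A_log",
    "model.layers.1.mixer.D",
    "model.layers.1.mixer.conv1d.weight",
    "model.layers.1.mixer.dt_in_proj.weight",
    "model.layers.1.mixer.dt_proj.bias",
    "model.layers.1.mixer.dt_proj.weight",
    "model.layers.1.mixer.in_proj.weight",
    "model.layers.1.mixer.out_proj.weight",
]

# Hypothetical placeholder for a weight that did convert correctly.
other_keys = ["model.embed_tokens.weight"]

# In practice these two lists might come from `model.state_dict().keys()`
# and from the tensor names stored in the converted checkpoint (assumption).
expected = mixer_keys + other_keys
found = other_keys

result = diff_checkpoint_keys(expected, found)
for module, names in group_by_module(result["missing"]).items():
    print(f"{module}: {len(names)} missing")
```

Running a check like this on both a passing and a failing run could show whether the mixer weights are absent from the exported checkpoint itself or are present under different names.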
