[bug] Possible Conversion issue with Hybrid Mamba 2 #380

@jlamypoirier

Description

🐞 Describe the Bug

tests/models/test_checkpoint.py::test_huggingface_model[hybrid_mamba_2] fails intermittently with the following output:

Native Huggingface
>>>> [Native Huggingface] Excessive diff for tensor logits:
  * RMS diff absolute = 0.3538866937160492 > 0.0005
  * RMS diff scaled = 0.9999999403953552 > 0.003 (scale=0.3538867235183716, unregularized=0.3538866937160492)
  * Max diff absolute = 1.5726423263549805 > 0.005
  * Max diff scaled = 4.443914413452148 > 0.015 (scale=0.3538867235183716, unregularized=0.3538866937160492)
  Test samples:   0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00  0.0000e+00
  Ref samples:    2.0669e-01 -1.6513e-01  2.7235e-02  4.8058e-01 -4.1691e-01  3.4132e-01  5.5470e-01 -3.1563e-01  1.6231e-01  4.1969e-01
----------------------------------------------- Captured stderr call ------------------------------------------------
Some weights of AprielHybridSSMForCausalLM were not initialized from the model checkpoint at /tmp/fast_llm_tests/models/hybrid_mamba_2/convert_model/apriel_hybrid_ssm_from_distributed and are newly initialized: ['model.layers.1.mixer.A_log', 'model.layers.1.mixer.D', 'model.layers.1.mixer.conv1d.weight', 'model.layers.1.mixer.dt_in_proj.weight', 'model.layers.1.mixer.dt_proj.bias', 'model.layers.1.mixer.dt_proj.weight', 'model.layers.1.mixer.in_proj.weight', 'model.layers.1.mixer.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
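
For reading the numbers in the log above, here is a minimal sketch of how the scaled metrics relate to the absolute ones. The log confirms only that each scaled value equals the absolute value divided by `scale`; that `scale` is the RMS of the reference tensor plus a small regularizer is an assumption, and `compare` and `eps` are hypothetical names rather than the test suite's actual helpers. It uses the ten sample values printed in the log, not the full logits tensor.

```python
import math


def rms(values):
    """Root mean square of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def compare(test, ref, eps=1e-8):
    """Absolute and scaled differences between two flat lists (hypothetical helper)."""
    diff = [t - r for t, r in zip(test, ref)]
    # Assumption: the scale is the reference RMS with a small regularizer.
    scale = math.sqrt(rms(ref) ** 2 + eps)
    max_abs = max(abs(d) for d in diff)
    return {
        "rms_abs": rms(diff),
        "rms_scaled": rms(diff) / scale,
        "max_abs": max_abs,
        "max_scaled": max_abs / scale,
    }


# The ten sample values printed in the log.
ref = [2.0669e-01, -1.6513e-01, 2.7235e-02, 4.8058e-01, -4.1691e-01,
       3.4132e-01, 5.5470e-01, -3.1563e-01, 1.6231e-01, 4.1969e-01]
test = [0.0] * len(ref)

metrics = compare(test, ref)
print(metrics)
```

When the test tensor is all zeros, the difference is just the reference itself, so the scaled RMS difference comes out at almost exactly 1. That matches the `RMS diff scaled = 0.9999999...` line and suggests the converted model produced no meaningful output at all, rather than a small numerical drift.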

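It seems the SSM layer is not converted properly: the stderr output shows every `model.layers.1.mixer` parameter being newly initialized instead of loaded from the checkpoint, and the sampled test logits are all zero. For some reason this doesn't always happen, and most of the time the test passes. More investigation needed.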
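
One possible way to narrow this down is to compare the parameter names the model expects against the names actually present in the converted checkpoint, grouped by mixer. The sketch below works on plain lists of key names. `group_missing_keys` is a hypothetical helper, and the example inputs are illustrative: the `model.layers.1.mixer.*` names come from the warning above, while `model.embed_tokens.weight` is an assumed placeholder for a parameter that loaded correctly.

```python
from collections import defaultdict


def group_missing_keys(expected_keys, checkpoint_keys):
    """Group expected keys that are absent from the checkpoint by mixer prefix."""
    missing = sorted(set(expected_keys) - set(checkpoint_keys))
    grouped = defaultdict(list)
    for key in missing:
        # "model.layers.1.mixer.A_log" -> prefix "model.layers.1", name "A_log"
        prefix, sep, name = key.partition(".mixer.")
        if sep:
            grouped[prefix + ".mixer"].append(name)
        else:
            # Keys that do not belong to a mixer are reported under "".
            grouped[""].append(key)
    return dict(grouped)


# In practice the expected names could come from model.state_dict().keys()
# and the checkpoint names from the saved weight files.
expected = [
    "model.embed_tokens.weight",  # assumed placeholder, not from the log
    "model.layers.1.mixer.A_log",
    "model.layers.1.mixer.D",
    "model.layers.1.mixer.in_proj.weight",
]
checkpoint = ["model.embed_tokens.weight"]

report = group_missing_keys(expected, checkpoint)
print(report)
```

Running a check like this on both a passing and a failing run would show whether the SSM weights are missing from the exported files themselves or are present under different names, which may help explain why the failure is intermittent.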

Labels: Critical, bug (Something isn't working)
