🐞 Describe the Bug
tests/models/test_checkpoint.py::test_huggingface_model[hybrid_mamba_2] fails intermittently with the following output:
```
Native Huggingface
>>>> [Native Huggingface] Excessive diff for tensor logits:
* RMS diff absolute = 0.3538866937160492 > 0.0005
* RMS diff scaled = 0.9999999403953552 > 0.003 (scale=0.3538867235183716, unregularized=0.3538866937160492)
* Max diff absolute = 1.5726423263549805 > 0.005
* Max diff scaled = 4.443914413452148 > 0.015 (scale=0.3538867235183716, unregularized=0.3538866937160492)
Test samples: 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
Ref samples: 2.0669e-01 -1.6513e-01 2.7235e-02 4.8058e-01 -4.1691e-01 3.4132e-01 5.5470e-01 -3.1563e-01 1.6231e-01 4.1969e-01
----------------------------------------------- Captured stderr call ------------------------------------------------
Some weights of AprielHybridSSMForCausalLM were not initialized from the model checkpoint at /tmp/fast_llm_tests/models/hybrid_mamba_2/convert_model/apriel_hybrid_ssm_from_distributed and are newly initialized: ['model.layers.1.mixer.A_log', 'model.layers.1.mixer.D', 'model.layers.1.mixer.conv1d.weight', 'model.layers.1.mixer.dt_in_proj.weight', 'model.layers.1.mixer.dt_proj.bias', 'model.layers.1.mixer.dt_proj.weight', 'model.layers.1.mixer.in_proj.weight', 'model.layers.1.mixer.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
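Since the failure is intermittent, one way to investigate is to re-run this single test in a loop until it reproduces, then inspect the converted checkpoint or attach extra logging. A minimal sketch, assuming pytest is invoked from the repo root and the test carries no state across runs:

```python
# Hypothetical repro loop: re-run the flaky test until it fails, so the bad
# conversion can be examined while the /tmp checkpoint is still around.
import subprocess
import sys

TEST = "tests/models/test_checkpoint.py::test_huggingface_model[hybrid_mamba_2]"

for attempt in range(50):
    result = subprocess.run([sys.executable, "-m", "pytest", TEST, "-x", "-q"])
    if result.returncode != 0:
        print(f"Reproduced the failure on attempt {attempt}")
        break
else:
    print("No failure in 50 attempts")
```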
It looks like the SSM layer weights are not converted properly. But for some inexplicable reason this doesn't always happen, and most of the time the test passes. More investigation is needed.
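The warning above lists exactly which tensors were left uninitialized, so a quick check of the converted checkpoint file can tell whether those tensors are missing entirely or just stored under a different name. A sketch, assuming a single-file `model.safetensors` (the converted checkpoint may be sharded or in another format):

```python
# Check whether the layer-1 mixer tensors reported as "newly initialized"
# actually exist in the converted checkpoint file.
from safetensors import safe_open

# Path from the warning above; the file name is an assumption.
CKPT = (
    "/tmp/fast_llm_tests/models/hybrid_mamba_2/convert_model/"
    "apriel_hybrid_ssm_from_distributed/model.safetensors"
)

# Keys listed in the Huggingface warning.
expected = [
    "model.layers.1.mixer.A_log",
    "model.layers.1.mixer.D",
    "model.layers.1.mixer.conv1d.weight",
    "model.layers.1.mixer.dt_in_proj.weight",
    "model.layers.1.mixer.dt_proj.bias",
    "model.layers.1.mixer.dt_proj.weight",
    "model.layers.1.mixer.in_proj.weight",
    "model.layers.1.mixer.out_proj.weight",
]

with safe_open(CKPT, framework="pt") as f:
    keys = set(f.keys())

for name in expected:
    print(f"{name}: {'present' if name in keys else 'MISSING'}")

# Any layer-1 mixer tensors stored under a different name would point at a
# renaming bug in the conversion rather than a dropped tensor.
for k in sorted(keys):
    if "layers.1.mixer" in k:
        print("found:", k)
```

If the keys are present in the file but skipped at load time, the bug is more likely a naming mismatch in the conversion than a tensor being dropped outright.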