Draft
Changes from all commits
50 commits
c00fb01
refactor_multi_step_attn_mask_for_arbitrary_step
yeyu-nvidia Sep 19, 2025
1de7d0a
minor
yeyu-nvidia Sep 19, 2025
e6ec481
make new mask the same dtype and device as attn_mask
yeyu-nvidia Sep 19, 2025
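For illustration only: matching the dtype and device of an existing attention mask when appending new positions usually looks like the sketch below. The tensor layout and function name are assumptions, not code from this commit.

```python
import torch

def extend_attn_mask(attn_mask: torch.Tensor, extra_steps: int) -> torch.Tensor:
    """Append `extra_steps` masked key positions to an attention mask.

    The new block is created with the same dtype and device as `attn_mask`,
    so boolean vs. float masks and CPU vs. GPU placement stay consistent.
    """
    batch, heads, q_len, _ = attn_mask.shape
    new_block = torch.ones(
        batch, heads, q_len, extra_steps,
        dtype=attn_mask.dtype, device=attn_mask.device,
    )
    return torch.cat([attn_mask, new_block], dim=-1)
```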
89b136d
minor
yeyu-nvidia Sep 22, 2025
fd8b455
revert
yeyu-nvidia Sep 22, 2025
d3bd039
integrate parallel draft to eagle auto regression
yeyu-nvidia Sep 23, 2025
15840d6
debug
yeyu-nvidia Sep 23, 2025
af34721
fix pseudo spec generate for parallel draft
yeyu-nvidia Sep 23, 2025
0e57805
implement kv cache (inference_context) for eagle training
yeyu-nvidia Sep 24, 2025
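The commit message refers to a KV cache (inference_context) used during EAGLE training; the actual Megatron inference-context API is not shown in this PR, so the following is only a generic sketch of the idea: cache keys and values from earlier draft steps so later steps attend over them without recomputation.

```python
import torch

class SimpleKVCache:
    """Toy key/value cache: append per-step K/V and return the full history."""

    def __init__(self):
        self.keys: list[torch.Tensor] = []
        self.values: list[torch.Tensor] = []

    def update(self, k: torch.Tensor, v: torch.Tensor):
        # k, v: [batch, heads, step_len, head_dim] for the current draft step.
        self.keys.append(k)
        self.values.append(v)
        # Attention for the current step sees all cached positions so far.
        return torch.cat(self.keys, dim=2), torch.cat(self.values, dim=2)
```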
7eea4e2
allow groundtruth mismatch in AcceptanceRateValidation
yeyu-nvidia Sep 25, 2025
e934d62
set fail_when_mismatch to False
yeyu-nvidia Sep 25, 2025
9b98412
debug
yeyu-nvidia Sep 26, 2025
8ee7fe4
debug
yeyu-nvidia Sep 26, 2025
2e09a73
debug
yeyu-nvidia Sep 26, 2025
9162d00
debug
yeyu-nvidia Sep 26, 2025
9ef5fe9
remove redundant index as now eagle_logits_ has a length of seq-len
yeyu-nvidia Sep 29, 2025
5e11b81
minor edit based on coderabbit's suggestions
yeyu-nvidia Sep 29, 2025
f9e37fe
make ttt_step configurable in forward
yeyu-nvidia Oct 1, 2025
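A hedged illustration of making the number of test-time-training (TTT) steps a forward argument; `ttt_steps` and the module below are invented names, not the repository's.

```python
import torch
from torch import nn

class ToyDraftModule(nn.Module):
    def __init__(self, hidden: int = 16, vocab: int = 32):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, hidden_states: torch.Tensor, ttt_steps: int = 1):
        """Run a caller-configurable number of draft (TTT) steps."""
        logits_per_step = []
        for _ in range(ttt_steps):
            hidden_states = torch.tanh(self.proj(hidden_states))
            logits_per_step.append(self.head(hidden_states))
        return logits_per_step
```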
b582c94
fix the bug in pseudo_speculative_generate
yeyu-nvidia Oct 1, 2025
66ce87a
change variable name to make it clear
yeyu-nvidia Oct 1, 2025
691d22e
debug: reduce kv cache size from ttt*parallel to ttt+parallel-1; in e…
yeyu-nvidia Oct 1, 2025
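The size reduction in this commit message is simple arithmetic: with `ttt_steps` autoregressive steps and `parallel_draft` tokens drafted per step, the steps and parallel positions overlap, so only `ttt_steps + parallel_draft - 1` cache slots are needed instead of `ttt_steps * parallel_draft`. A small check (variable names are illustrative):

```python
def kv_cache_positions(ttt_steps: int, parallel_draft: int) -> int:
    # Old sizing: one slot per (step, parallel-token) pair.
    naive = ttt_steps * parallel_draft
    # New sizing per the commit message: overlapping positions.
    reduced = ttt_steps + parallel_draft - 1
    assert reduced <= naive
    return reduced

print(kv_cache_positions(ttt_steps=3, parallel_draft=2))  # 4 instead of 6
```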
8f4ec81
gitnore type for precommit
yeyu-nvidia Oct 1, 2025
a169f37
consolidate acc printout
yeyu-nvidia Oct 1, 2025
8dcb48d
fix the bug in pseudo_speculative_generate
yeyu-nvidia Oct 2, 2025
31c2f45
use embedding for mask tokens as hidden_states
yeyu-nvidia Oct 3, 2025
ee28311
debug
yeyu-nvidia Oct 3, 2025
ab94f5b
remove mask tokens and use learnable embeddings and hidden_states ins…
yeyu-nvidia Oct 6, 2025
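Replacing mask tokens with learnable vectors can be sketched with `nn.Parameter`: each extra parallel-draft position gets its own trainable embedding and hidden-state stand-in. Names and shapes below are assumptions for illustration.

```python
import torch
from torch import nn

class ParallelDraftInputs(nn.Module):
    """Learnable stand-ins for the extra parallel-draft positions."""

    def __init__(self, num_parallel: int, hidden_size: int):
        super().__init__()
        self.draft_embeds = nn.Parameter(torch.zeros(num_parallel, hidden_size))
        self.draft_hidden = nn.Parameter(torch.zeros(num_parallel, hidden_size))

    def forward(self, batch_size: int):
        # Broadcast the learnable vectors across the batch dimension.
        embeds = self.draft_embeds.unsqueeze(0).expand(batch_size, -1, -1)
        hidden = self.draft_hidden.unsqueeze(0).expand(batch_size, -1, -1)
        return embeds, hidden
```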
f03935d
debug
yeyu-nvidia Oct 6, 2025
34dd050
debug
yeyu-nvidia Oct 6, 2025
cbf0285
debug
yeyu-nvidia Oct 6, 2025
3a95869
debug: move learnable parallel draft embedding and hidden_states afte…
yeyu-nvidia Oct 9, 2025
75cd1aa
debug
yeyu-nvidia Oct 10, 2025
095a302
only compute acc on tokens whose label is not IGNORE_TOKEN_ID
yeyu-nvidia Oct 17, 2025
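A minimal sketch of the masking described here (later reverted in the next commit): predictions are compared to labels only at positions whose label is not the ignore id. The value of IGNORE_TOKEN_ID below is an assumption.

```python
import torch

IGNORE_TOKEN_ID = -100  # assumed; a common ignore index for LM labels

def masked_token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Accuracy over positions whose label is not IGNORE_TOKEN_ID."""
    preds = logits.argmax(dim=-1)
    valid = labels != IGNORE_TOKEN_ID
    if valid.sum() == 0:
        return torch.tensor(0.0)
    return (preds[valid] == labels[valid]).float().mean()
```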
f0de5d1
revert acc compute
yeyu-nvidia Oct 17, 2025
64e2e75
implement eagle + medusa for parallel draft
yeyu-nvidia Oct 20, 2025
0be2c23
debug
yeyu-nvidia Oct 20, 2025
4b52efd
debug
yeyu-nvidia Oct 20, 2025
ede621e
debug: medusa labels should be shifted
yeyu-nvidia Oct 20, 2025
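The shift fix can be illustrated generically: a Medusa-style head that predicts the token `head_offset` positions ahead must be compared against labels shifted by that offset, dropping the trailing positions that have no target. A sketch, not the repository's code:

```python
import torch

def shift_for_head(logits: torch.Tensor, labels: torch.Tensor, head_offset: int):
    """Align logits of a head predicting `head_offset` tokens ahead with labels.

    logits: [batch, seq, vocab]; labels: [batch, seq]
    """
    if head_offset == 0:
        return logits, labels
    # The last `head_offset` positions have no ground truth that far ahead.
    return logits[:, :-head_offset], labels[:, head_offset:]
```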
10b8881
loss_decay only on ttt
yeyu-nvidia Oct 20, 2025
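Loss decay over TTT steps can be read as a per-step weight that shrinks for later, harder draft steps (the decay factor and name below are assumptions; the next commit reverts this change):

```python
def ttt_loss_weights(num_ttt_steps: int, decay: float = 0.8) -> list[float]:
    # Weight the loss of TTT step t by decay**t.
    return [decay**t for t in range(num_ttt_steps)]

print(ttt_loss_weights(3))  # [1.0, 0.8, 0.64...]
```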
c42f87a
revert loss_decay; fix: medusa logit shift in the back
yeyu-nvidia Oct 20, 2025
0f9483f
detach eagle hidden_states from medusa heads; this will make parallel…
yeyu-nvidia Oct 28, 2025
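Detaching means the parallel heads train on the EAGLE hidden states without back-propagating into the EAGLE trunk (a later commit removes the detach again). A generic sketch of the mechanism:

```python
import torch
from torch import nn

trunk_hidden = torch.randn(2, 5, 16, requires_grad=True)  # stand-in for EAGLE output
parallel_head = nn.Linear(16, 32)

# .detach() blocks gradients from the parallel head flowing back into the trunk,
# so the head's loss only updates the head's own parameters.
head_logits = parallel_head(trunk_hidden.detach())
```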
aa62ef5
no loss decay on parallel index
yeyu-nvidia Oct 30, 2025
770f28a
debug: parallel logits should be cat in dim=0 in pseudo gen
yeyu-nvidia Nov 3, 2025
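A minimal illustration of the dim=0 concatenation the commit message describes; the real tensor layout in pseudo_speculative_generate is not shown in this diff, so treat the shapes below as assumptions.

```python
import torch

# One logits tensor per parallel draft position, each [batch, vocab].
parallel_logits = [torch.randn(2, 32) for _ in range(3)]

# Concatenating along dim=0 stacks the draft positions ahead of the batch
# dimension; the commit fixes which dim is used for this merge.
merged = torch.cat(parallel_logits, dim=0)
print(merged.shape)  # torch.Size([6, 32])
```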
823962b
debug: scatter padded hidden_states in pseudo generate
yeyu-nvidia Nov 4, 2025
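Scattering into a padded buffer can be sketched with boolean-mask assignment: compute hidden states only for the real tokens, then write them back into a zero-initialized padded tensor at their original positions. All names and shapes here are illustrative.

```python
import torch

batch, padded_len, hidden = 2, 6, 4
valid = torch.tensor([[1, 1, 1, 0, 0, 0],
                      [1, 1, 1, 1, 1, 0]], dtype=torch.bool)  # non-pad positions

# Hidden states computed only for the valid tokens, packed across the batch.
computed = torch.randn(int(valid.sum()), hidden)

# Scatter them back into a zero-padded [batch, padded_len, hidden] buffer.
padded = torch.zeros(batch, padded_len, hidden)
padded[valid] = computed
```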
8c8439a
rename layer names and hnorm for megatron eagle3 export to be consist…
yeyu-nvidia Nov 6, 2025
942a0d6
no detach for medusa feature
yeyu-nvidia Nov 6, 2025
964b23d
change parallel medusa to parallel eagle
yeyu-nvidia Nov 6, 2025
15e1079
debug
yeyu-nvidia Nov 6, 2025
09eafd0
debug
yeyu-nvidia Nov 6, 2025
5e9fd53
debug
yeyu-nvidia Nov 6, 2025
16 changes: 8 additions & 8 deletions modelopt/torch/export/plugins/mcore_llama.py
@@ -117,15 +117,15 @@
 
 eagle3_deep_llama_causal_lm_export: dict[str, CustomModuleMapping] = {
     "word_embeddings": NameRemapping("embed_tokens."),
-    "enorm": NameRemapping("layers.0.input_layernorm."),
+    "enorm": NameRemapping("midlayer.0.input_layernorm."),
     "fc": NameRemapping("fc."),
-    "first_input_layernorm": NameRemapping("layers.0.hidden_norm."),
-    "input_layernorm": NameRemapping("layers.{}.input_layernorm."),
-    "linear_qkv": QKVSlicing("layers.{}.self_attn."),
-    "linear_proj": NameRemapping("layers.{}.self_attn.o_proj."),
-    "pre_mlp_layernorm": NameRemapping("layers.{}.post_attention_layernorm."),
-    "linear_fc1": GatedMLPSlicing("layers.{}.mlp."),
-    "linear_fc2": NameRemapping("layers.{}.mlp.down_proj."),
+    "first_input_layernorm": NameRemapping("midlayer.0.hidden_norm."),
+    "input_layernorm": NameRemapping("midlayer.{}.hidden_norm."),
+    "linear_qkv": QKVSlicing("midlayer.{}.self_attn."),
+    "linear_proj": NameRemapping("midlayer.{}.self_attn.o_proj."),
+    "pre_mlp_layernorm": NameRemapping("midlayer.{}.post_attention_layernorm."),
+    "linear_fc1": GatedMLPSlicing("midlayer.{}.mlp."),
+    "linear_fc2": NameRemapping("midlayer.{}.mlp.down_proj."),
     "final_layernorm": NameRemapping("norm."),
     "d2t": NameRemapping("d2t"),
     "output_layer": NameRemapping("lm_head."),
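For context on the diff above: each entry maps a Megatron-side module name to a name template for the exported EAGLE3 checkpoint, and this PR renames the draft decoder-layer prefix from "layers.{}" to "midlayer.{}". A rough, hedged illustration of how such a "{}" template could be expanded per layer index follows; the behaviour of the real NameRemapping class is not shown in this diff.

```python
def expand_remapping(template: str, layer_index: int, suffix: str) -> str:
    """Fill the '{}' layer slot in a remapping template and append a suffix."""
    return template.format(layer_index) + suffix

# After this change, EAGLE3 export keys use the "midlayer" prefix:
print(expand_remapping("midlayer.{}.self_attn.o_proj.", 0, "weight"))
# -> midlayer.0.self_attn.o_proj.weight
```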