Converter architecture implementations and refactor #48
Draft: markurtz wants to merge 15 commits into main from feat/converter-architecture
+4,432
−1,449
545b7f5 to 9a3afc3
- Implement EagleConverter class for converting Eagle/HASS checkpoints
- Support standard Eagle and layernorms variants
- Map weights correctly (fc→fusion_fc, layers.0→transformer)
- Skip embed_tokens.weight due to weight tying
- Add comprehensive unit and e2e tests
Major refactoring to improve code organization and reusability:
- Extract 6 generic utility functions to utils.py:
  * download_checkpoint_from_hub
  * ensure_checkpoint_is_local
  * load_checkpoint_config
  * load_checkpoint_weights
  * detect_fusion_bias_and_layernorms (renamed for clarity)
  * save_speculator_checkpoint (uses save_pretrained)
- Keep Eagle-specific logic in EagleConverter class:
  * Weight name remapping
  * Config translation
  * Architecture validation
- Split weight processing into two functions:
  * _should_skip_weight: Determines if weight should be skipped
  * _remap_weight_name: Handles the actual name remapping
- Move SpeculatorModelConfig import to module level
- Add comprehensive RST docstrings with usage examples
- Update tests to use new utils module

This separation enables reuse of generic utilities for future speculator implementations (Medusa, Hydra, etc.) while keeping architecture-specific logic isolated.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
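For readers skimming the commits, the skip/remap split could look roughly like the sketch below. It is not the code from this PR: the function names mirror `_should_skip_weight` and `_remap_weight_name` without the leading underscore, and the prefix-based matching, the `convert_weights` helper, and the two constants are assumptions inferred only from the mappings named in the commit messages (fc→fusion_fc, layers.0→transformer, skip embed_tokens.weight).

```python
from typing import Any, Dict

# Hypothetical constants, derived from the commit messages rather than the PR diff.
SKIPPED_WEIGHTS = {"embed_tokens.weight"}  # skipped because of weight tying
NAME_REPLACEMENTS = (
    ("fc.", "fusion_fc."),
    ("layers.0.", "transformer."),
)


def should_skip_weight(name: str) -> bool:
    """Return True if the weight should be dropped during conversion."""
    return name in SKIPPED_WEIGHTS


def remap_weight_name(name: str) -> str:
    """Rename a checkpoint key by prefix; unknown keys pass through unchanged."""
    for old_prefix, new_prefix in NAME_REPLACEMENTS:
        if name.startswith(old_prefix):
            return new_prefix + name[len(old_prefix):]
    return name


def convert_weights(weights: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the skip check first, then the rename, to every key."""
    return {
        remap_weight_name(name): value
        for name, value in weights.items()
        if not should_skip_weight(name)
    }


if __name__ == "__main__":
    example = {
        "fc.weight": 1,
        "layers.0.self_attn.q_proj.weight": 2,
        "embed_tokens.weight": 3,
    }
    print(convert_weights(example))
```

Keeping the skip decision separate from the rename means each rule can be unit-tested on bare key strings, without loading a checkpoint.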
- Get vocab_size, hidden_size, and max_position_embeddings from model config
- Use conservative sequence length that respects model's max_position_embeddings
- Improve exception variable naming for clarity
- Add more detailed logging of forward pass parameters

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
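As an illustration of the "conservative sequence length" idea in the commit above, a test helper might clamp its preferred length to the model's limit. This is only a sketch: `pick_forward_pass_shapes`, its default values, and the returned dictionary layout are hypothetical and do not come from the PR; only the three config attribute names are taken from the commit message.

```python
from types import SimpleNamespace
from typing import Any, Dict


def pick_forward_pass_shapes(
    config: Any, batch_size: int = 1, preferred_seq_len: int = 32
) -> Dict[str, int]:
    """Choose input shapes for a smoke-test forward pass from a model config."""
    # Fall back to the preferred length if the config does not declare a limit.
    max_positions = getattr(config, "max_position_embeddings", preferred_seq_len)
    # Never exceed the model's position limit, and never drop below one token.
    seq_len = max(1, min(preferred_seq_len, max_positions))
    return {
        "batch_size": batch_size,
        "seq_len": seq_len,
        "vocab_size": config.vocab_size,
        "hidden_size": config.hidden_size,
    }


if __name__ == "__main__":
    # Stand-in config object; a real run would read these from the model config.
    config = SimpleNamespace(
        vocab_size=32000, hidden_size=4096, max_position_embeddings=16
    )
    print(pick_forward_pass_shapes(config))
```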
- Initialize EagleSpeculator with verifier_attachment_mode='detached' to prevent verifier loading
- Use model.load_state_dict with strict=False to load only Eagle-specific weights
- Let model.save_pretrained handle saving config, weights, and auto-generated code
- Update test to check for eagle.py instead of mocking save_file
- Remove unused save_speculator_checkpoint function from utils

This approach leverages the model's native save method while avoiding verifier dependency issues during conversion.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- Fixed meta device issue by loading converted weights before saving
- Applied style fixes (simplified return, fixed line lengths)
- Moved e2e conversion tests to tests/e2e/convert/
- Added comprehensive unit tests for Eagle converter utilities

The missing model.load_state_dict(weights, strict=False) call was causing converted checkpoints to save without weights, resulting in models loading on meta device.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
…ndard Python CLI expectations
…re styling is passing. Migration of tests pending
9a3afc3 to 389dc2b
No description provided.