Releases · wangzhaode/llm-export
llmexport v0.0.3
Release Notes - v0.0.3
Major Updates
This release represents a significant milestone with comprehensive architecture improvements and extensive new model support. The codebase has been completely restructured and synchronized with the latest MNN framework.
New Features
Model Support
- SmolLM Series: Added support for SmolLM models with optimized configurations
- MobileLLM Series: Enhanced support for mobile-optimized language models
- BGE Models: Added support for bge-small embedding models
- OpenELM: Support for Apple's OpenELM model series
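The new families export through the same CLI path as existing models. A minimal sketch for SmolLM, where the checkpoint directory is a placeholder for a local download:

```bash
# Export a newly supported SmolLM checkpoint to MNN.
# The path below is a placeholder for a local Hugging Face checkout.
llmexport --path ./SmolLM-135M-Instruct --export mnn
```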
Quantization Enhancements
- AWQ Quantization: Full implementation of AWQ (Activation-aware Weight Quantization)
- Symmetric Quantization: Added symmetric quantization support for improved performance
- Mixed Quantization: New mixed quantization strategies for optimal model compression
- HQQ Quantization: Half-Quadratic Quantization support added
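These strategies are driven from the export CLI. Below is a hedged sketch layered on the documented `--quant_bit` option; the `--quant_block`, `--awq`, `--sym`, and `--hqq` flag names are assumptions, so confirm them with `llmexport --help`:

```bash
# 4-bit AWQ export with symmetric quantization
# (--awq and --sym flag names are assumed, not confirmed).
llmexport --path ./Qwen2.5-1.5B-Instruct --export mnn \
  --quant_bit 4 --quant_block 128 --awq --sym

# HQQ variant (--hqq is likewise an assumed flag name).
llmexport --path ./Qwen2.5-1.5B-Instruct --export mnn \
  --quant_bit 4 --hqq
```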
Architecture Improvements
- Modular Utils: Complete reorganization with dedicated utility modules:
  - Audio processing utilities (`audio.py`)
  - Vision model handling (`vision.py`)
  - GGUF file support (`gguf/`)
  - Advanced quantization modules
  - MNN conversion utilities
  - ONNX optimization tools
Enhanced Capabilities
- Audio Models: Added support for audio-enabled models (Qwen2-Audio, etc.)
- Vision Models: Enhanced vision model support with specialized processing
- LoRA Integration: Improved LoRA weight handling and merging (see the sketch after this list)
- Model Mapping: Advanced model architecture mapping system
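As referenced above, merged LoRA export can be sketched as follows, assuming a `--lora_path` option that takes a directory of adapter weights (the option name is an assumption; check `llmexport --help`):

```bash
# Merge LoRA adapter weights into the base model during export.
# --lora_path is an assumed option name; ./my-lora-adapter is a placeholder.
llmexport --path ./Qwen2.5-1.5B-Instruct --lora_path ./my-lora-adapter \
  --export mnn --quant_bit 4
```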
Bug Fixes
- Embedding Loading: Fixed critical embedding loading issues
- ONNX Dynamic Axis: Resolved dynamic axis configuration problems
- Linear Layer Bias: Fixed duplicate naming issues in ONNX export for Linear and bias operations
- Model Compatibility: Enhanced compatibility across different model architectures
Documentation Updates
- README Optimization: Completely restructured README with professional badges, clear installation guides, and comprehensive feature documentation
- Model Downloads: Added extensive model download links for both ModelScope and Hugging Face
- Popular Models: Updated with latest high-demand models including:
  - DeepSeek-R1-1.5B-Qwen
  - Qwen2.5 series (0.5B, 1.5B)
  - GPT-OSS-20B
  - Qwen3-4B-Instruct-2507
Technical Improvements
- Code Restructuring: Major refactoring with 10,297 lines added and a new modular architecture
- Performance Optimization: Enhanced inference speed and memory efficiency
- Cross-platform Support: Improved compatibility across different deployment platforms
- Error Handling: Better error reporting and debugging capabilities
Installation & Usage
```bash
# Install latest version
pip install llmexport==0.0.3

# Quick export example
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --quant_bit 4
```
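To sanity-check an export, a quick prompt can be run through the converted model. This is a hedged sketch assuming a `--test` option that accepts a prompt string; verify the exact usage with `llmexport --help`:

```bash
# Smoke-test the exported model with a short prompt
# (--test and its argument format are assumptions).
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --quant_bit 4 --test "Hello"
```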
Related Projects
- MNN: https://github.com/alibaba/MNN
Breaking Changes
This version includes significant architectural changes. Please review the updated documentation and examples when upgrading from previous versions.
Acknowledgments
Special thanks to all contributors and the MNN team for their continuous support and collaboration in making this release possible.
Full Changelog: v0.0.2...v0.0.3
llmexport v0.0.2
Features
- Added support for Qwen2-VL.
- Introduced support for GTE and split embedding layers for BGE/GTE.
- Implemented `imitate_quant` functionality during testing.
- Enabled use of the C++-compiled MNNConvert binary.
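Using the compiled converter might look like the following sketch, which assumes a `--mnnconvert` option that takes the path to a locally built MNNConvert binary (both the option name and the path are assumptions):

```bash
# Point the exporter at a locally compiled MNNConvert binary
# (--mnnconvert is an assumed option name; the path is a placeholder).
llmexport --path ./Qwen2-VL-2B-Instruct --export mnn \
  --mnnconvert /path/to/MNN/build/MNNConvert
```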
Refactors
- Refactored the implementation of the VL model.
- Updated model path handling for ONNX models.
Bug Fixes
- Resolved issues with `stop_ids` and quantization.
- Fixed the bug related to `block_size = 0`.
llmexport v0.0.1
- Support exporting ONNX/MNN from pretrained models.
- Use FakeLinear to save memory and time when exporting ONNX and MNN.
- Support `onnxslim` to optimize the ONNX graph.
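A hedged sketch of enabling that optimization, assuming an `--onnx_slim` flag (the exact flag name may differ; see `llmexport --help`):

```bash
# Export to ONNX with onnxslim graph optimization enabled
# (--onnx_slim is an assumed flag name; the model path is a placeholder).
llmexport --path ./Qwen2-1.5B-Instruct --export onnx --onnx_slim
```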