Releases: wangzhaode/llm-export

llmexport v0.0.3

03 Sep 07:03

Release Notes - v0.0.3

🎉 Major Updates

This release represents a significant milestone with comprehensive architecture improvements and extensive new model support. The codebase has been completely restructured and synchronized with the latest MNN framework.

🚀 New Features

Model Support

  • ✅ SmolLM Series: Added support for SmolLM models with optimized configurations
  • ✅ MobileLLM Series: Enhanced support for mobile-optimized language models
  • ✅ BGE Models: Added support for bge-small embedding models
  • ✅ OpenELM: Support for Apple's OpenELM model series

Quantization Enhancements

  • 🔥 AWQ Quantization: Full implementation of AWQ (Activation-aware Weight Quantization)
  • 🔥 Symmetric Quantization: Added symmetric quantization support for improved performance (see the sketch after this list)
  • 🔥 Mixed Quantization: New mixed quantization strategies for optimal model compression
  • 🔥 HQQ Quantization: Added Half-Quadratic Quantization support
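
For reference, symmetric quantization follows the standard per-block recipe: each block of weights shares a single scale and no zero point. Below is a minimal NumPy sketch of that general idea; the function names are illustrative only and are not llmexport's API.

import numpy as np

def quantize_symmetric(w, bits=4, block=64):
    """Fake-quantize weights per block: one scale per block, no zero point."""
    qmax = 2 ** (bits - 1) - 1                           # e.g. 7 for int4
    blocks = w.reshape(-1, block)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-12)                     # guard all-zero blocks
    q = np.clip(np.round(blocks / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 128).astype(np.float32)
q, scale = quantize_symmetric(w.reshape(-1))
error = np.abs(dequantize(q, scale).reshape(4, 128) - w).max()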

Architecture Improvements

  • πŸ“ Modular Utils: Complete reorganization with dedicated utility modules:
    • Audio processing utilities (audio.py)
    • Vision model handling (vision.py)
    • GGUF file support (gguf/)
    • Advanced quantization modules
    • MNN conversion utilities
    • ONNX optimization tools

Enhanced Capabilities

  • 🎵 Audio Models: Added support for audio-enabled models (Qwen2-Audio, etc.)
  • 👁️ Vision Models: Enhanced vision model support with specialized processing
  • 🔧 LoRA Integration: Improved LoRA weight handling and merging (see the sketch after this list)
  • 🎯 Model Mapping: Advanced model architecture mapping system
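
For context, merging a LoRA adapter into a base model is plain linear algebra: the low-rank product is folded back into the base weight as W' = W + (alpha / r) * B @ A. A minimal NumPy sketch of that standard formula (names are illustrative, not llmexport's internals):

import numpy as np

def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A."""
    return w + (alpha / rank) * (lora_b @ lora_a)

out_dim, in_dim, r = 32, 64, 8
w = np.random.randn(out_dim, in_dim).astype(np.float32)
a = np.random.randn(r, in_dim).astype(np.float32)        # LoRA "A": r x in
b = np.random.randn(out_dim, r).astype(np.float32)       # LoRA "B": out x r
w_merged = merge_lora(w, a, b, alpha=16.0, rank=r)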

πŸ› Bug Fixes

  • Embedding Loading: Fixed critical embedding loading issues
  • ONNX Dynamic Axis: Resolved dynamic axis configuration problems (see the example after this list)
  • Linear Layer Bias: Fixed duplicate naming issues in ONNX export for Linear and bias operations
  • Model Compatibility: Enhanced compatibility across different model architectures
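
For context, dynamic axes in ONNX export are declared through the dynamic_axes argument of torch.onnx.export; if batch and sequence dimensions are not marked dynamic, the exported graph is locked to the trace-time shapes. A generic example of a correct configuration (not the project's code):

import torch

model = torch.nn.Sequential(torch.nn.Embedding(100, 16), torch.nn.Linear(16, 16))
ids = torch.randint(0, 100, (1, 8))                      # batch=1, seq=8 at trace time
torch.onnx.export(
    model, (ids,), "model.onnx",
    input_names=["input_ids"], output_names=["hidden"],
    # mark batch and sequence dims dynamic so other shapes work at runtime
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "hidden": {0: "batch", 1: "seq"}},
)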

📚 Documentation Updates

  • README Optimization: Completely restructured README with professional badges, clear installation guides, and comprehensive feature documentation
  • Model Downloads: Added extensive model download links for both ModelScope and Hugging Face
  • Popular Models: Updated with latest high-demand models including:
    • DeepSeek-R1-1.5B-Qwen
    • Qwen2.5 series (0.5B, 1.5B)
    • GPT-OSS-20B
    • Qwen3-4B-Instruct-2507

🔧 Technical Improvements

  • Code Restructuring: Major refactoring with 10,297 lines added and a new modular architecture
  • Performance Optimization: Enhanced inference speed and memory efficiency
  • Cross-platform Support: Improved compatibility across different deployment platforms
  • Error Handling: Better error reporting and debugging capabilities

📦 Installation & Usage

# Install latest version
pip install llmexport==0.0.3

# Quick export example
llmexport --path Qwen2.5-1.5B-Instruct --export mnn --quant_bit 4

⚠️ Breaking Changes

This version includes significant architectural changes. Please review the updated documentation and examples when upgrading from previous versions.

πŸ™ Acknowledgments

Special thanks to all contributors and the MNN team for their continuous support and collaboration in making this release possible.


Full Changelog: v0.0.2...v0.0.3

llmexport v0.0.2

27 Sep 06:28

Features

  • Added support for Qwen2-VL.
  • Introduced support for GTE and split embedding layers for BGE/GTE.
  • Implemented imitate_quant functionality during testing (see the sketch after this list).
  • Enabled use of the C++-compiled MNNConvert.
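
Presumably, imitate_quant round-trips weights through quantize/dequantize so a plain float forward pass reproduces the quantized model's rounding error; that is how fake quantization is generally done. A rough PyTorch sketch of the idea (an assumption, not the project's actual implementation):

import torch

def imitate_quant(w, bits=4, block=128):
    """Round-trip weights through asymmetric int quantization to expose rounding error."""
    levels = 2 ** bits - 1
    blocks = w.reshape(-1, block)
    lo = blocks.min(dim=1, keepdim=True).values
    hi = blocks.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / levels
    q = torch.round((blocks - lo) / scale).clamp(0, levels)
    return (q * scale + lo).reshape(w.shape)             # dequantized copy

with torch.no_grad():
    layer = torch.nn.Linear(128, 128)
    layer.weight.copy_(imitate_quant(layer.weight))      # layer now "imitates" int4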

Refactors

  • Refactored the implementation of the VL model.
  • Updated model path handling for ONNX models.

Bug Fixes

  • Resolved issues with stop_ids and quantization.
  • Fixed a bug that occurred when block_size = 0.

llmexport v0.0.1

19 Aug 09:31

  • Support exporting ONNX/MNN models from a pretrained model.
  • Use FakeLinear to save memory and time when exporting ONNX and MNN (see the sketch after this list).
  • Support onnxslim to optimize the ONNX graph.
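
The FakeLinear idea: while tracing the graph for export, each real nn.Linear is swapped for a lightweight stand-in that keeps only shape metadata, so the full weight matrices never have to sit in memory during tracing; the real weights are written into the converted model afterwards. A minimal PyTorch sketch of one way to realize this (illustrative; the project's actual FakeLinear may differ):

import torch

class FakeLinear(torch.nn.Module):
    """Stand-in for nn.Linear during export: holds a named placeholder
    instead of the real out x in weight matrix."""
    def __init__(self, in_features, out_features, name):
        super().__init__()
        self.name = name                                 # used to splice real weights back later
        self.in_features, self.out_features = in_features, out_features
        self.weight = torch.nn.Parameter(torch.zeros(1)) # 1 element, not out*in

    def forward(self, x):
        # produce a correctly shaped output so tracing proceeds;
        # the real MatMul weight is attached after conversion
        return x.new_zeros(x.shape[:-1] + (self.out_features,))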