## Add Quantization Error Propagation (QEP) support to Qwix #217
Closed
copybara-service[bot] wants to merge 0 commits into `main` from
QEP extends standard GPTQ by compensating for the quantization noise that cascades from preceding layers during inference. Whereas GPTQ minimizes $\|W X - W_q X\|^2$ assuming perfect float inputs, QEP minimizes $\|W X_{\text{float}} - W_q X_q\|^2$, where $X_q$ is the layer input produced by the already-quantized preceding layers. This is achieved by computing an input cross-correlation statistic ($H_\delta$) and applying a localized weight correction prior to standard GPTQ rounding:

$$W_{\text{corrected}} = W + \alpha \, W H_\delta H^{-1}$$
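The form of this correction follows from a short least-squares argument. The derivation below is a sketch under assumptions the PR does not state explicitly: $W \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$, inputs are stored column-wise as $X_{\text{float}}, X_q \in \mathbb{R}^{d_{\text{in}} \times n}$, $\Delta = X_{\text{float}} - X_q$, $H = X_q X_q^\top$ (assumed invertible), and $H_\delta = \Delta X_q^\top$. Ignoring rounding, the weights that best reproduce the float output from the quantized inputs are

$$
\begin{aligned}
\widehat{W} &= \arg\min_{W'} \|W X_{\text{float}} - W' X_q\|_F^2 = W X_{\text{float}} X_q^\top H^{-1} \\
&= W (X_q + \Delta) X_q^\top H^{-1} = W H H^{-1} + W \Delta X_q^\top H^{-1} \\
&= W + W H_\delta H^{-1}.
\end{aligned}
$$

Under these assumptions, $\alpha = 1$ applies the full unconstrained optimum, a smaller $\alpha$ shrinks the shift toward the original weights, and $\alpha = 0$ leaves $W$ unchanged so that only GPTQ rounding remains.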
### API & Usage
The primary entry point is `qep.quantize(...)`. Because QEP must measure the error accumulated by previously quantized layers, it orchestrates a multi-pass calibration loop stage by stage rather than relying on a single forward pass.
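The stagewise idea can be illustrated on a toy model. This is a minimal NumPy sketch, not Qwix's JAX implementation, and it rests on several assumptions: `stagewise_quantize`, `qep_correct`, and `quantize_rtn` are hypothetical names; the network is a plain chain of ReLU linear layers with each layer as its own stage; per-row round-to-nearest stands in for GPTQ rounding; and the statistics are assumed to be $H = X_q X_q^\top$ and $H_\delta = (X_{\text{float}} - X_q) X_q^\top$ with GPTQ-style relative damping.

```python
import numpy as np


def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest; a stand-in for GPTQ rounding."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w), axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def qep_correct(w, x_float, x_q, alpha=0.5, damp=0.01):
    """Return W + alpha * W @ H_delta @ inv(H) for one layer (hypothetical helper).

    Assumed statistics: H = X_q X_q^T, H_delta = (X_float - X_q) X_q^T,
    with activations stored as (d_in, n_samples).
    """
    h = x_q @ x_q.T
    h_delta = (x_float - x_q) @ x_q.T
    h = h + damp * np.mean(np.diag(h)) * np.eye(h.shape[0])  # assumed GPTQ-style damping
    # H is symmetric, so solve(H, (W H_delta)^T)^T equals W H_delta H^{-1}.
    shift = np.linalg.solve(h, (w @ h_delta).T).T
    return w + alpha * shift


def stagewise_quantize(weights, x, quantize_stage, alpha=0.5):
    """Quantize a chain of ReLU linear layers one stage at a time."""
    x_float, x_q = x, x  # float-path and quantized-path inputs to the current stage
    quantized = []
    for w in weights:
        # x_q already carries the error of every earlier quantized stage.
        w_q = quantize_stage(qep_correct(w, x_float, x_q, alpha))
        quantized.append(w_q)
        # Advance both streams: float weights on one, quantized weights on the other.
        x_float = np.maximum(w @ x_float, 0.0)
        x_q = np.maximum(w_q @ x_q, 0.0)
    return quantized
```

For a purely sequential chain, carrying both activation streams forward is equivalent to re-running the float model and the partially quantized model before each stage; graphs with branches are where discovering stages from the topology becomes necessary.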
For offline or distributed pipelines where statistics are precomputed remotely, `qep.quantize_params()` can be invoked directly to apply the QEP correction and GPTQ rounding to float weights without re-running the model graph.
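Offline application works if the statistics are additive. Under the definitions assumed here ($H = X_q X_q^\top$ and $H_\delta = (X_{\text{float}} - X_q) X_q^\top$, both unnormalized), each is a sum of per-sample outer products, so shards can compute partial statistics independently, a reduction can sum them, and the correction then needs only the weights and the reduced statistics. A minimal NumPy sketch follows; `partial_stats`, `reduce_stats`, and `apply_correction` are hypothetical helpers, not the signature of `qep.quantize_params()`, and the GPTQ rounding step is omitted.

```python
import numpy as np


def partial_stats(x_float, x_q):
    """Per-shard contribution to (H, H_delta); inputs are (d_in, n_shard)."""
    return x_q @ x_q.T, (x_float - x_q) @ x_q.T


def reduce_stats(shard_stats):
    """Sum per-shard statistics, as an all-reduce or offline merge would."""
    h = sum(s[0] for s in shard_stats)
    h_delta = sum(s[1] for s in shard_stats)
    return h, h_delta


def apply_correction(w, h, h_delta, alpha=0.5, damp=0.01):
    """Apply the QEP shift using only precomputed statistics (no model graph)."""
    h = h + damp * np.mean(np.diag(h)) * np.eye(h.shape[0])  # assumed relative damping
    # H stays symmetric after damping, so this computes W H_delta H^{-1}.
    return w + alpha * np.linalg.solve(h, (w @ h_delta).T).T
```

If the real statistics are normalized by sample count, shards would instead be merged as a sample-weighted average; the merge is still exact.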
### Key modifications
- `qep_core.py`: Pure-JAX algorithms for the QEP statistics (`compute_qep_stats`) and the core weight-shifting logic (`weight_correct`).
- `qep.py`: The stagewise orchestrator (`qep.quantize`). It dynamically discovers interconnected topological stages, runs a two-pass (float vs. quantized) calibration loop per batch, and updates weights progressively through the network.
- `calibration.py`: Refactored the core `CalibrationProvider` mechanics to decouple single-pass logic, enabling robust multi-pass activation interception for QEP.
- `QepRule`: A new configuration struct extending `GptqRule` with tunable hyperparameters (`correction_factor`, `damping_factor`).
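A rule that extends a GPTQ rule with two QEP hyperparameters can be sketched with frozen dataclasses. Everything here is hypothetical: `GptqRuleSketch`, `QepRuleSketch`, `weight_bits`, `damped_hessian`, and `weight_correct_sketch` are illustrative names, not Qwix's `GptqRule`, `QepRule`, or `weight_correct`. The sketch also assumes that `correction_factor` plays the role of $\alpha$ in $W + \alpha \, W H_\delta H^{-1}$ and that `damping_factor` adds GPTQ-style relative damping to $H$; the PR lists these fields but not their exact semantics.

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class GptqRuleSketch:
    """Hypothetical stand-in for GptqRule; the real fields are not shown in this PR."""
    weight_bits: int = 4


@dataclasses.dataclass(frozen=True)
class QepRuleSketch(GptqRuleSketch):
    """Hypothetical QepRule-like config: GPTQ settings plus QEP hyperparameters."""
    correction_factor: float = 0.5  # assumed to be alpha in the correction formula
    damping_factor: float = 0.01    # assumed relative diagonal damping applied to H


def damped_hessian(h, rule):
    """Add damping_factor * mean(diag(H)) to the diagonal of H."""
    return h + rule.damping_factor * np.mean(np.diag(h)) * np.eye(h.shape[0])


def weight_correct_sketch(w, h, h_delta, rule):
    """Return W + alpha * W @ H_delta @ inv(H_damped), computed via a linear solve."""
    h_d = damped_hessian(h, rule)
    # h_d is symmetric, so solve(h_d, (W H_delta)^T)^T equals W H_delta h_d^{-1}.
    return w + rule.correction_factor * np.linalg.solve(h_d, (w @ h_delta).T).T
```

Subclassing keeps every GPTQ setting available on the QEP rule, and in this sketch `correction_factor=0.0` degrades to plain GPTQ on the uncorrected weights. Damping matters when $H$ is rank-deficient, for example when the calibration set has fewer samples than input features; without it the solve would be ill-posed.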