How to use aqt int8 quantization with flax.nnx #4257

lkwq007 · 2024-10-05T13:47:15Z

I was wondering if there are any examples demonstrating how to use aqt with flax.nnx, specifically for converting a pretrained model into an int8-quantized version. Wrapping the model with nnx.bridge.ToLinen and then quantizing it seems feasible, but is there a better way to achieve this? Thanks.

cgarciae · 2024-10-06T14:00:16Z

Maintainer

Hey @lkwq007, we still haven't ported any of the quantization utilities to NNX. Quantization APIs might look different in NNX so that they can leverage the ability to do model surgery. I'd imagine having a proper LinearFP8 that you can use to monkey patch Linear layers in an existing model.
