refactor: make QuantizationMode an algebraic enum with associated values (#285) by VDurocher · Pull Request #393 · ml-explore/mlx-swift

VDurocher · 2026-04-07T13:07:46Z

Closes #285

What

Converts QuantizationMode from a scalar enum (backed by String raw values) to an algebraic enum, allowing the affine case to carry its own configuration parameters.

Why

Different quantization schemes have fundamentally different parameters. Previously, groupSize and bits were passed as separate top-level arguments alongside the mode, creating an implicit coupling: callers had to know which parameters applied to which mode. With associated values, the configuration is explicit, exhaustively type-checked, and co-located with the mode it configures.

As noted in the issue, mxfp4 only allows groupSize = 32 and bits = 4 — encoding these as separate top-level properties forces callers to reason about which combinations are valid. Associated values eliminate that ambiguity.

Changes

`Source/MLX/Ops.swift`

- QuantizationMode cases are now:
  - case affine(groupSize: Int = 64, bits: Int = 4), which carries its parameters with sensible defaults
  - case mxfp4, case mxfp8, case nvfp4, with no associated values (parameters are fixed by the format spec)
- Replaced : String, Codable, Sendable with : Equatable, Sendable (associated values are incompatible with a String raw type)
- Added var cName: String computed property to bridge to the C API (replaces .rawValue)
- Added public var groupSize: Int and public var bits: Int computed properties, which return the associated values for .affine and the spec-mandated values for the other modes
- Added extension QuantizationMode: Codable with manual encode(to:) and init(from:) implementations
- All mode.rawValue call sites updated to mode.cName (7 occurrences)
- All mode: QuantizationMode = .affine default parameters updated to .affine() (6 occurrences)
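
For readers who want to see the shape of the change at a glance, here is a minimal sketch of the enum described above. It is not copied from the diff: the cName strings other than "affine" and the fixed groupSize/bits values for mxfp8 and nvfp4 are assumptions made for illustration (this description only states the mxfp4 values), and access modifiers are omitted.

```swift
// Illustrative sketch only; the real declaration in Source/MLX/Ops.swift may differ.
enum QuantizationMode: Equatable, Sendable {
    case affine(groupSize: Int = 64, bits: Int = 4)
    case mxfp4
    case mxfp8
    case nvfp4

    /// Name handed to the C API in place of the old `rawValue`.
    /// Only "affine" is confirmed by the PR text; the other strings are assumed.
    var cName: String {
        switch self {
        case .affine: return "affine"
        case .mxfp4: return "mxfp4"
        case .mxfp8: return "mxfp8"
        case .nvfp4: return "nvfp4"
        }
    }

    /// Effective group size: the associated value for `.affine`,
    /// a fixed value for the other modes.
    var groupSize: Int {
        switch self {
        case .affine(let size, _): return size
        case .mxfp4: return 32  // stated in the PR description
        case .mxfp8: return 32  // assumption, not stated in the PR
        case .nvfp4: return 16  // assumption, not stated in the PR
        }
    }

    /// Effective bit width: the associated value for `.affine`,
    /// a fixed value for the other modes.
    var bits: Int {
        switch self {
        case .affine(_, let width): return width
        case .mxfp4: return 4   // stated in the PR description
        case .mxfp8: return 8   // assumption, not stated in the PR
        case .nvfp4: return 4   // assumption, not stated in the PR
        }
    }
}

// Usage: the mode now answers questions about its own configuration.
let custom = QuantizationMode.affine(groupSize: 32, bits: 8)
print(custom.cName, custom.groupSize, custom.bits)
```

Because every switch is exhaustive, adding a new case later forces each of these properties to be updated at compile time.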

`Source/MLXNN/Quantized.swift`

All mode: QuantizationMode = .affine default parameters updated to .affine() (12 occurrences across quantizeSingle, quantize, QuantizedEmbedding, and QuantizedLinear)

Migration

// Before
let mode = QuantizationMode.affine
quantize(model: model, groupSize: 64, bits: 4, mode: .affine)

// After
let defaultMode = QuantizationMode.affine()  // uses defaults: groupSize: 64, bits: 4
let customMode = QuantizationMode.affine(groupSize: 32, bits: 8)  // custom parameters
quantize(model: model, groupSize: 64, bits: 4, mode: .affine())

The existing groupSize and bits top-level parameters on public functions are preserved for backward compatibility. The QuantizationMode.groupSize and QuantizationMode.bits computed properties provide a unified way to query a mode's effective parameters.

Notes

- This is a source-breaking change for all use sites of bare QuantizationMode.affine, as noted in the issue.
- The Codable encoding format changes from a plain string ("affine") to a keyed container ({"type":"affine","groupSize":64,"bits":4}). Existing serialized data in the old format would need migration.
- Equatable conformance is auto-synthesized by Swift for enums whose associated values all conform to Equatable.
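
To make the Codable note concrete, the following is a sketch of a manual conformance that writes the keyed format shown above. It is not the implementation in this PR. The legacy plain-string fallback in init(from:) is a hypothetical migration aid, the helper mode(named:groupSize:bits:codingPath:) is invented for this example, and the {"type": ...}-only encoding for the fixed-parameter modes is an assumption.

```swift
import Foundation

// Minimal stand-in for the enum so this example compiles on its own.
enum QuantizationMode: Equatable, Sendable {
    case affine(groupSize: Int = 64, bits: Int = 4)
    case mxfp4, mxfp8, nvfp4
}

extension QuantizationMode: Codable {
    private enum CodingKeys: String, CodingKey { case type, groupSize, bits }

    // Hypothetical helper: maps a mode name plus optional parameters to a case.
    private static func mode(
        named name: String, groupSize: Int?, bits: Int?, codingPath: [CodingKey]
    ) throws -> QuantizationMode {
        switch name {
        case "affine": return .affine(groupSize: groupSize ?? 64, bits: bits ?? 4)
        case "mxfp4": return .mxfp4
        case "mxfp8": return .mxfp8
        case "nvfp4": return .nvfp4
        default:
            throw DecodingError.dataCorrupted(
                .init(codingPath: codingPath,
                      debugDescription: "Unknown quantization mode: \(name)"))
        }
    }

    init(from decoder: Decoder) throws {
        // Hypothetical migration path: accept the old plain-string format ("affine").
        if let legacy = try? decoder.singleValueContainer().decode(String.self) {
            self = try Self.mode(named: legacy, groupSize: nil, bits: nil,
                                 codingPath: decoder.codingPath)
            return
        }
        // New format: {"type": "...", "groupSize": ..., "bits": ...}
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let name = try container.decode(String.self, forKey: .type)
        self = try Self.mode(
            named: name,
            groupSize: container.decodeIfPresent(Int.self, forKey: .groupSize),
            bits: container.decodeIfPresent(Int.self, forKey: .bits),
            codingPath: decoder.codingPath)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .affine(let groupSize, let bits):
            try container.encode("affine", forKey: .type)
            try container.encode(groupSize, forKey: .groupSize)
            try container.encode(bits, forKey: .bits)
        case .mxfp4: try container.encode("mxfp4", forKey: .type)
        case .mxfp8: try container.encode("mxfp8", forKey: .type)
        case .nvfp4: try container.encode("nvfp4", forKey: .type)
        }
    }
}

// Usage: old and new encodings decode side by side.
let json = Data(#"["affine", {"type":"affine","groupSize":32,"bits":8}, {"type":"mxfp4"}]"#.utf8)
let modes = try! JSONDecoder().decode([QuantizationMode].self, from: json)
print(modes)
```

Accepting the legacy string on decode while always encoding the keyed form is one way to migrate stored data lazily; whether that is desirable for this library is a design decision outside this sketch.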

…ues (ml-explore#285)

- Convert QuantizationMode from String raw value enum to algebraic enum
- Add case affine(groupSize: Int = 64, bits: Int = 4) with associated values
- Keep mxfp4, mxfp8, nvfp4 as simple cases (fixed parameters)
- Add cName computed property to replace rawValue for C API calls
- Add groupSize and bits computed properties on QuantizationMode
- Add manual Codable conformance (required since associated values prevent String raw type)
- Add Equatable conformance (auto-synthesized by Swift)
- Update all call sites: mode.rawValue → mode.cName
- Update all default parameter values: .affine → .affine()
- Update Quantized.swift with matching .affine() defaults

Update all default parameter values from .affine to .affine() to match the new algebraic QuantizationMode enum where affine carries associated values (groupSize and bits).

- Convert QuantizationMode from String raw value enum to algebraic enum
- Add case affine(groupSize: Int = 64, bits: Int = 4) with associated values
- Keep mxfp4, mxfp8, nvfp4 as simple cases with fixed parameters
- Add cName computed property to replace rawValue for C API calls
- Add groupSize and bits computed properties on QuantizationMode
- Add manual Codable conformance (associated values prevent String raw type)
- Add Equatable conformance (auto-synthesized by Swift)
- Update all call sites: mode.rawValue -> mode.cName
- Update all default parameter values: .affine -> .affine()

Closes ml-explore#285

VDurocher added 3 commits April 7, 2026 15:03

refactor: update Quantized.swift to use algebraic QuantizationMode

de7d739


VDurocher wants to merge 3 commits into ml-explore:main from VDurocher:feat/algebraic-quantization-mode
