refactor: make QuantizationMode an algebraic enum with associated values (#285) #393

Open
VDurocher wants to merge 3 commits into ml-explore:main from VDurocher:feat/algebraic-quantization-mode

Conversation

@VDurocher

Closes #285

What

Converts QuantizationMode from a scalar enum (backed by String raw values) to an algebraic enum, allowing the affine case to carry its own configuration parameters.

Why

Different quantization schemes have fundamentally different parameters. Previously, groupSize and bits were passed as separate top-level arguments alongside the mode, creating an implicit coupling: callers had to know which parameters applied to which mode. With associated values, the configuration is explicit, exhaustively type-checked, and co-located with the mode it configures.

As noted in the issue, mxfp4 only allows groupSize = 32 and bits = 4 — encoding these as separate top-level properties forces callers to reason about which combinations are valid. Associated values eliminate that ambiguity.

Changes

Source/MLX/Ops.swift

  • QuantizationMode cases now:
    • case affine(groupSize: Int = 64, bits: Int = 4) — carries its parameters with sensible defaults
    • case mxfp4, case mxfp8, case nvfp4 — no associated values (parameters are fixed by format spec)
  • Replaced : String, Codable, Sendable with : Equatable, Sendable (associated values are incompatible with String raw type)
  • Added var cName: String computed property to bridge to the C API (replaces .rawValue)
  • Added public var groupSize: Int and public var bits: Int computed properties — these return the associated values for .affine and the spec-mandated values for other modes
  • Added extension QuantizationMode: Codable with manual encode/init(from:) implementations
  • All mode.rawValue call sites updated to mode.cName (7 occurrences)
  • All mode: QuantizationMode = .affine default parameters updated to .affine() (6 occurrences)
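The shape described by the bullets above can be sketched roughly as follows. This is a minimal, standalone illustration rather than the PR's actual source: the type name `SketchQuantizationMode` is hypothetical, the `cName` strings are assumed to match the case names, and the fixed values returned for `mxfp8` and `nvfp4` are assumptions (only the `mxfp4` values of groupSize 32 and bits 4 are stated in this PR).

```swift
// Hypothetical stand-in for the refactored enum; not copied from the PR.
enum SketchQuantizationMode: Equatable, Sendable {
    case affine(groupSize: Int = 64, bits: Int = 4)
    case mxfp4
    case mxfp8
    case nvfp4

    /// Name passed to the C API in place of the old `rawValue`.
    /// Assumption: the strings are identical to the case names.
    var cName: String {
        switch self {
        case .affine: return "affine"
        case .mxfp4: return "mxfp4"
        case .mxfp8: return "mxfp8"
        case .nvfp4: return "nvfp4"
        }
    }

    /// Effective group size: the associated value for `.affine`,
    /// a fixed value for the other modes.
    var groupSize: Int {
        switch self {
        case .affine(let groupSize, _): return groupSize
        case .mxfp4: return 32  // stated in the PR description
        case .mxfp8: return 32  // assumption, not stated in the PR
        case .nvfp4: return 16  // assumption, not stated in the PR
        }
    }

    /// Effective bit width, resolved the same way as `groupSize`.
    var bits: Int {
        switch self {
        case .affine(_, let bits): return bits
        case .mxfp4: return 4  // stated in the PR description
        case .mxfp8: return 8  // assumption, not stated in the PR
        case .nvfp4: return 4  // assumption, not stated in the PR
        }
    }
}

let defaults = SketchQuantizationMode.affine()
let custom = SketchQuantizationMode.affine(groupSize: 32, bits: 8)
print(defaults.groupSize, defaults.bits, custom.cName)
```

Because every `switch` is exhaustive, adding a new case later would produce a compile error at each property until its parameters are filled in, which is the type-checking benefit the description refers to.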

Source/MLXNN/Quantized.swift

  • All mode: QuantizationMode = .affine default parameters updated to .affine() (12 occurrences across quantizeSingle, quantize, QuantizedEmbedding, and QuantizedLinear)

Migration

// Before
let mode = QuantizationMode.affine
quantize(model: model, groupSize: 64, bits: 4, mode: .affine)

// After
let defaultMode = QuantizationMode.affine()  // uses defaults: groupSize: 64, bits: 4
let customMode = QuantizationMode.affine(groupSize: 32, bits: 8)  // custom parameters
quantize(model: model, groupSize: 64, bits: 4, mode: .affine())

The existing groupSize and bits top-level parameters on public functions are preserved for backward compatibility. The QuantizationMode.groupSize and QuantizationMode.bits computed properties provide a unified way to query a mode's effective parameters.

Notes

  • This is a source-breaking change for every use site of bare QuantizationMode.affine, which must now be written .affine(), as noted in the issue
  • The Codable encoding format changes from a plain string ("affine") to a keyed container ({"type":"affine","groupSize":64,"bits":4}) — existing serialized data using the old format would need migration
  • Equatable conformance is auto-synthesized by Swift for enums with Equatable-conforming associated values
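A keyed-container encoding like the one described in the notes above could look roughly like this. It is a sketch under stated assumptions, not the PR's implementation: `SketchMode` is a hypothetical two-case stand-in, the `{"type": ...}` form for cases without associated values is assumed, and the fallback that accepts the old plain-string format is one possible migration path that the PR does not confirm.

```swift
import Foundation

// Hypothetical two-case stand-in for QuantizationMode.
enum SketchMode: Equatable {
    case affine(groupSize: Int = 64, bits: Int = 4)
    case mxfp4
}

extension SketchMode: Codable {
    private enum CodingKeys: String, CodingKey {
        case type, groupSize, bits
    }

    init(from decoder: Decoder) throws {
        // Assumed migration path: accept the legacy plain-string form ("affine").
        if let name = try? decoder.singleValueContainer().decode(String.self) {
            switch name {
            case "affine": self = .affine()
            case "mxfp4": self = .mxfp4
            default:
                throw DecodingError.dataCorrupted(.init(
                    codingPath: decoder.codingPath,
                    debugDescription: "Unknown legacy mode: \(name)"))
            }
            return
        }
        // New form: {"type": "affine", "groupSize": 64, "bits": 4}
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let type = try container.decode(String.self, forKey: .type)
        switch type {
        case "affine":
            let groupSize = try container.decodeIfPresent(Int.self, forKey: .groupSize) ?? 64
            let bits = try container.decodeIfPresent(Int.self, forKey: .bits) ?? 4
            self = .affine(groupSize: groupSize, bits: bits)
        case "mxfp4":
            self = .mxfp4
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .type, in: container,
                debugDescription: "Unknown mode: \(type)")
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .affine(let groupSize, let bits):
            try container.encode("affine", forKey: .type)
            try container.encode(groupSize, forKey: .groupSize)
            try container.encode(bits, forKey: .bits)
        case .mxfp4:
            try container.encode("mxfp4", forKey: .type)
        }
    }
}

// Round trip through the new keyed format.
let original = SketchMode.affine(groupSize: 32, bits: 8)
let data = try JSONEncoder().encode(original)
let decoded = try JSONDecoder().decode(SketchMode.self, from: data)

// Legacy strings, wrapped in an array so the JSON has a container at top level.
let legacy = try JSONDecoder().decode([SketchMode].self, from: Data(#"["affine", "mxfp4"]"#.utf8))
```

Whether old payloads should be accepted silently, as in this sketch, or rejected so that callers migrate their data explicitly is a design choice left to the maintainers.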

…ues (ml-explore#285)

- Convert QuantizationMode from String raw value enum to algebraic enum
- Add case affine(groupSize: Int = 64, bits: Int = 4) with associated values
- Keep mxfp4, mxfp8, nvfp4 as simple cases (fixed parameters)
- Add cName computed property to replace rawValue for C API calls
- Add groupSize and bits computed properties on QuantizationMode
- Add manual Codable conformance (required since associated values prevent String raw type)
- Add Equatable conformance (auto-synthesized by Swift)
- Update all call sites: mode.rawValue → mode.cName
- Update all default parameter values: .affine → .affine()
- Update Quantized.swift with matching .affine() defaults
Update all default parameter values from .affine to .affine() to match
the new algebraic QuantizationMode enum where affine carries associated
values (groupSize and bits).
- Convert QuantizationMode from String raw value enum to algebraic enum
- Add case affine(groupSize: Int = 64, bits: Int = 4) with associated values
- Keep mxfp4, mxfp8, nvfp4 as simple cases with fixed parameters
- Add cName computed property to replace rawValue for C API calls
- Add groupSize and bits computed properties on QuantizationMode
- Add manual Codable conformance (associated values prevent String raw type)
- Add Equatable conformance (auto-synthesized by Swift)
- Update all call sites: mode.rawValue -> mode.cName
- Update all default parameter values: .affine -> .affine()

Closes ml-explore#285

Development

Successfully merging this pull request may close these issues.

QuantizationMode should be algebraic instead of scalar
