
Prepare TensorStorage and Tensor for non-dense storage backends#2051

Open
kali wants to merge 18 commits into main from block-quant-storage-prep

Conversation

kali (Collaborator) commented on Mar 20, 2026

Add Downcast bound to TensorStorage trait (via downcast-rs) so downstream crates can downcast dyn TensorStorage to concrete types.

New trait methods: dyn_hash, eq_storage.
New Tensor methods: storage_as, try_storage_as, from_storage. Fix Hash, Drop, deep_clone, PartialEq to branch on dense vs non-dense instead of panicking on StorageKind::Other.
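The downcast machinery described above can be sketched without the real tract or downcast-rs types. The `as_any` indirection below is what the downcast-rs `Downcast` bound generates automatically; the trait, struct, and helper names here (`Storage`, `DenseStorage`, `try_storage_as`) are illustrative stand-ins, not the actual tract API:

```rust
use std::any::Any;

// Hypothetical minimal trait standing in for TensorStorage. downcast-rs's
// `Downcast` supertrait bound provides this `as_any` plumbing for free.
trait Storage: Any {
    fn as_any(&self) -> &dyn Any;
}

struct DenseStorage(Vec<f32>);
struct BlockQuantStorage { m: usize, k: usize }

impl Storage for DenseStorage {
    fn as_any(&self) -> &dyn Any { self }
}
impl Storage for BlockQuantStorage {
    fn as_any(&self) -> &dyn Any { self }
}

// Counterpart of the new Tensor::try_storage_as::<T>() accessor: returns
// Some only when the type-erased storage is actually a T.
fn try_storage_as<T: 'static>(storage: &dyn Storage) -> Option<&T> {
    storage.as_any().downcast_ref::<T>()
}

fn main() {
    let s: Box<dyn Storage> = Box::new(BlockQuantStorage { m: 4, k: 32 });
    assert!(try_storage_as::<BlockQuantStorage>(s.as_ref()).is_some());
    assert!(try_storage_as::<DenseStorage>(s.as_ref()).is_none());
    println!("downcast ok");
}
```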

kali force-pushed the block-quant-storage-prep branch 2 times, most recently from abd9f60 to 4c12e36 on March 21, 2026 at 12:08
kali added 16 commits on March 21, 2026 at 14:46
Add Downcast bound to TensorStorage trait (via downcast-rs) so
downstream crates can downcast `dyn TensorStorage` to concrete types.

New trait methods: dyn_hash, eq_storage.
New Tensor methods: storage_as<T>, try_storage_as<T>, from_storage.
Fix Hash, Drop, deep_clone, PartialEq to branch on dense vs non-dense
instead of panicking on StorageKind::Other.
Adds the concrete storage type for block-quantized weights in linalg.
Stores format, m, k dimensions, and one or more Arc<Blob> groups.
Implements TensorStorage with proper hash, eq, clone, and debug support.
Includes into_tensor() and to_block_quant_fact() convenience methods.
Preparatory methods for the BlobWithFact migration:
- with_shape(): clone with updated m/k dimensions
- split_m(): partition single-group into multi-group by splitting m rows
Replace the double-downcast pattern (Opaque → BlobWithFact → BlockQuantFact)
with single-downcast via Tensor::try_storage_as::<BlockQuantStorage>()
across all creation and access sites in 10 crates.

Key changes:
- All 14 creation sites now use BlockQuantStorage::new(...).into_tensor()
- All ~20 access sites use tensor.try_storage_as::<BlockQuantStorage>()
- SplitGroupBlockQuant output shape changes from [n] to [] (scalar)
- Tensor::is_uniform() returns false for non-dense storage
- pad_q40, as_q40_tensor, gather eval_bq updated signatures
No longer needed after migration to BlockQuantStorage.
BlobWithFact stored an explicit ND shape (e.g. [b, n, k]).  The migration
to BlockQuantStorage lost that structure by keeping only flat m and k, and
into_tensor() produced a rank-0 tensor.  get_concrete_shapes was then
chaining the outer tensor shape ([]) with bqf.shape() ([b*n, k]), giving
a 2D weight shape that confused the batch-matmul kernel for b > 1.

- Replace separate m/k fields in BlockQuantStorage with shape: TVec<usize>;
  m() and k() are now derived from the shape.
- Add new_with_shape() constructor for callers that need ND shapes.
- into_tensor() uses self.shape() so the tensor carries its full logical shape.
- get_concrete_shapes: drop the Q4_0 chain(bqf.shape()) and use b.shape()
  directly, since the shape now lives in the tensor.
- pad_q40: preserve the ND shape when rebuilding the padded storage.
- Test: use new_with_shape([b, n, k]) so batch structure survives the
  into_tensor / into_device round-trip.
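The shape-derived accessors described above can be sketched as follows; the struct and method names (`new_with_shape`, `m`, `k`) come from the commit message, but the bodies are an assumption based on the stated semantics (all leading axes flatten into m, the last axis is k):

```rust
// Sketch: with `shape` replacing the separate m/k fields, the flat matmul
// dimensions are derived by flattening every leading axis into m.
struct BlockQuantStorage { shape: Vec<usize> }

impl BlockQuantStorage {
    fn new_with_shape(shape: &[usize]) -> Self {
        Self { shape: shape.to_vec() }
    }
    /// Rows of the flattened matmul view: product of all but the last axis.
    fn m(&self) -> usize {
        self.shape[..self.shape.len() - 1].iter().product()
    }
    /// Inner matmul dimension: the last axis.
    fn k(&self) -> usize {
        *self.shape.last().unwrap()
    }
}

fn main() {
    let s = BlockQuantStorage::new_with_shape(&[2, 3, 32]); // [b, n, k]
    assert_eq!(s.m(), 6);  // b * n
    assert_eq!(s.k(), 32);
    println!("m={} k={}", s.m(), s.k());
}
```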

Same fix as the CUDA side: get_concrete_shapes and output_facts were
building the weight shape by chaining the outer tensor shape with
bqf.shape(). Now that BlockQuantStorage::into_tensor() gives the tensor
its full logical shape, b.shape() is already correct and the chain
produces duplicated dims.

- kernels/matmul/mod.rs (eval + dispatch_eval): use b.shape() directly
- ops/gemm.rs (output_facts + eval_with_session): same
- tests: use new_with_shape([batch, n, k]) to preserve batch structure

block_quant_aware_input_shape was chaining fact.shape + bqf.shape(), which
was correct when the tensor was rank-0 (fact.shape=[]) but now produces
double dims (e.g. [n,k,n,k]) since the tensor carries its shape.
Use fact.shape directly, matching the non-opaque path.
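The double-dims failure mode can be shown with a tiny stand-in for the chaining logic (the real code operates on tract shape facts; `chained` here is purely illustrative):

```rust
// Chaining the outer tensor shape with the BlockQuantFact shape was only
// correct while the opaque tensor was rank-0.
fn chained(outer: &[usize], bqf: &[usize]) -> Vec<usize> {
    outer.iter().chain(bqf.iter()).copied().collect()
}

fn main() {
    // Old representation: rank-0 opaque tensor, shape lives in the fact.
    assert_eq!(chained(&[], &[6, 32]), vec![6, 32]);
    // New representation: the tensor already carries [n, k], so the chain
    // duplicates the dims ([n, k, n, k]) as described above.
    assert_eq!(chained(&[6, 32], &[6, 32]), vec![6, 32, 6, 32]);
    println!("chain illustration ok");
}
```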

CudaTensor::to_host had a "scalar opaque" assertion assuming rank-0 tensors,
and reconstructed storage with flat m/k from bqf. Remove the assertion and
use new_with_shape with the full bqf shape so the round-trip preserves the
tensor's logical shape.
Two issues introduced when BlockQuantStorage::into_tensor() started using
the full logical shape (e.g. [m, k]) instead of a rank-0 tensor:

1. Const::change_axes now succeeded on opaque tensors (the rank-2 shape
   allowed AxisOp::Rm to proceed), causing ChangeAxes to strip the m-axis
   from Q4_0 weight consts.  inject_m_or_n_axis then re-added it via an
   AddAxis node, but AddAxis::output_facts does not propagate konst, so
   optimized_mat_mul saw a non-const A input and panicked on the
   PackedFormat downcast.
   Fix: block axis changes on opaque Const tensors.

2. OptSimpleMatMulPack::output_facts copied inputs[0].shape (now [m, k])
   as its output shape, but eval() still emits a rank-0 tensor.  The
   mismatch caused PropConst to fail with a "trying to substitute" error.
   Fix: always emit a scalar (rank-0) output fact, consistent with eval.
Two stale assumptions based on block-quant consts being rank-0 scalars:

1. rewrite_block_quant_const_to_scalar: the rule was previously a no-op
   for block-quant consts (rank-0 hit the len()==0 early-return).  Now
   that consts have shape [m, k] the rule fired and hit the assert that
   volume()==1.  Fix: add !volume().is_one() to the early-return guard so
   properly-shaped consts are left alone; the NNEF serializer already
   writes the correct shape via bqs.to_block_quant_fact().shape().

2. matmul deserializer: the two ensure!(shape.volume().is_one()) guards
   were checking for the old rank-0 representation.  With the new code the
   loaded tensor has shape [m, k] so volume > 1.  The guards are simply
   wrong now; remove them.
Previously only the `required_rank > actual_rank` branch propagated the
BlockQuantFact shape and the `konst` field through axis operations.  With
rank-0 BQ tensors every AxisOp fell into that branch.  Now that
BlockQuantStorage carries a full logical shape ([m, k]) the tensor rank
equals the required rank, so the normal path is used instead – and it
neither updated the BQF shape nor forwarded `konst`.

Fix the normal path to:
1. Update the BlockQuantFact shape in sync with the tensor shape.
2. Propagate `konst` by applying the same axis transformation to the
   const tensor (falls back gracefully if the transform fails, e.g. for
   symbolic dims).

Also include the previously-derived fix for EinSum::axes_mapping that
computes logical_rank - actual_rank correctly for the mixed
BlockQuantFact / Packed* opaque-fact world.
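The "apply the same axis transformation, fall back gracefully on failure" logic can be sketched with a hypothetical two-variant subset of AxisOp (the real tract AxisOp has more variants; `apply` returning `None` models the graceful-fallback path):

```rust
// Hypothetical minimal AxisOp subset to illustrate the fix: mirror the
// axis change onto the fact/const shape, returning None when it cannot
// be applied so the caller can fall back.
#[derive(Clone, Copy)]
enum AxisOp {
    Add(usize), // insert a 1-sized axis at this position
    Rm(usize),  // remove a 1-sized axis at this position
}

fn apply(op: AxisOp, shape: &[usize]) -> Option<Vec<usize>> {
    let mut s = shape.to_vec();
    match op {
        AxisOp::Add(at) if at <= s.len() => { s.insert(at, 1); Some(s) }
        AxisOp::Rm(at) if at < s.len() && s[at] == 1 => { s.remove(at); Some(s) }
        _ => None, // e.g. removing a non-trivial axis: fall back
    }
}

fn main() {
    assert_eq!(apply(AxisOp::Add(0), &[6, 32]), Some(vec![1, 6, 32]));
    assert_eq!(apply(AxisOp::Rm(0), &[1, 6, 32]), Some(vec![6, 32]));
    assert_eq!(apply(AxisOp::Rm(0), &[6, 32]), None); // fallback path
    println!("axis ops ok");
}
```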
SplitGroupBlockQuant::output_facts had a hardcoded rank==0 check and
produced a rank-0 output fact — both relics of the old rank-0 BQ
representation. Remove the check and build the output fact with the
correct rank-2 shape.

BlockQuantIntoShape::output_facts was cloning the input fact (including
its old tensor shape) and only replacing the opaque metadata, so the
tensor shape in the output fact was stale. Build a fresh fact using
self.shape instead.

Conv::wire_kernel_as_g_o_ihw was also guarding with `fact.rank() == 0`
when validating the kernel format for BQ inputs. Drop that check so
rank-2 BQ kernels are accepted.
BlockQuantStorage goes back to plain `m: usize, k: usize` instead of
`shape: TVec<usize>`.  The full logical kernel shape (e.g. [OC, IC, kH])
is carried by the Tensor and by BlockQuantFact; the storage only needs
the flat matmul dimensions.

The NNEF reader now stores the full header shape on the Tensor while
keeping the BQS flat, and the variable deserializer builds the
BlockQuantFact from the NNEF-declared shape so that conv deserialization
sees the correct kernel dimensions.

Conv wiring fixes:
- OptSimpleMatMulPack gains a `num_groups` field so output_facts returns
  rank-0 (scalar) for single-group BQ and rank-1 [G] for multi-group,
  matching what eval produces.
- wire_mm_weights_bias skips the group axis mapping when the packed
  kernel is a rank-0 scalar, and forces trivial_packing=true in that
  case to avoid navigating a rank-0 tensor via offset_axis.
- declutter_as_einsum bails out early for opaque (BQ) kernels whose
  wire_kernel_as_g_o_ihw output does not carry a leading group axis.
- consistent() relaxes the BQF/BQS cross-check to compare m(), k(), and
  format only, since the fact may carry a full logical shape while the
  storage is always flat.

All conv-q40 NNEF test cases and the Llama-3.2-1B q40ef16 LLM test pass.
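The relaxed cross-check can be sketched as below; the struct layouts and the `consistent` body are assumptions from the description (fact carries a full logical shape, storage keeps flat m/k, only flattened dims and format are compared):

```rust
// Sketch of the relaxed consistent() check: the fact's logical shape is
// flattened to (m, k) before comparing against the flat storage.
#[allow(non_camel_case_types)]
#[derive(PartialEq, Debug, Clone, Copy)]
enum Format { Q4_0 }

struct Fact { shape: Vec<usize>, format: Format }
struct Storage { m: usize, k: usize, format: Format }

fn consistent(fact: &Fact, storage: &Storage) -> bool {
    let k = *fact.shape.last().unwrap();
    let m: usize = fact.shape[..fact.shape.len() - 1].iter().product();
    m == storage.m && k == storage.k && fact.format == storage.format
}

fn main() {
    // Logical [b, n, k] = [2, 3, 32] flattens to m = 6, k = 32.
    let fact = Fact { shape: vec![2, 3, 32], format: Format::Q4_0 };
    let storage = Storage { m: 6, k: 32, format: Format::Q4_0 };
    assert!(consistent(&fact, &storage));
    println!("consistent");
}
```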

BlockQuantStorage::into_tensor() now produces [m, k] shaped tensors.
GgmlGemm::output_shape expects the weights tensor to carry its full
[b, n, k] shape to compute the correct output dimensions, so use
Tensor::from_storage directly with the 3D shape in the test helper.

Same issue as the cuda fix: BlockQuantStorage::into_tensor() now produces
[m, k] shaped tensors, but GemmImpl::output_shape expects weights to carry
their full [b, n, k] shape. Use Tensor::from_storage with the 3D shape.
kali force-pushed the block-quant-storage-prep branch from 4c12e36 to cbe98ce on March 21, 2026 at 13:46
kali added 2 commits on March 21, 2026 at 15:12
pad_q40_weights was calling padded_bqs.into_tensor() which produces a
flat [m, k+pad] tensor, but GgmlGemm expects the weight tensor to carry
its full logical shape (e.g. [b, n, k+pad]). Reconstruct the padded
tensor from the original shape with only the last dimension updated.

Fixes cuda-lovelace openelm-270M q40ef16 CI failure.
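The shape reconstruction described above amounts to keeping the original logical shape and updating only the last (k) dimension; a minimal sketch with an assumed helper name:

```rust
// Sketch: rebuild the padded weight shape from the original logical shape,
// replacing only the last axis with the padded k.
fn padded_weight_shape(original: &[usize], padded_k: usize) -> Vec<usize> {
    let mut shape = original.to_vec();
    *shape.last_mut().expect("weights need at least one axis") = padded_k;
    shape
}

fn main() {
    // e.g. logical [b, n, k] = [1, 64, 100] padded so k becomes 128
    assert_eq!(padded_weight_shape(&[1, 64, 100], 128), vec![1, 64, 128]);
    println!("{:?}", padded_weight_shape(&[1, 64, 100], 128));
}
```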
