Introduce Mudlark — an adaptive dual-tree spatial index by da2ce7 · Pull Request #832 · torrust/torrust-index

da2ce7 · 2026-03-12T09:48:48Z

The Magpie-lark is a small Australian bird. Common, unassuming, often spotted fossicking in the mud — often affectionately called a "mudlark". What makes it distinctive is that mated pairs sing together in precisely coordinated duets, each bird contributing its part to a shared song.

That's the metaphor behind this crate's name. Mudlark is built around two trees that work in concert: one handles spatial geometry, the other handles significance ranking. They each have their own form and uses, and when they co-operate, they duet.

Summary

This PR adds mudlark to the workspace: a library crate shipping a single core type, GvGraph, that acts as an adaptive spatial index over a bounded domain. You feed it observations — a coordinate and a value — and it figures out where to pay attention. Regions that receive lots of activity earn finer resolution. Quiet regions stay coarse. As the distribution shifts, the structure re-evaluates, promoting what's become important and shedding what's gone cold.

The simplest way to think about it: a histogram that decides its own bin widths, continuously, based on what it's seen.

The spec calls it a φ-Bounded Geometric-Value Graph — φ (the golden ratio) because the depth bound follows a Fibonacci recurrence, bounded because the node budget is hard-capped, geometric for the spatial partition tree, value for the significance tournament. GvGraph in the code.

Most adaptive structures make one tree do everything — spatial partitioning, importance ranking, rebalancing — all tangled together. Mudlark separates those concerns cleanly. The G-Tree (Geometric) is a binary partition of the coordinate space. It knows about intervals, accumulated values, and routing. The V-Tree (Value) is a competitive tournament that ranks entries by significance. The two share the same underlying nodes but see them from completely different angles — one spatial, one evaluative. Neither alone is sufficient. Together they produce an index whose resolution tracks the data's actual structure, with formal efficiency guarantees rooted in information theory.

The everyday surface is straightforward: observe into it, query it (point lookups, range sums, O(1) contour projection), sample from it proportionally, extract significance-ordered snapshots, and apply temporal decay so it tracks the present rather than accumulating the past forever. The sampling cost adapts to the distribution — hotspots are cheap, uniform distributions cost more — and the bound is provably within 1.44× of the information-theoretic optimum.

This data structure is general and should find use in future developments.

37 source modules with a clean internal/external boundary (pub(crate) for internals, flat re-exports for the public API) and #![forbid(unsafe_code)]
1 311 tests — 651 unit, 543 integration, 117 doc-tests — including adversarial stress patterns, a full invariant checker, and two narrative pedagogy tests that walk through every public operation step by step
143 Criterion benchmarks across six families with published scaling exponents
~10k-line formal specification (idea.md) developing the structure from first principles in coding theory — the implementation references it throughout via §IDEA N annotations
38 architecture decision records documenting every significant design choice, including a mutation-testing gap analysis (ADR-M-039)
README with runnable examples — every code block in the Quick Start and Advanced Usage sections compiles as a doc-test
Companion documentation — API reference, architecture guide, performance analysis, testing guide, and standalone theory document
AGPL-3.0-only with Linking Exception — a formal LINKING-EXCEPTION file permits unmodified use through the public API (Surfaces 1 and 2) without copyleft obligations on downstream code

Changes outside mudlark

Cargo.toml / Cargo.lock — workspace member added, lock file regenerated
project-words.txt — spellcheck dictionary entries for mudlark terminology

Reviewing this

The public API is defined in docs/api.md and surfaced through the flat re-exports in lib.rs — that's the best starting point. The formal spec (docs/idea.md) is reference material; you don't need to read it to review the code, but the source cross-references it via §IDEA N annotations if you want to trace a design choice back to its rationale.

For a focused pass: lib.rs (crate root and re-exports) → src/graph.rs (the main type) → src/traits/ (the public trait surface) → tests (integration tests exercise only the public API).

The two pedagogy tests (tests/pedagogy.rs and tests/pedagogy_advanced.rs) are designed to be read end-to-end — they walk through the worked example from the spec, printing a teaching narrative at each step. Running cargo test -p torrust-mudlark --test pedagogy -- --nocapture produces a readable tutorial.

Notes

Ships at 1.0.0. The public API surface is defined, documented, and semver-locked. Config and GvGraph have no Default impl because the parameters are genuinely domain-dependent. The API surface is intentionally narrow and documented in docs/api.md.
The formal spec is the single source of truth for the design — the ADRs are the full implementation notes.
AGPL-3.0-only with a formal LINKING-EXCEPTION: using the public API (Surfaces 1 and 2) from your own code doesn't trigger copyleft on downstream code. Modifications or use of internal APIs (Surface 3) are subject to the full AGPL.
MSRV 1.85, inherited from the workspace.
No unsafe code. Forbidden at the crate root.

codecov · 2026-03-12T09:58:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.64%. Comparing base (6749e87) to head (b358f6a).

Additional details and impacted files

@@             Coverage Diff              @@
##           develop     #832       +/-   ##
============================================
+ Coverage    52.20%   67.64%   +15.44%     
============================================
  Files          116      153       +37     
  Lines         6166    12148     +5982     
  Branches      6166    12148     +5982     
============================================
+ Hits          3219     8218     +4999     
- Misses        2866     3681      +815     
- Partials        81      249      +168

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

josecelano · 2026-03-16T16:17:20Z

@da2ce7 why text-to-png dependency has been removed? The image proxy displayed an error message (as an image) when the source image was unavailable.

…on testing results - MUDLARK_EXPLAINED.md: accessible 14-section explainer of the dual-tree design - GLOSSARY.md: authoritative term definitions with naming rationale - USE_CASES.md: 5 concrete operational problems with fit assessments - COORDINATE_ENGINEERING.md: 6 encoding techniques for non-integer domains - DESIGN_ALTERNATIVES.md: 8 alternative data structures compared to mudlark - VERIFICATION_STRATEGY.md: black-box verification plan for AI-assisted PR - REVIEW_PR_832.md: phased review notes (phases 1–4 complete) - mutants-results/: cargo-mutants run over torrust-mudlark - 1631 mutants tested, 621 caught, 413 missed, kill rate 60.1% - missed.txt contains all 413 uncaught mutations for author to review - reference_comparator.rs: naive O(N) reference impl to compare against mudlark

josecelano · 2026-03-17T08:49:19Z

Review notes — @josecelano

Hi @da2ce7, I've been reviewing this PR. Given the size (~64k lines, AI-assisted), I took a different approach from a traditional line-by-line review. Here's what I've done and what I found.

What I've produced

All review documents are in a branch on my fork:
josecelano/torrust-index — review/pr-832-mudlark

Document	Purpose
`MUDLARK_EXPLAINED.md`	Accessible explainer of the dual-tree design for non-specialists
`GLOSSARY.md`	Authoritative definitions for every mudlark-specific term
`USE_CASES.md`	5 concrete operational problems with fit assessments
`COORDINATE_ENGINEERING.md`	Techniques for encoding non-integer domains into coordinates
`DESIGN_ALTERNATIVES.md`	8 alternative data structures compared to mudlark (including the sparse segment tree relationship)
`VERIFICATION_STRATEGY.md`	Black-box verification strategy for an AI-assisted PR of this scale
`REVIEW_PR_832.md`	Phased review notes — phases 1–4 complete, phases 5–7 in progress

Known issues from the code review (Phases 1–4)

These are concrete, actionable items regardless of correctness of the implementation:

MSRV bump 1.72 → 1.80 touches the whole workspace, not just mudlark. Needs explicit justification / changelog entry.
src/ui/proxy.rs regression — error image rendering is silently broken. map_error_to_image() returns Bytes::new() for all errors. The text-to-png-rendering feature flag is defined but always panics if enabled. This is a user-visible regression, not just a build fix.
torrust-sentinel in machete ignore list — this crate doesn't exist in the workspace yet. The forward reference needs a comment explaining the intent.
docs/api.md is out of date — several pub methods are undocumented: v_root(), depth_buffer(), headroom(), soft_limit(), is_ancestor_of(), build_plateaus(). The Node struct is also missing the gnode_id field added by ADR-M-036.
debug_plateau_basis() should be #[doc(hidden)] or #[cfg(test)] — the doc-comment itself says it "exposes pub(crate) basis bookkeeping for integration tests", which means it's not a stable API surface.
AGENTS.md scope — mudlark-specific cross-reference conventions (§IDEA M-N, §API M-N, etc.) were added to the root AGENTS.md which applies to the whole repo. Consider moving them to packages/mudlark/AGENTS.md.

Mutation testing

I ran cargo-mutants on torrust-mudlark (1631 mutants):

Outcome	Count
Caught (tests detected the break)	621
Missed (tests did NOT detect the break)	413
Unviable (wouldn't compile)	597
Kill rate	60.1%

A 60% kill rate means ~40% of deliberate code breaks went undetected by the 833 tests. This is below what I'd expect for a library with formal invariant documentation. The full list of 413 missed mutations is in:

docs/pr-reviews/832/mutants-results/mutants.out/missed.txt

Notable patterns in the missed list:

Arithmetic mutations in contour_range.rs::compute_plateau_energy (plateau energy computation)
Arithmetic mutations in pewei.rs::reconstruct (PEWEI reconstruction)
Operator mutations in Config::validate (parameter validation guards)
Operator mutations in Arena::alloc (arena allocation arithmetic)
Several comparison operator flips in testing/plan.rs (test helper internals — lower priority)

This doesn't necessarily mean the implementation is wrong, but it does mean the tests are not verifying the numeric outputs of these functions precisely enough. I'd recommend reviewing the missed mutations, particularly in the non-test modules, and adding targeted assertions.

Next steps (my side)

I'm working on a naive O(N) reference implementation (packages/mudlark/tests/reference_comparator.rs) that runs the same observations through a simple Vec-backed structure and compares range_sum outputs against mudlark. This is a black-box correctness check — if they diverge, there's a real bug with a concrete reproducer.

I'll update this comment when that's complete.

The design intent of mudlark is clear and well-documented. The dual-tree architecture is well-motivated, the public API is well-structured, and cargo test passes. The items above are what I'd want resolved before merge.

da2ce7 · 2026-03-19T06:17:45Z

Hello @josecelano , Thank You for your time in reviewing this pull request.

I did address the Non-Mudlark issues in:

refactor: replace text-to-png with internal render-text-as-image and modernize codebase #833

Now merged, and this pull request has been rebased.

Now I will first focus on the non-substantive issues, i.e. test counts being out of sync, and the api.md not being complete.

Then after I will spend some time on test coverage that the mutation-probing has shown to be insufficient.

Kind regards,

Cam.

josecelano · 2026-03-19T07:27:53Z

Hello @josecelano , Thank You for your time in reviewing this pull request.

I did address the Non-Mudlark issues in:

refactor: replace text-to-png with internal render-text-as-image and modernize codebase #833

Now merged, and this pull request has been rebased.

Now I will first focus on the non-substantive issues, i.e. test counts being out of sync, and the api.md not being complete.

Then after I will spend some time on test coverage that the mutation-probing has shown to be insufficient.

Kind regards,

Cam.

Hi @da2ce7 , I haven't finished the review. I will continue next week. It's being very interesting (I've learn many new things) and I still have many questions and I want to get a deeper understanding.

da2ce7 · 2026-03-20T07:22:13Z

Hello @josecelano upon analysis of your comment:

docs/api.md is out of date — several pub methods are undocumented: v_root(), depth_buffer(), headroom(), soft_limit(), is_ancestor_of(), build_plateaus(). The Node struct is also missing the gnode_id field added by ADR-M-036.

I did a propper audit of the Public Facing API for v1.0, the existing ADR-032 couldn't answer the questions that actually mattered.

The problem was concrete. I needed to decide: should Node carry a parent handle? The three-surface taxonomy said "Surface 1 is for Prints — lightweight view types." But parent points back into the graph. Is that a Print or an Emulsion concern? The old ADR had no way to decide — it was a classification scheme, not a decision procedure. The same ambiguity blocked every other API question: is VNodeId public? Should GNodeInfo exist as a separate type from Node? Does Observation::scale() belong on every observation or only the ones where scaling is meaningful?

Each of those questions had a defensible answer, but each one felt ad hoc. The taxonomy named the surfaces; it didn't say how to assign things to them. So instead of making five judgment calls, I went back to ADR-032 and asked what the assignment criteria actually are.

Three tests fell out:

Primary test: does the symbol serve one of the six core operations (three projections, three mutations, or construction)? If it only serves diagnostics/serialisation/testing, it's #[doc(hidden)] or feature-gated.
Handle test: does a handle type have both a public producer and a public consumer? A handle with only one end is a dead-end — it promises a capability the crate doesn't deliver.
Snapshot test: does a Print field describe something fixed at creation, or does it track the graph's evolving state? Immutable provenance belongs on Surface 1; live topology belongs on Surface 3.

Once the tests exist, the API decisions become mechanical:

Node.parent passes the snapshot test — it's set once at split time and structurally immune to eviction. It belongs on Surface 1, so Node gets a parent field and upward traversal is available on the stable API.
VNodeId fails the handle test — v_root() produces it, but nothing public consumes it. Both become pub(crate).
GNodeInfo was a separate type for what Node already is. With the tests in hand, there's no reason for two types — gnode_info() returns Node.
Observation::scale() was a panicking default on types where scaling is meaningless. Split into ScalableObservation sub-trait — same-type observations no longer carry an unusable method.
Child linkage (left/right) fails the snapshot test — mutable topology. Hidden behind #[doc(hidden)] gnode_children().

The analogy section got similar scrutiny: every mapping now carries an explicit fidelity tag (load-bearing / illustrative / decorative / breaks) so future decisions can lean on the parts of the metaphor that do real design work and leave the rest alone.

Separately, and found during this pass: the decay factor table was sized assuming tree depth ≤ N, which doesn't hold for f64 coordinates. That's a real panic. ADR-M-038 documents the fix, and ~770 lines of new integration tests lock it down along with several IEEE 754 edge cases (0 * ∞, ln(0) * 0, ∞ - ∞ producing NaN in various code paths).

Now the API document is correct for v1 (at least I hope so!), and the choices are both much more defensible and maintainable.

da2ce7 · 2026-03-21T07:49:38Z

Hello @josecelano ,

I took the time to overhaul the test suite after your mutation testing found the package lacking — the results were genuinely useful. The surviving mutants clustered into clear categories (value-correctness, boundary-precision, type-coverage, and invariant checker self-consistency), so I wrote up the full analysis as ADR-M-039 and then closed the gaps.

Test count: 868 → 1 311 (+443 tests across all three surfaces).

Two new tests are worth reading end-to-end — they're designed as teaching material, not just assertions:

pedagogy.rs — walks through the worked example from §IDEA M-16 step by step: bootstrap split, catalytic split, competitive promotion, the uncle shield, PEWEI extraction, proportional sampling, targeted subtree decay, and post-decay rebalancing. Every assertion maps to a spec claim; every step prints a narrative explaining what happened and why.
pedagogy_advanced.rs — companion covering the read surface: construction alternatives (from_observations, Extend, Clone), G-node introspection, range sum pro-ration subtleties, the full contour-range query pipeline, plateau selection, progressive PEWEI reconstruction, and budget-limited eviction. Eighteen steps on a shared graph, printing a continuous narrative.

Both use harness = false so the steps execute in sequence — running cargo test -p torrust-mudlark --test pedagogy -- --nocapture produces a readable walkthrough.

The README has also been rewritten to take advantage of these tests. The Quick Start and Advanced Usage sections are now runnable doc-tests (compiled via include_str! in lib.rs), so every claim in the README is continuously verified. The pedagogy tests assert the same scenarios with full invariant checks — if the prose drifts from reality, CI catches it.

I'll be interested to see how the mutation testing results change with this revision.

Introduce the new sub-package called "mudlark", named after the Magpie-Lark. Path: "packages/mudlark"

BREAKING CHANGES: - Replace `GNodeInfo<C, V>` with `Node<C, V>` — `gnode_info()` now returns the same Surface 1 view type used by `layers()`. A new `#[doc(hidden)] GNodeChildren` covers the child-linkage use case. - Demote `VNodeId` to `pub(crate)` — no public consuming method exists (ADR-M-032 handle test). - Demote `v_root()` to `pub(crate)`. - Mark `GNodeId::from_index()` and `GNodeId::index()` as `#[doc(hidden)]`. - Split `Observation::scale()` (panicking default) into a separate `ScalableObservation<V>` sub-trait. Cross-type impls (f64→uint, f32→uint, f64→f32) implement both; same-type impls no longer carry an unusable `scale()`. - Relax `select_plateaus` bound from `V: Proratable + Inspectable` to `V: Inspectable`. - Demote `serde` from a default feature to opt-in. - `Cell` now represents the uncovered interval (terminal or vacated semi-internal half), not always the full G-node range. `sample()` returns the narrowed interval for semi-internals. Bug fixes: - Fix decay factor-table index-out-of-bounds panic for f64 coordinates where G-tree depth can exceed N (ADR-M-038). The factor table is now sized from the actual max depth within the subtree, not from `N - d_root`. - Allow `attenuation = 0.0` in `decay()` (annihilation, §THEORY M-7.3) and handle `attenuation = ∞` (infinite amplification) without NaN from `ln(0) * 0` or `ln(∞) * 0`. - Guard `Attenuatable::attenuate()` for f32/f64 against IEEE 754 `0 * ∞ = NaN` — if either operand is zero, return zero. - Fix invariant checkers and debug assertions to short-circuit `∞ == ∞` before computing `∞ - ∞ = NaN`. Improvements: - Add `Node::parent` field and `Node::is_root()` method. - Add `debug_assert!` P2 guard in `observe()` for negative accumulations (ADR-M-033). - Move `select_plateaus` from graph_query.rs to graph_plateau.rs. - Relax `Pewei` method bounds: `layer_count()`, `node_count()`, `reconstruct_all()` require only `Accumulator`, not `Proratable`. - Correct sampling cost from O(1.44 H + 1.67) to O(1.44 H). - Mark `build_plateaus()` and `build_plateau_basis()` as `#[doc(hidden)]`. Documentation: - Rewrite ADR-M-032 with testable surface-assignment criteria and expanded analogy tables. - Add ADR-M-038 (decay factor-table depth bound). - Significantly expand docs/api.md with Cell/Node/contour-range semantics, cross-references, and worked examples. - Update idea.md, architecture.md, performance.md, testing.md. Tests: - Add `tests/decay_infinite.rs` — infinite amplification, ADR-M-038 regression, and annihilation edge cases (480 lines). - Add `tests/negative_f64.rs` — P2 violation guard and downstream semantic breakage tests (290 lines). - Update test count: 833 → 868 (346 unit, 415 integration, 107 doc-tests). - Update CHANGELOG release date to 2026-03-20.

…examples A cargo-mutants run produced ~305 surviving mutants, revealing systematic blind spots in value-correctness, boundary-precision, and type-coverage testing. ADR-M-039 documents the analysis and classifies survivors into seven mutant classes; this commit closes the gaps it identified. Test count: 868 → 1 311 (+443 tests across all three surfaces). New crate-level test modules: - config_validation: rejects invalid Config combinations - decompose_basis: value-exact basis-element assertions - invariant_self: checker self-consistency (kills class-1 mutants) New integration tests: - pedagogy: 10-step narrative walkthrough of the worked example from §IDEA M-16 — bootstrap split, catalytic split, skip- promote, uncle shield, PEWEI extraction, proportional sampling, subtree decay, and post-decay rebalancing. Every assertion maps to a spec claim; every step prints a teaching narrative. - pedagogy_advanced: 18-step companion covering the read surface — construction alternatives (from_observations, Extend, Clone), gnode_info / is_ancestor_of introspection, range_sum pro-ration subtleties, full contour-range query pipeline (endpoint lattice, basis consolidation, energy-field algebraic identity, energy-only variant, invalid-endpoint rejection), plateau selection with outward snap, progressive PEWEI reconstruction with truncation, and budget-limited eviction under a tight budget=10 ceiling. - eviction_diagnostic: step-by-step diagnostic replay (replaces eviction_debug) - eviction_plateau: plateau integrity through eviction cycles (replaces eviction_p_i2) Both pedagogy tests use `harness = false` (custom main) so the steps execute in sequence on a shared graph, printing a continuous narrative that can be read alongside the formal spec. Removed (superseded): - eviction_debug → eviction_diagnostic - eviction_p_i2 → eviction_plateau Every existing test file receives a doc-header index table per the new AGENTS.md "Test Doc-Headers" convention, plus targeted new tests to kill mutation-testing survivors. README rewrite: - New "Choose GvGraph" section (when to use, when not to) - Quick Start rebuilt around a minimal N=3 graph with five canonical observations; each section is a runnable doc-test (compiled via `include_str!` in lib.rs) - New sections: adaptive splitting with ASCII contour diagrams, entropy-adaptive sampling, targeted subtree decay, advanced usage (checkpoint/fork, gnode_info, contour-range decomposition, progressive PEWEI reconstruction, budget-constrained operation) - "Limitations" section (1D only, f64 cost, single-writer, no signed accumulators) - Performance table softened to "typical latencies" with caveat that absolute numbers vary by hardware - Cross-reference conventions reformatted as a tag→document table - License section updated for the new LINKING-EXCEPTION LINKING-EXCEPTION: AGPL-3.0-only linking exception (v1.0) permitting unmodified use through Surfaces 1 and 2 without copyleft obligations on the consumer's code. Surface 3 (pub(crate) / #[doc(hidden)]) remains fully AGPL. Explicitly notes the Legacy Exception does not apply to this package. Other changes: - Cell / api.md docs expanded: sample() can land on internal G-nodes whose V-entry carries frozen pre-split intensity (ADR-M-019, §IDEA M-6.5) - GNode::Internal doc-comment expanded with the same detail - Config<V> derives PartialEq (needed by config_validation and the pedagogy test's README_CONFIG == worked_example_config guard) - Testing presets: no_split_config, f64_no_split_config, decay_config, contour_range_config, plan_multi_plateau - Testing runners: run_diagnostic, run_diagnostic_budgeted (step-by-step invariant-checking replay with full diagnostic dump on first violation) - ADR tables reformatted for consistent column alignment - AGENTS.md: "Test Doc-Headers" convention added, table and prose refinements - CHANGELOG date corrected to 2026-03-21; test counts updated - rand 0.10 added to dev-dependencies - updated `project-words.txt`

…on testing results - MUDLARK_EXPLAINED.md: accessible 14-section explainer of the dual-tree design - GLOSSARY.md: authoritative term definitions with naming rationale - USE_CASES.md: 5 concrete operational problems with fit assessments - COORDINATE_ENGINEERING.md: 6 encoding techniques for non-integer domains - DESIGN_ALTERNATIVES.md: 8 alternative data structures compared to mudlark - VERIFICATION_STRATEGY.md: black-box verification plan for AI-assisted PR - REVIEW_PR_832.md: phased review notes (phases 1–4 complete) - mutants-results/: cargo-mutants run over torrust-mudlark - 1631 mutants tested, 621 caught, 413 missed, kill rate 60.1% - missed.txt contains all 413 uncaught mutations for author to review - reference_comparator.rs: naive O(N) reference impl to compare against mudlark

josecelano · 2026-03-24T13:45:32Z

Component Architecture (module dependency layers)

Execution / Call Flow

cc @da2ce7

NOTE: In the Brave browser, the diagram lines are not displayed when you click the SVG file. Working fine on Chrome and Firefox.

josecelano · 2026-03-25T12:19:43Z

Finding 10 — Arena Stale-Handle ABA Problem

@da2ce7 While reviewing arena.rs I noticed that Arena<T> has no generation tracking on its free list. When a slot is deallocated and later recycled for a new node, any GNodeId that was issued for the old occupant will silently resolve to the new one.

The is_occupied() guard in gnode_info(), gnode_children(), and is_ancestor_of() only checks whether some node occupies the slot — it cannot distinguish "original node" from "a different node that reused the same slot".

Concrete scenario:

User calls plateaus() and caches the resulting GNodeId values (e.g. in a HashMap<GNodeId, MyState>).
User calls observe(...) — eviction frees some of those G-nodes.
A subsequent split recycles the freed slots for new G-nodes.
User calls gnode_info(cached_id) — is_occupied() returns true and returns Some(Node) for the new occupant, not the original. No error is signalled.

This is the classic ABA problem. No memory-safety issue (Rust prevents that), but it can silently corrupt a caller's view of the tree.

The core question is: is caching a GNodeId across structural mutations a supported use case?

If no (ephemeral only): the contract should be documented. The is_occupied() guard in gnode_info() is then misleadingly protective — it appears to return None for dead handles but cannot do so once a slot is reused.
If yes (stable across mutations): a generation counter is needed (either per-slot on Arena, or an epoch on GvGraph).

Full write-up with fix options and a comparison table (custom arena vs generational-arena) is in the review branch:

docs/pr-reviews/832/findings/finding-10-arena-stale-handle.md

torrust#832) - Add packages/mudlark/tests/proptest_harness.rs: 7 property-based tests (P1 total_sum conservation, P2 range_sum near-additivity, P3 range_sum monotonicity, P4 structural invariants, P5 plateau coverage, P6 annihilation, P7 budget enforcement). All 7 pass. - Add proptest = "1" dev-dependency and [[test]] entry to mudlark Cargo.toml. - Add docs/pr-reviews/832/PROPTEST_STRATEGY.md explaining the oracle problem, reference model, properties checked, and development findings. - Update docs/pr-reviews/832/VERIFICATION_STRATEGY.md: add Step 6 row. - Add docs/pr-reviews/832/proptest-results/run-2026-03-25.txt and regression seeds: two proptest counterexamples discovered and resolved during harness development (F-P2 range_sum approximation, F-P7 budget headroom).

josecelano · 2026-03-25T13:21:47Z

Finding #11 — `range_sum` approximation not clearly documented

@da2ce7 — one item worth addressing before merge: range_sum returns an approximation, not an exact value, even for integer accumulators — and this isn't communicated prominently enough in the public API docs.

What proptest found

While running a property-based harness (7 tests, all passing — see docs/pr-reviews/832/proptest-results/run-2026-03-25.txt on the review branch), the initial property P2 range_sum(lo..m) + range_sum(m..hi) == range_sum(lo..hi) was immediately falsified by proptest with this minimal counterexample:

observe(0, 64)
range_sum(0..1) = 0   // expect: 64
range_sum(1..5) = 0
range_sum(0..5) = 1

So range_sum(0..1) returns 0 even though coord 0 holds 64 units of energy.

Root cause

range_sum_inner (ADR-M-020) pro-rates g.own by floor(own × overlap_width / node_width) whenever the query boundary bisects a G-node. Until the G-tree has built enough depth at coord 0 to give it its own fine leaf, all energy sits in the root covering [0, 256) and is scaled by 1/256. For small accumulator values or query windows narrower than the node, the result is 0.

This is by design — the library is a density estimator. But a user with an integer accumulator will reasonably write:

g.observe(42, 100);
assert_eq!(g.range_sum(42..43), 100);  // This FAILS

and receive no hint from the docs that this isn't supposed to work.

Mutation Testing Re-run — 2026-03-26

@da2ce7 — here are the results of the re-run of cargo mutants -p torrust-mudlark against the current branch (1,311 tests).

Result

Metric	Before (+443 tests)	After
Viable mutants	1,034	1,077
Caught	621	1,004
Missed	413	72
Timeout	—	1
Unviable	—	627
Kill rate	60.1%	93.2%

The new test suite brings the kill rate well above the 80% threshold. Well done — that's a significant improvement.

Remaining 72 missed mutants

The surviving mutants are concentrated in five areas. I've grouped them below.

`graph_plateau.rs` — 28 mutants

The densest cluster. Most are in consolidate_all_basis, fixup_plateau, repair_p_i4, split_for_p_i4, and find_boundary_node. Notably:

consolidate_all_basis: loop counter / arithmetic (lines 1019, 1022) — operator flips (<→<=, +=→-=, -→+) survive, suggesting the invariant tests don't exercise boundary iterations.
repair_p_i4 / split_for_p_i4: several arithmetic and comparison flips at lines 1227, 1241, 1244, 1292 — the P-I4 repair path may need a tighter boundary test.
find_boundary_node: > → < / >= both survive (line 1328) — the boundary direction is not pinned by an assertion.
recompute_plateau / fixup_plateau: +→- and +→* survive at lines 405 and 1144.
debug_assert_plateau_mirror_consistency / debug_check_plateau_sums: mutations in the checker itself survive — this means the checker is not independently tested. A dedicated unit test that constructs a known-inconsistent state and asserts the checker fires would close these.

`diagnostic.rs` — 12 mutants

audit_plateau_consistency (lines 97, 106, 117, 124): four !=/== flips survive — the audit function itself is not verified with a state it should flag.
diagnose_missed_violation (lines 189, 255, 273, 274, 288, 298): match arm deletions and +=/-= survive — again the diagnostic logic has no independent tests; mutating it doesn't break any test.
Same pattern as graph_plateau.rs: the verifier needs its own verifier.

`decay.rs` — 9 mutants

decay_uniform (lines 179, 198): ! deletion and ||→&& survive.
decay_selective (lines 230, 242, 258, 274, 279, 344, 363): several arithmetic and comparison mutations survive — boundary cases between uniform and selective decay, and the divisor in the stepping formula, are not directly asserted.

`evict.rs` — 4 mutants

evict_tip line 251: ! deletion survives.
plateau_after_evict lines 507, 513: !=→== and ! deletion survive — note that this is Finding 12; the debug_assert that fires in debug builds is blocking these test paths. Once Finding 12 is resolved, these may become detectable.

`pewei.rs`, `graph.rs`, `arena.rs`, `graph_budget.rs`, `observe.rs`, `rebalance.rs` — 19 mutants

Pewei::reconstruct lines 344: +→* in two positions — reconstruction arithmetic not verified with enough precision.
Config::validate lines 221–223: the validation boundary arithmetic (-, +, *) survives — Config validation likely isn't tested near the exact boundary values.
Arena::alloc line 59: / with * survives — Arena capacity arithmetic is probably not tested at the boundary.
GvGraph::adjust_depth_gates line 58: one arithmetic operator survives.
GvGraph::evict_candidates line 243: ||→&& survives.
GvGraph::observe line 150: one arithmetic flip survives.
escalate_after_promote line 1117: &&→|| survives.

Timeout (1)

packages/mudlark/src/traits/coordinate.rs:262:13: replace - with / in <impl Coordinate for f32>::width

Coordinate::width with / instead of - likely causes an infinite loop in geometry traversal. This is an acceptable outcome — the test would need an explicit timeout guard to kill it.

Summary Assessment

The 93.2% kill rate clears the 80% bar and this is no longer a blocker. The remaining 72 fall into two categories:

Verifier self-consistency (diagnostic.rs, graph_plateau.rs checker functions) — the invariant-checking code has no independent tests that construct invalid states. These would be best addressed with a small set of unit tests that deliberately build a broken invariant and assert the checker fires.
Boundary arithmetic in plateau consolidation, decay stepping, Config validation, and Arena alloc — these need tests at the exact boundary values or property-based tests that cover the boundary region.

Neither category is a blocker for merge, but the diagnostic/verifier gap is worth noting for long-term maintainability.

…t#832) - Re-run 2026-03-26 with Cameron's 1,311 tests (was 868) - 1004 caught / 72 missed / 1 timeout / 627 unviable / 1077 viable - Kill rate: 60.1% → 93.2% (well above 80% target) - Update Phase 5 verdict, author responses, measurement snapshots, open items - Also include the mutants-results/mutants.out output from the run - PR comment posted: torrust#832 (comment)

josecelano · 2026-03-26T15:54:25Z

Isolated-Branch Experiments — Review Supplement

Hi @da2ce7 — alongside the main review I've been running a parallel set of
experiments on an isolated copy of the mudlark package:

Branch: josecelano/torrust-index @ review/pr-832-mudlark-isolated

The idea: extract only the Rust source (no docs, no comments, no tests) into a
clean sandbox and use it to answer questions that would have added noise to the
main review. Full log: docs/pr-reviews/832/MUDLARK_ISOLATED_EXPERIMENTS.md

Experiments at a glance

#	Experiment	Key result
E-1	Code isolation + self-explanation	✅ Code is self-explanatory from types/naming alone
E-2	Architecture diagrams	✅ Component + call-flow SVGs generated
E-3	Module-restructure proposal	✅ Proposed grouped layout (nodes/spatial/tree/graph/diagnostics)
E-4	Integration-test safety net	✅ 10 end-to-end tests using invariant checker as oracle
E-5	Coverage baseline	✅ 60% lines before any change
E-6	IP-range ban-detection use case	✅ Runnable example + design doc
E-7	TUI visualiser	✅ Live G-tree / V-tree terminal UI
E-8	ANSI heat-map example	✅ /24 subnet density grid
E-9	Snapshot tests (insta)	✅ Regression guard for refactoring
E-10	Module-restructure execution	✅ Refactor in 5 sequential commits, all tests green
E-11	Unit-test coverage improvement	✅ 60% → 86.96% lines (96.67% functions)
E-12	Cyclomatic complexity analysis	✅ `plateau_after_evict`: CC=39, cognitive=95 — top hotspot
E-13	Tree-snapshot + Graphviz DOT export	✅ Visual diffing of tree structure
E-14	G-I5 bijection invariant	✅ New diagnostic: G-tree ↔ V-tree bijection check
E-15	Type-extraction refactor (P1–P8)	✅ `Config<V>`, `GNodeChildren`, `ViolationSources`, … into own files

Things that might interest you

E-1 confirms the code quality: stripped of all comments, an AI agent
could reconstruct the dual-tree design, the contracts on observe/decay/
sample, and the role of each module purely from types and names.
E-3/E-10/E-15 are proposed improvements rather than required fixes —
happy to share them as separate PRs if they would be useful.
E-12 corroborates the complexity concerns in Finding 9 and 12 from the
main review: plateau_after_evict (CC=39, cognitive complexity 95) is the
complexity hot-spot.
E-11 pushed line coverage to ~87%; the remaining gap is almost entirely
branches in invariants.rs and plateau.rs that can only fire when the data
structure is in an invalid state — which connects directly to Finding 12.

The G-node arena reuses slots via a free list. A handle obtained before an eviction could silently bind to a new, unrelated node after the slot is reallocated (ABA problem). This is not hypothetical — the sentinel stores handles across observation cycles where evict-then-split sequences can recycle slots. Introduce a two-tier handle design: - GNodeId (pub, 8 bytes): slot index + generation counter. Used on all public API surfaces. Consuming methods (decay, gnode_info, is_ancestor_of) validate the generation and panic on mismatch. - GSlotPointer (pub(crate), 4 bytes): generation-free handle for internal tree walks where no interleaving eviction can occur. Renamed from the former GNodeId. - VSlotPointer (pub(crate), 4 bytes): renamed from VNodeId for consistency with the new naming convention. Arena changes: - alloc() returns (index, generation) tuple - dealloc() increments the per-slot generation counter - New generation() accessor for handle validation The generation Vec is cold data — touched only on alloc/dealloc, never on hot read/write paths. Internal hot paths continue to use 4-byte GSlotPointer; the 8-byte GNodeId exists only at the API boundary.

josecelano · 2026-03-30T07:45:55Z

Hi @da2ce7, I'm still working on the review. I'm refactoring the package to see how far I can get in reducing accidental complexity. I'm working in this orphan branch only with the package.

https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated

I think there are already a couple og good abstractions like:

VTree
GTree
CoordinateRange
...

josecelano · 2026-03-31T17:33:49Z

Hi @da2ce7, I'm still working on a big refactor on the following branch:

https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated

That branch has diverged a lot (you have also made some changes after I forked your PR branch). I don't know if it makes sense to continue working on it unless you think those changes are valuable and could potentially be merged into your branch.

I have mainly been working on grouping free functions around new abstractions. The long-term vision for this megarefactor is described here:

https://github.com/josecelano-bot/torrust-index/blob/review/pr-832-mudlark-isolated/docs/refactoring-plans/oo-method-migration/oo-method-migration.md

I would also like to review how the package handles errors. I see many panics, asserts, debug_ssserts, etc, and in many cases I think it would be better to return a "Result" type. One example, the get function in the arena;

https://github.com/torrust/torrust-index/pull/832/changes#diff-740895d2aae80f3c35d9854eea374f1d02cac113dd0499bda803957605e82243R99-R109

josecelano · 2026-04-01T08:40:28Z

Hi @da2ce7, I'm still working on a big refactor on the following branch:

https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated

That branch has diverged a lot (you have also made some changes after I forked your PR branch). I don't know if it makes sense to continue working on it unless you think those changes are valuable and could potentially be merged into your branch.

I have mainly been working on grouping free functions around new abstractions. The long-term vision for this megarefactor is described here:

https://github.com/josecelano-bot/torrust-index/blob/review/pr-832-mudlark-isolated/docs/refactoring-plans/oo-method-migration/oo-method-migration.md

I would also like to review how the package handles errors. I see many panics, asserts, debug_ssserts, etc, and in many cases I think it would be better to return a "Result" type. One example, the get function in the arena;

https://github.com/torrust/torrust-index/pull/832/changes#diff-740895d2aae80f3c35d9854eea374f1d02cac113dd0499bda803957605e82243R99-R109

I've added a new doc with the main differences between your package and my fork:

https://github.com/josecelano-bot/torrust-index/blob/review/pr-832-mudlark-isolated/docs/mudlark-fork-differences.md

da2ce7 · 2026-04-01T10:52:25Z

Hello @josecelano

Thank you for your great work in refactoring and reorganizing the Mudlark package. However as far as I can see; this work dose not need to be included in the 1.0.0 release of the package.

There are only a few small points of polish (a small revision to the https://github.com/da2ce7/torrust-index/blob/20260305_mudlark/packages/mudlark/LINKING-EXCEPTION , stating that it should only apply for this version of the API document, and if the document is expanded, the linking exception should be removed...)

and other polish things to do. Otherwise I would like the current form to merged if no other problems are found. :)

josecelano · 2026-04-01T11:07:27Z

Hello @josecelano

Thank you for your great work in refactoring and reorganizing the Mudlark package. However as far as I can see; this work dose not need to be included in the 1.0.0 release of the package.

There are only a few small points of polish (a small revision to the https://github.com/da2ce7/torrust-index/blob/20260305_mudlark/packages/mudlark/LINKING-EXCEPTION , stating that it should only apply for this version of the API document, and if the document is expanded, the linking exception should be removed...)

and other polish things to do. Otherwise I would like the current form to merged if no other problems are found. :)

OK @da2ce7, most of them are internal changes, the only concern would be about how the package exposes the errors in the public API. I would review panics, asserts, and unwraps, at least the ones that can be triggered but unintentionally bad use by the end users. Maybe you can publish the package with version 0.1.0 while we are still the only end users and move it to 1.0.0 once we have tested it in production in the index/tracker/etc.

Copilot AI review requested due to automatic review settings March 12, 2026 09:48

da2ce7 temporarily deployed to coverage March 12, 2026 09:48 — with GitHub Actions Inactive

This comment was marked as off-topic.

Sign in to view

da2ce7 added the Needs Rebase Base Branch has Incompatibilities label Mar 18, 2026

da2ce7 force-pushed the 20260305_mudlark branch from 7e087d5 to a39965d Compare March 19, 2026 05:55

da2ce7 temporarily deployed to coverage March 19, 2026 05:55 — with GitHub Actions Inactive

da2ce7 removed the Needs Rebase Base Branch has Incompatibilities label Mar 19, 2026

da2ce7 force-pushed the 20260305_mudlark branch from a39965d to bada211 Compare March 19, 2026 06:34

da2ce7 temporarily deployed to coverage March 19, 2026 06:34 — with GitHub Actions Inactive

da2ce7 temporarily deployed to coverage March 20, 2026 07:22 — with GitHub Actions Inactive

da2ce7 temporarily deployed to coverage March 21, 2026 07:34 — with GitHub Actions Inactive

da2ce7 force-pushed the 20260305_mudlark branch from 2dc4aa8 to 5d8e052 Compare March 21, 2026 07:41

da2ce7 temporarily deployed to coverage March 21, 2026 07:41 — with GitHub Actions Inactive

da2ce7 force-pushed the 20260305_mudlark branch from 5d8e052 to c0a8d85 Compare March 21, 2026 08:37

da2ce7 requested a review from a team as a code owner March 21, 2026 08:37

da2ce7 temporarily deployed to coverage March 21, 2026 08:38 — with GitHub Actions Inactive

da2ce7 added the Needs Rebase Base Branch has Incompatibilities label Mar 21, 2026

da2ce7 force-pushed the 20260305_mudlark branch from c0a8d85 to 0aad12a Compare March 23, 2026 07:59

da2ce7 temporarily deployed to coverage March 23, 2026 07:59 — with GitHub Actions Inactive

da2ce7 removed the Needs Rebase Base Branch has Incompatibilities label Mar 23, 2026

da2ce7 added 2 commits March 23, 2026 09:06

feat(package): mudlark, a φ-bounded geometric-value graph

3defaac

Introduce the new sub-package called "mudlark", named after the Magpie-Lark. Path: "packages/mudlark"

da2ce7 force-pushed the 20260305_mudlark branch from 0aad12a to 5227ea5 Compare March 23, 2026 08:06

da2ce7 temporarily deployed to coverage March 23, 2026 08:07 — with GitHub Actions Inactive

josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 24, 2026

docs(review): record Cameron's responses to PR torrust#832 findings

5084273

da2ce7 mentioned this pull request Mar 25, 2026

20260219 sentinel #827

Draft

da2ce7 temporarily deployed to coverage March 27, 2026 11:44 — with GitHub Actions Inactive

da2ce7 force-pushed the 20260305_mudlark branch from d773dd6 to b358f6a Compare March 27, 2026 11:52

da2ce7 temporarily deployed to coverage March 27, 2026 11:52 — with GitHub Actions Inactive

Conversation

da2ce7 commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Contents

Changes outside mudlark

Reviewing this

Notes

Uh oh!

This comment was marked as off-topic.

Uh oh!

codecov bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

josecelano commented Mar 16, 2026

Uh oh!

josecelano commented Mar 17, 2026

Review notes — @josecelano

What I've produced

Known issues from the code review (Phases 1–4)

Mutation testing

Next steps (my side)

Uh oh!

da2ce7 commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josecelano commented Mar 19, 2026

Uh oh!

da2ce7 commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

da2ce7 commented Mar 21, 2026

Uh oh!

josecelano commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Component Architecture (module dependency layers)

Execution / Call Flow

Uh oh!

josecelano commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Finding 10 — Arena Stale-Handle ABA Problem

Uh oh!

josecelano commented Mar 25, 2026

Finding #11 — range_sum approximation not clearly documented

What proptest found

Root cause

Suggested documentation improvement

Uh oh!

josecelano commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Mutation Testing Re-run — 2026-03-26

Result

Remaining 72 missed mutants

graph_plateau.rs — 28 mutants

diagnostic.rs — 12 mutants

decay.rs — 9 mutants

evict.rs — 4 mutants

pewei.rs, graph.rs, arena.rs, graph_budget.rs, observe.rs, rebalance.rs — 19 mutants

Timeout (1)

Summary Assessment

Uh oh!

josecelano commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Isolated-Branch Experiments — Review Supplement

Experiments at a glance

Things that might interest you

Uh oh!

josecelano commented Mar 30, 2026

Uh oh!

josecelano commented Mar 31, 2026

Uh oh!

josecelano commented Apr 1, 2026

Uh oh!

da2ce7 commented Apr 1, 2026

Uh oh!

josecelano commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

da2ce7 commented Mar 12, 2026 •

edited

Loading

codecov bot commented Mar 12, 2026 •

edited

Loading

da2ce7 commented Mar 19, 2026 •

edited

Loading

da2ce7 commented Mar 20, 2026 •

edited

Loading

josecelano commented Mar 24, 2026 •

edited

Loading

josecelano commented Mar 25, 2026 •

edited

Loading

Finding #11 — `range_sum` approximation not clearly documented

josecelano commented Mar 26, 2026 •

edited

Loading

`graph_plateau.rs` — 28 mutants

`diagnostic.rs` — 12 mutants

`decay.rs` — 9 mutants

`evict.rs` — 4 mutants

`pewei.rs`, `graph.rs`, `arena.rs`, `graph_budget.rs`, `observe.rs`, `rebalance.rs` — 19 mutants

josecelano commented Mar 26, 2026 •

edited

Loading