Skip to content

Introduce Mudlark — an adaptive dual-tree spatial index#832

Open
da2ce7 wants to merge 4 commits intotorrust:developfrom
da2ce7:20260305_mudlark
Open

Introduce Mudlark — an adaptive dual-tree spatial index#832
da2ce7 wants to merge 4 commits intotorrust:developfrom
da2ce7:20260305_mudlark

Conversation

@da2ce7
Copy link
Copy Markdown
Contributor

@da2ce7 da2ce7 commented Mar 12, 2026

The Magpie-lark is a small Australian bird. Common, unassuming, often spotted fossicking in the mud — often affectionately called a "mudlark". What makes it distinctive is that mated pairs sing together in precisely coordinated duets, each bird contributing its part to a shared song.

That's the metaphor behind this crate's name. Mudlark is built around two trees that work in concert: one handles spatial geometry, the other handles significance ranking. They each have their own form and uses, and when they co-operate, they duet.


Summary

This PR adds mudlark to the workspace: a library crate shipping a single core type, GvGraph, that acts as an adaptive spatial index over a bounded domain. You feed it observations — a coordinate and a value — and it figures out where to pay attention. Regions that receive lots of activity earn finer resolution. Quiet regions stay coarse. As the distribution shifts, the structure re-evaluates, promoting what's become important and shedding what's gone cold.

The simplest way to think about it: a histogram that decides its own bin widths, continuously, based on what it's seen.

The spec calls it a φ-Bounded Geometric-Value Graph — φ (the golden ratio) because the depth bound follows a Fibonacci recurrence, bounded because the node budget is hard-capped, geometric for the spatial partition tree, value for the significance tournament. GvGraph in the code.

Most adaptive structures make one tree do everything — spatial partitioning, importance ranking, rebalancing — all tangled together. Mudlark separates those concerns cleanly. The G-Tree (Geometric) is a binary partition of the coordinate space. It knows about intervals, accumulated values, and routing. The V-Tree (Value) is a competitive tournament that ranks entries by significance. The two share the same underlying nodes but see them from completely different angles — one spatial, one evaluative. Neither alone is sufficient. Together they produce an index whose resolution tracks the data's actual structure, with formal efficiency guarantees rooted in information theory.

The everyday surface is straightforward: observe into it, query it (point lookups, range sums, O(1) contour projection), sample from it proportionally, extract significance-ordered snapshots, and apply temporal decay so it tracks the present rather than accumulating the past forever. The sampling cost adapts to the distribution — hotspots are cheap, uniform distributions cost more — and the bound is provably within 1.44× of the information-theoretic optimum.

This data structure is general and should find use in future developments.

Contents

The addition is substantial (~79k lines) but almost entirely self-contained within mudlark.

  • 37 source modules with a clean internal/external boundary (pub(crate) for internals, flat re-exports for the public API) and #![forbid(unsafe_code)]
  • 1 311 tests — 651 unit, 543 integration, 117 doc-tests — including adversarial stress patterns, a full invariant checker, and two narrative pedagogy tests that walk through every public operation step by step
  • 143 Criterion benchmarks across six families with published scaling exponents
  • ~10k-line formal specification (idea.md) developing the structure from first principles in coding theory — the implementation references it throughout via §IDEA N annotations
  • 38 architecture decision records documenting every significant design choice, including a mutation-testing gap analysis (ADR-M-039)
  • README with runnable examples — every code block in the Quick Start and Advanced Usage sections compiles as a doc-test
  • Companion documentation — API reference, architecture guide, performance analysis, testing guide, and standalone theory document
  • AGPL-3.0-only with Linking Exception — a formal LINKING-EXCEPTION file permits unmodified use through the public API (Surfaces 1 and 2) without copyleft obligations on downstream code

Changes outside mudlark

  • Cargo.toml / Cargo.lock — workspace member added, lock file regenerated
  • project-words.txt — spellcheck dictionary entries for mudlark terminology

Reviewing this

The public API is defined in docs/api.md and surfaced through the flat re-exports in lib.rs — that's the best starting point. The formal spec (docs/idea.md) is reference material; you don't need to read it to review the code, but the source cross-references it via §IDEA N annotations if you want to trace a design choice back to its rationale.

For a focused pass: lib.rs (crate root and re-exports) → src/graph.rs (the main type) → src/traits/ (the public trait surface) → tests (integration tests exercise only the public API).

The two pedagogy tests (tests/pedagogy.rs and tests/pedagogy_advanced.rs) are designed to be read end-to-end — they walk through the worked example from the spec, printing a teaching narrative at each step. Running cargo test -p torrust-mudlark --test pedagogy -- --nocapture produces a readable tutorial.

Notes

  • Ships at 1.0.0. The public API surface is defined, documented, and semver-locked. Config and GvGraph have no Default impl because the parameters are genuinely domain-dependent. The API surface is intentionally narrow and documented in docs/api.md.
  • The formal spec is the single source of truth for the design — the ADRs are the full implementation notes.
  • AGPL-3.0-only with a formal LINKING-EXCEPTION: using the public API (Surfaces 1 and 2) from your own code doesn't trigger copyleft on downstream code. Modifications or use of internal APIs (Surface 3) are subject to the full AGPL.
  • MSRV 1.85, inherited from the workspace.
  • No unsafe code. Forbidden at the crate root.

Copilot AI review requested due to automatic review settings March 12, 2026 09:48

This comment was marked as off-topic.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.64%. Comparing base (6749e87) to head (b358f6a).

Additional details and impacted files
@@             Coverage Diff              @@
##           develop     #832       +/-   ##
============================================
+ Coverage    52.20%   67.64%   +15.44%     
============================================
  Files          116      153       +37     
  Lines         6166    12148     +5982     
  Branches      6166    12148     +5982     
============================================
+ Hits          3219     8218     +4999     
- Misses        2866     3681      +815     
- Partials        81      249      +168     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@josecelano
Copy link
Copy Markdown
Member

@da2ce7 why text-to-png dependency has been removed? The image proxy displayed an error message (as an image) when the source image was unavailable.

josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 17, 2026
…on testing results

- MUDLARK_EXPLAINED.md: accessible 14-section explainer of the dual-tree design
- GLOSSARY.md: authoritative term definitions with naming rationale
- USE_CASES.md: 5 concrete operational problems with fit assessments
- COORDINATE_ENGINEERING.md: 6 encoding techniques for non-integer domains
- DESIGN_ALTERNATIVES.md: 8 alternative data structures compared to mudlark
- VERIFICATION_STRATEGY.md: black-box verification plan for AI-assisted PR
- REVIEW_PR_832.md: phased review notes (phases 1–4 complete)
- mutants-results/: cargo-mutants run over torrust-mudlark
  - 1631 mutants tested, 621 caught, 413 missed, kill rate 60.1%
  - missed.txt contains all 413 uncaught mutations for author to review
- reference_comparator.rs: naive O(N) reference impl to compare against mudlark
@josecelano
Copy link
Copy Markdown
Member

Review notes — @josecelano

Hi @da2ce7, I've been reviewing this PR. Given the size (~64k lines, AI-assisted), I took a different approach from a traditional line-by-line review. Here's what I've done and what I found.


What I've produced

All review documents are in a branch on my fork:
josecelano/torrust-index — review/pr-832-mudlark

Document Purpose
MUDLARK_EXPLAINED.md Accessible explainer of the dual-tree design for non-specialists
GLOSSARY.md Authoritative definitions for every mudlark-specific term
USE_CASES.md 5 concrete operational problems with fit assessments
COORDINATE_ENGINEERING.md Techniques for encoding non-integer domains into coordinates
DESIGN_ALTERNATIVES.md 8 alternative data structures compared to mudlark (including the sparse segment tree relationship)
VERIFICATION_STRATEGY.md Black-box verification strategy for an AI-assisted PR of this scale
REVIEW_PR_832.md Phased review notes — phases 1–4 complete, phases 5–7 in progress

Known issues from the code review (Phases 1–4)

These are concrete, actionable items regardless of correctness of the implementation:

  1. MSRV bump 1.72 → 1.80 touches the whole workspace, not just mudlark. Needs explicit justification / changelog entry.
  2. src/ui/proxy.rs regression — error image rendering is silently broken. map_error_to_image() returns Bytes::new() for all errors. The text-to-png-rendering feature flag is defined but always panics if enabled. This is a user-visible regression, not just a build fix.
  3. torrust-sentinel in machete ignore list — this crate doesn't exist in the workspace yet. The forward reference needs a comment explaining the intent.
  4. docs/api.md is out of date — several pub methods are undocumented: v_root(), depth_buffer(), headroom(), soft_limit(), is_ancestor_of(), build_plateaus(). The Node struct is also missing the gnode_id field added by ADR-M-036.
  5. debug_plateau_basis() should be #[doc(hidden)] or #[cfg(test)] — the doc-comment itself says it "exposes pub(crate) basis bookkeeping for integration tests", which means it's not a stable API surface.
  6. AGENTS.md scope — mudlark-specific cross-reference conventions (§IDEA M-N, §API M-N, etc.) were added to the root AGENTS.md which applies to the whole repo. Consider moving them to packages/mudlark/AGENTS.md.

Mutation testing

I ran cargo-mutants on torrust-mudlark (1631 mutants):

Outcome Count
Caught (tests detected the break) 621
Missed (tests did NOT detect the break) 413
Unviable (wouldn't compile) 597
Kill rate 60.1%

A 60% kill rate means ~40% of deliberate code breaks went undetected by the 833 tests. This is below what I'd expect for a library with formal invariant documentation. The full list of 413 missed mutations is in:

docs/pr-reviews/832/mutants-results/mutants.out/missed.txt

Notable patterns in the missed list:

  • Arithmetic mutations in contour_range.rs::compute_plateau_energy (plateau energy computation)
  • Arithmetic mutations in pewei.rs::reconstruct (PEWEI reconstruction)
  • Operator mutations in Config::validate (parameter validation guards)
  • Operator mutations in Arena::alloc (arena allocation arithmetic)
  • Several comparison operator flips in testing/plan.rs (test helper internals — lower priority)

This doesn't necessarily mean the implementation is wrong, but it does mean the tests are not verifying the numeric outputs of these functions precisely enough. I'd recommend reviewing the missed mutations, particularly in the non-test modules, and adding targeted assertions.


Next steps (my side)

I'm working on a naive O(N) reference implementation (packages/mudlark/tests/reference_comparator.rs) that runs the same observations through a simple Vec-backed structure and compares range_sum outputs against mudlark. This is a black-box correctness check — if they diverge, there's a real bug with a concrete reproducer.

I'll update this comment when that's complete.


The design intent of mudlark is clear and well-documented. The dual-tree architecture is well-motivated, the public API is well-structured, and cargo test passes. The items above are what I'd want resolved before merge.

@da2ce7 da2ce7 added the Needs Rebase Base Branch has Incompatibilities label Mar 18, 2026
@da2ce7 da2ce7 force-pushed the 20260305_mudlark branch from 7e087d5 to a39965d Compare March 19, 2026 05:55
@da2ce7 da2ce7 removed the Needs Rebase Base Branch has Incompatibilities label Mar 19, 2026
@da2ce7
Copy link
Copy Markdown
Contributor Author

da2ce7 commented Mar 19, 2026

Hello @josecelano , Thank You for your time in reviewing this pull request.

I did address the Non-Mudlark issues in:

Now merged, and this pull request has been rebased.

Now I will first focus on the non-substantive issues, i.e. test counts being out of sync, and the api.md not being complete.

Then after I will spend some time on test coverage that the mutation-probing has shown to be insufficient.

Kind regards,

Cam.

@josecelano
Copy link
Copy Markdown
Member

Hello @josecelano , Thank You for your time in reviewing this pull request.

I did address the Non-Mudlark issues in:

Now merged, and this pull request has been rebased.

Now I will first focus on the non-substantive issues, i.e. test counts being out of sync, and the api.md not being complete.

Then after I will spend some time on test coverage that the mutation-probing has shown to be insufficient.

Kind regards,

Cam.

Hi @da2ce7 , I haven't finished the review. I will continue next week. It's being very interesting (I've learn many new things) and I still have many questions and I want to get a deeper understanding.

@da2ce7
Copy link
Copy Markdown
Contributor Author

da2ce7 commented Mar 20, 2026

Hello @josecelano upon analysis of your comment:

  1. docs/api.md is out of date — several pub methods are undocumented: v_root(), depth_buffer(), headroom(), soft_limit(), is_ancestor_of(), build_plateaus(). The Node struct is also missing the gnode_id field added by ADR-M-036.

I did a propper audit of the Public Facing API for v1.0, the existing ADR-032 couldn't answer the questions that actually mattered.

The problem was concrete. I needed to decide: should Node carry a parent handle? The three-surface taxonomy said "Surface 1 is for Prints — lightweight view types." But parent points back into the graph. Is that a Print or an Emulsion concern? The old ADR had no way to decide — it was a classification scheme, not a decision procedure. The same ambiguity blocked every other API question: is VNodeId public? Should GNodeInfo exist as a separate type from Node? Does Observation::scale() belong on every observation or only the ones where scaling is meaningful?

Each of those questions had a defensible answer, but each one felt ad hoc. The taxonomy named the surfaces; it didn't say how to assign things to them. So instead of making five judgment calls, I went back to ADR-032 and asked what the assignment criteria actually are.

Three tests fell out:

  • Primary test: does the symbol serve one of the six core operations (three projections, three mutations, or construction)? If it only serves diagnostics/serialisation/testing, it's #[doc(hidden)] or feature-gated.
  • Handle test: does a handle type have both a public producer and a public consumer? A handle with only one end is a dead-end — it promises a capability the crate doesn't deliver.
  • Snapshot test: does a Print field describe something fixed at creation, or does it track the graph's evolving state? Immutable provenance belongs on Surface 1; live topology belongs on Surface 3.

Once the tests exist, the API decisions become mechanical:

  • Node.parent passes the snapshot test — it's set once at split time and structurally immune to eviction. It belongs on Surface 1, so Node gets a parent field and upward traversal is available on the stable API.
  • VNodeId fails the handle test — v_root() produces it, but nothing public consumes it. Both become pub(crate).
  • GNodeInfo was a separate type for what Node already is. With the tests in hand, there's no reason for two types — gnode_info() returns Node.
  • Observation::scale() was a panicking default on types where scaling is meaningless. Split into ScalableObservation sub-trait — same-type observations no longer carry an unusable method.
  • Child linkage (left/right) fails the snapshot test — mutable topology. Hidden behind #[doc(hidden)] gnode_children().

The analogy section got similar scrutiny: every mapping now carries an explicit fidelity tag (load-bearing / illustrative / decorative / breaks) so future decisions can lean on the parts of the metaphor that do real design work and leave the rest alone.

Separately, and found during this pass: the decay factor table was sized assuming tree depth ≤ N, which doesn't hold for f64 coordinates. That's a real panic. ADR-M-038 documents the fix, and ~770 lines of new integration tests lock it down along with several IEEE 754 edge cases (0 * ∞, ln(0) * 0, ∞ - ∞ producing NaN in various code paths).

Now the API document is correct for v1 (at least I hope so!), and the choices are both much more defensible and maintainable.

@da2ce7
Copy link
Copy Markdown
Contributor Author

da2ce7 commented Mar 21, 2026

Hello @josecelano ,

I took the time to overhaul the test suite after your mutation testing found the package lacking — the results were genuinely useful. The surviving mutants clustered into clear categories (value-correctness, boundary-precision, type-coverage, and invariant checker self-consistency), so I wrote up the full analysis as ADR-M-039 and then closed the gaps.

Test count: 868 → 1 311 (+443 tests across all three surfaces).

Two new tests are worth reading end-to-end — they're designed as teaching material, not just assertions:

  • pedagogy.rs — walks through the worked example from §IDEA M-16 step by step: bootstrap split, catalytic split, competitive promotion, the uncle shield, PEWEI extraction, proportional sampling, targeted subtree decay, and post-decay rebalancing. Every assertion maps to a spec claim; every step prints a narrative explaining what happened and why.

  • pedagogy_advanced.rs — companion covering the read surface: construction alternatives (from_observations, Extend, Clone), G-node introspection, range sum pro-ration subtleties, the full contour-range query pipeline, plateau selection, progressive PEWEI reconstruction, and budget-limited eviction. Eighteen steps on a shared graph, printing a continuous narrative.

Both use harness = false so the steps execute in sequence — running cargo test -p torrust-mudlark --test pedagogy -- --nocapture produces a readable walkthrough.

The README has also been rewritten to take advantage of these tests. The Quick Start and Advanced Usage sections are now runnable doc-tests (compiled via include_str! in lib.rs), so every claim in the README is continuously verified. The pedagogy tests assert the same scenarios with full invariant checks — if the prose drifts from reality, CI catches it.

I'll be interested to see how the mutation testing results change with this revision.

@da2ce7 da2ce7 force-pushed the 20260305_mudlark branch from 5d8e052 to c0a8d85 Compare March 21, 2026 08:37
@da2ce7 da2ce7 requested a review from a team as a code owner March 21, 2026 08:37
@da2ce7 da2ce7 added the Needs Rebase Base Branch has Incompatibilities label Mar 21, 2026
@da2ce7 da2ce7 force-pushed the 20260305_mudlark branch from c0a8d85 to 0aad12a Compare March 23, 2026 07:59
@da2ce7 da2ce7 removed the Needs Rebase Base Branch has Incompatibilities label Mar 23, 2026
da2ce7 added 2 commits March 23, 2026 09:06
Introduce the new sub-package called "mudlark", named after the Magpie-Lark.

Path: "packages/mudlark"
BREAKING CHANGES:

- Replace `GNodeInfo<C, V>` with `Node<C, V>` — `gnode_info()` now
  returns the same Surface 1 view type used by `layers()`. A new
  `#[doc(hidden)] GNodeChildren` covers the child-linkage use case.
- Demote `VNodeId` to `pub(crate)` — no public consuming method
  exists (ADR-M-032 handle test).
- Demote `v_root()` to `pub(crate)`.
- Mark `GNodeId::from_index()` and `GNodeId::index()` as
  `#[doc(hidden)]`.
- Split `Observation::scale()` (panicking default) into a separate
  `ScalableObservation<V>` sub-trait. Cross-type impls (f64→uint,
  f32→uint, f64→f32) implement both; same-type impls no longer
  carry an unusable `scale()`.
- Relax `select_plateaus` bound from `V: Proratable + Inspectable`
  to `V: Inspectable`.
- Demote `serde` from a default feature to opt-in.
- `Cell` now represents the uncovered interval (terminal or vacated
  semi-internal half), not always the full G-node range. `sample()`
  returns the narrowed interval for semi-internals.

Bug fixes:

- Fix decay factor-table index-out-of-bounds panic for f64
  coordinates where G-tree depth can exceed N (ADR-M-038). The
  factor table is now sized from the actual max depth within the
  subtree, not from `N - d_root`.
- Allow `attenuation = 0.0` in `decay()` (annihilation, §THEORY
  M-7.3) and handle `attenuation = ∞` (infinite amplification)
  without NaN from `ln(0) * 0` or `ln(∞) * 0`.
- Guard `Attenuatable::attenuate()` for f32/f64 against IEEE 754
  `0 * ∞ = NaN` — if either operand is zero, return zero.
- Fix invariant checkers and debug assertions to short-circuit
  `∞ == ∞` before computing `∞ - ∞ = NaN`.

Improvements:

- Add `Node::parent` field and `Node::is_root()` method.
- Add `debug_assert!` P2 guard in `observe()` for negative
  accumulations (ADR-M-033).
- Move `select_plateaus` from graph_query.rs to graph_plateau.rs.
- Relax `Pewei` method bounds: `layer_count()`, `node_count()`,
  `reconstruct_all()` require only `Accumulator`, not `Proratable`.
- Correct sampling cost from O(1.44 H + 1.67) to O(1.44 H).
- Mark `build_plateaus()` and `build_plateau_basis()` as
  `#[doc(hidden)]`.

Documentation:

- Rewrite ADR-M-032 with testable surface-assignment criteria and
  expanded analogy tables.
- Add ADR-M-038 (decay factor-table depth bound).
- Significantly expand docs/api.md with Cell/Node/contour-range
  semantics, cross-references, and worked examples.
- Update idea.md, architecture.md, performance.md, testing.md.

Tests:

- Add `tests/decay_infinite.rs` — infinite amplification, ADR-M-038
  regression, and annihilation edge cases (480 lines).
- Add `tests/negative_f64.rs` — P2 violation guard and downstream
  semantic breakage tests (290 lines).
- Update test count: 833 → 868 (346 unit, 415 integration,
  107 doc-tests).
- Update CHANGELOG release date to 2026-03-20.
…examples

A cargo-mutants run produced ~305 surviving mutants, revealing
systematic blind spots in value-correctness, boundary-precision,
and type-coverage testing.  ADR-M-039 documents the analysis and
classifies survivors into seven mutant classes; this commit closes
the gaps it identified.

Test count: 868 → 1 311 (+443 tests across all three surfaces).

New crate-level test modules:
  - config_validation: rejects invalid Config combinations
  - decompose_basis: value-exact basis-element assertions
  - invariant_self: checker self-consistency (kills class-1 mutants)

New integration tests:
  - pedagogy: 10-step narrative walkthrough of the worked example
    from §IDEA M-16 — bootstrap split, catalytic split, skip-
    promote, uncle shield, PEWEI extraction, proportional sampling,
    subtree decay, and post-decay rebalancing.  Every assertion
    maps to a spec claim; every step prints a teaching narrative.
  - pedagogy_advanced: 18-step companion covering the read surface
    — construction alternatives (from_observations, Extend, Clone),
    gnode_info / is_ancestor_of introspection, range_sum pro-ration
    subtleties, full contour-range query pipeline (endpoint lattice,
    basis consolidation, energy-field algebraic identity,
    energy-only variant, invalid-endpoint rejection), plateau
    selection with outward snap, progressive PEWEI reconstruction
    with truncation, and budget-limited eviction under a tight
    budget=10 ceiling.
  - eviction_diagnostic: step-by-step diagnostic replay (replaces
    eviction_debug)
  - eviction_plateau: plateau integrity through eviction cycles
    (replaces eviction_p_i2)

Both pedagogy tests use `harness = false` (custom main) so the
steps execute in sequence on a shared graph, printing a continuous
narrative that can be read alongside the formal spec.

Removed (superseded):
  - eviction_debug → eviction_diagnostic
  - eviction_p_i2 → eviction_plateau

Every existing test file receives a doc-header index table per the
new AGENTS.md "Test Doc-Headers" convention, plus targeted new tests
to kill mutation-testing survivors.

README rewrite:
  - New "Choose GvGraph" section (when to use, when not to)
  - Quick Start rebuilt around a minimal N=3 graph with five
    canonical observations; each section is a runnable doc-test
    (compiled via `include_str!` in lib.rs)
  - New sections: adaptive splitting with ASCII contour diagrams,
    entropy-adaptive sampling, targeted subtree decay, advanced
    usage (checkpoint/fork, gnode_info, contour-range decomposition,
    progressive PEWEI reconstruction, budget-constrained operation)
  - "Limitations" section (1D only, f64 cost, single-writer, no
    signed accumulators)
  - Performance table softened to "typical latencies" with caveat
    that absolute numbers vary by hardware
  - Cross-reference conventions reformatted as a tag→document table
  - License section updated for the new LINKING-EXCEPTION

LINKING-EXCEPTION:
  AGPL-3.0-only linking exception (v1.0) permitting unmodified use
  through Surfaces 1 and 2 without copyleft obligations on the
  consumer's code.  Surface 3 (pub(crate) / #[doc(hidden)]) remains
  fully AGPL.  Explicitly notes the Legacy Exception does not apply
  to this package.

Other changes:
  - Cell / api.md docs expanded: sample() can land on internal
    G-nodes whose V-entry carries frozen pre-split intensity
    (ADR-M-019, §IDEA M-6.5)
  - GNode::Internal doc-comment expanded with the same detail
  - Config<V> derives PartialEq (needed by config_validation and
    the pedagogy test's README_CONFIG == worked_example_config guard)
  - Testing presets: no_split_config, f64_no_split_config,
    decay_config, contour_range_config, plan_multi_plateau
  - Testing runners: run_diagnostic, run_diagnostic_budgeted
    (step-by-step invariant-checking replay with full diagnostic
    dump on first violation)
  - ADR tables reformatted for consistent column alignment
  - AGENTS.md: "Test Doc-Headers" convention added, table and
    prose refinements
  - CHANGELOG date corrected to 2026-03-21; test counts updated
  - rand 0.10 added to dev-dependencies
  - updated `project-words.txt`
@da2ce7 da2ce7 force-pushed the 20260305_mudlark branch from 0aad12a to 5227ea5 Compare March 23, 2026 08:06
josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 24, 2026
…on testing results

- MUDLARK_EXPLAINED.md: accessible 14-section explainer of the dual-tree design
- GLOSSARY.md: authoritative term definitions with naming rationale
- USE_CASES.md: 5 concrete operational problems with fit assessments
- COORDINATE_ENGINEERING.md: 6 encoding techniques for non-integer domains
- DESIGN_ALTERNATIVES.md: 8 alternative data structures compared to mudlark
- VERIFICATION_STRATEGY.md: black-box verification plan for AI-assisted PR
- REVIEW_PR_832.md: phased review notes (phases 1–4 complete)
- mutants-results/: cargo-mutants run over torrust-mudlark
  - 1631 mutants tested, 621 caught, 413 missed, kill rate 60.1%
  - missed.txt contains all 413 uncaught mutations for author to review
- reference_comparator.rs: naive O(N) reference impl to compare against mudlark
josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 24, 2026
@josecelano
Copy link
Copy Markdown
Member

josecelano commented Mar 24, 2026

Component Architecture (module dependency layers)

component-architecture

Execution / Call Flow

call-flow

cc @da2ce7

NOTE: In the Brave browser, the diagram lines are not displayed when you click the SVG file. Working fine on Chrome and Firefox.

@da2ce7 da2ce7 mentioned this pull request Mar 25, 2026
@josecelano
Copy link
Copy Markdown
Member

josecelano commented Mar 25, 2026

Finding 10 — Arena Stale-Handle ABA Problem

@da2ce7 While reviewing arena.rs I noticed that Arena<T> has no generation tracking on its free list. When a slot is deallocated and later recycled for a new node, any GNodeId that was issued for the old occupant will silently resolve to the new one.

The is_occupied() guard in gnode_info(), gnode_children(), and is_ancestor_of() only checks whether some node occupies the slot — it cannot distinguish "original node" from "a different node that reused the same slot".

Concrete scenario:

  1. User calls plateaus() and caches the resulting GNodeId values (e.g. in a HashMap<GNodeId, MyState>).
  2. User calls observe(...) — eviction frees some of those G-nodes.
  3. A subsequent split recycles the freed slots for new G-nodes.
  4. User calls gnode_info(cached_id)is_occupied() returns true and returns Some(Node) for the new occupant, not the original. No error is signalled.

This is the classic ABA problem. No memory-safety issue (Rust prevents that), but it can silently corrupt a caller's view of the tree.

The core question is: is caching a GNodeId across structural mutations a supported use case?

  • If no (ephemeral only): the contract should be documented. The is_occupied() guard in gnode_info() is then misleadingly protective — it appears to return None for dead handles but cannot do so once a slot is reused.
  • If yes (stable across mutations): a generation counter is needed (either per-slot on Arena, or an epoch on GvGraph).

Full write-up with fix options and a comparison table (custom arena vs generational-arena) is in the review branch:

docs/pr-reviews/832/findings/finding-10-arena-stale-handle.md

josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 25, 2026
torrust#832)

- Add packages/mudlark/tests/proptest_harness.rs: 7 property-based tests
  (P1 total_sum conservation, P2 range_sum near-additivity, P3 range_sum
  monotonicity, P4 structural invariants, P5 plateau coverage, P6
  annihilation, P7 budget enforcement). All 7 pass.
- Add proptest = "1" dev-dependency and [[test]] entry to mudlark Cargo.toml.
- Add docs/pr-reviews/832/PROPTEST_STRATEGY.md explaining the oracle problem,
  reference model, properties checked, and development findings.
- Update docs/pr-reviews/832/VERIFICATION_STRATEGY.md: add Step 6 row.
- Add docs/pr-reviews/832/proptest-results/run-2026-03-25.txt and regression
  seeds: two proptest counterexamples discovered and resolved during harness
  development (F-P2 range_sum approximation, F-P7 budget headroom).
@josecelano
Copy link
Copy Markdown
Member

Finding #11range_sum approximation not clearly documented

@da2ce7 — one item worth addressing before merge: range_sum returns an approximation, not an exact value, even for integer accumulators — and this isn't communicated prominently enough in the public API docs.

What proptest found

While running a property-based harness (7 tests, all passing — see docs/pr-reviews/832/proptest-results/run-2026-03-25.txt on the review branch), the initial property P2 range_sum(lo..m) + range_sum(m..hi) == range_sum(lo..hi) was immediately falsified by proptest with this minimal counterexample:

observe(0, 64)
range_sum(0..1) = 0   // expect: 64
range_sum(1..5) = 0
range_sum(0..5) = 1

So range_sum(0..1) returns 0 even though coord 0 holds 64 units of energy.

Root cause

range_sum_inner (ADR-M-020) pro-rates g.own by floor(own × overlap_width / node_width) whenever the query boundary bisects a G-node. Until the G-tree has built enough depth at coord 0 to give it its own fine leaf, all energy sits in the root covering [0, 256) and is scaled by 1/256. For small accumulator values or query windows narrower than the node, the result is 0.

This is by design — the library is a density estimator. But a user with an integer accumulator will reasonably write:

g.observe(42, 100);
assert_eq!(g.range_sum(42..43), 100);  // This FAILS

and receive no hint from the docs that this isn't supposed to work.

Suggested documentation improvement

The range_sum docstring currently uses assert!(g.range_sum(0..128) >= 3) (not ==), which hints at approximation but doesn't explain why. I'd suggest adding a short # Approximation section to the range_sum docstring, e.g.:

Approximation note: range_sum returns an estimate, not an exact count. Accumulated values are pro-rated using f64 arithmetic at G-nodes that are only partially covered by the query range (ADR-M-020). The result converges toward the true sum as the G-tree refines depth at the queried coordinates, but for integer accumulators, range_sum(x..x+1) will often return 0 until x has received enough observations to trigger a split.

The full proptest strategy and findings are documented in docs/pr-reviews/832/PROPTEST_STRATEGY.md on the review branch.

josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 25, 2026
…rust#11 (torrust#832)

- Fill in REVIEW_PR_832.md Phase 5 with findings from isolated branch
  analysis (review/pr-832-mudlark-isolated):
  - Reviewer-written integration tests (10) and insta snapshot tests (4)
  - Coverage results: 86.96% lines / 96.67% fn (436 tests)
  - Finding torrust#11: post-evict debug_assert fires in debug builds, blocks
    all eviction-gated test paths (root cause of ~87% coverage ceiling)
  - Cyclomatic complexity analysis via rust-code-analysis-cli
  - Mutation kill rate: 60.1% (621/1034), re-run needed after torrust#11 fixed
  - Module structure observation (flat vs grouped layout)
- Add findings/finding-11-post-evict-assert.md
- Update INDEX.md: Phase 5 done, Finding torrust#11 in table, expanded Open
  Items, updated Measurement Snapshots
josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 25, 2026
… approximation is torrust#11 (torrust#832)

- Finding torrust#11 = range_sum approximation not clearly documented (proptest P2,
  reported in PR comment torrust#832 (comment))
- Finding torrust#12 = post-evict debug_assert blocks eviction tests (was torrust#11)
- Rename finding-11-post-evict-assert.md -> finding-12-post-evict-assert.md
- Create finding-11-range-sum-approximation.md
- Update INDEX.md: findings table, open items, measurement snapshots reference
- Update REVIEW_PR_832.md Phase 5: all torrust#11 post-evict references -> torrust#12
@josecelano
Copy link
Copy Markdown
Member

josecelano commented Mar 26, 2026

Mutation Testing Re-run — 2026-03-26

@da2ce7 — here are the results of the re-run of cargo mutants -p torrust-mudlark against the current branch (1,311 tests).

Result

Metric Before (+443 tests) After
Viable mutants 1,034 1,077
Caught 621 1,004
Missed 413 72
Timeout 1
Unviable 627
Kill rate 60.1% 93.2%

The new test suite brings the kill rate well above the 80% threshold. Well done — that's a significant improvement.


Remaining 72 missed mutants

The surviving mutants are concentrated in five areas. I've grouped them below.

graph_plateau.rs — 28 mutants

The densest cluster. Most are in consolidate_all_basis, fixup_plateau, repair_p_i4, split_for_p_i4, and find_boundary_node. Notably:

  • consolidate_all_basis: loop counter / arithmetic (lines 1019, 1022) — operator flips (<<=, +=-=, -+) survive, suggesting the invariant tests don't exercise boundary iterations.
  • repair_p_i4 / split_for_p_i4: several arithmetic and comparison flips at lines 1227, 1241, 1244, 1292 — the P-I4 repair path may need a tighter boundary test.
  • find_boundary_node: >< / >= both survive (line 1328) — the boundary direction is not pinned by an assertion.
  • recompute_plateau / fixup_plateau: +- and +* survive at lines 405 and 1144.
  • debug_assert_plateau_mirror_consistency / debug_check_plateau_sums: mutations in the checker itself survive — this means the checker is not independently tested. A dedicated unit test that constructs a known-inconsistent state and asserts the checker fires would close these.

diagnostic.rs — 12 mutants

  • audit_plateau_consistency (lines 97, 106, 117, 124): four !=/== flips survive — the audit function itself is not verified with a state it should flag.
  • diagnose_missed_violation (lines 189, 255, 273, 274, 288, 298): match arm deletions and +=/-= survive — again the diagnostic logic has no independent tests; mutating it doesn't break any test.
  • Same pattern as graph_plateau.rs: the verifier needs its own verifier.

decay.rs — 9 mutants

  • decay_uniform (lines 179, 198): ! deletion and ||&& survive.
  • decay_selective (lines 230, 242, 258, 274, 279, 344, 363): several arithmetic and comparison mutations survive — boundary cases between uniform and selective decay, and the divisor in the stepping formula, are not directly asserted.

evict.rs — 4 mutants

  • evict_tip line 251: ! deletion survives.
  • plateau_after_evict lines 507, 513: !=== and ! deletion survive — note that this is Finding 12; the debug_assert that fires in debug builds is blocking these test paths. Once Finding 12 is resolved, these may become detectable.

pewei.rs, graph.rs, arena.rs, graph_budget.rs, observe.rs, rebalance.rs — 19 mutants

  • Pewei::reconstruct lines 344: +* in two positions — reconstruction arithmetic not verified with enough precision.
  • Config::validate lines 221–223: the validation boundary arithmetic (-, +, *) survives — Config validation likely isn't tested near the exact boundary values.
  • Arena::alloc line 59: / with * survives — Arena capacity arithmetic is probably not tested at the boundary.
  • GvGraph::adjust_depth_gates line 58: one arithmetic operator survives.
  • GvGraph::evict_candidates line 243: ||&& survives.
  • GvGraph::observe line 150: one arithmetic flip survives.
  • escalate_after_promote line 1117: &&|| survives.

Timeout (1)

packages/mudlark/src/traits/coordinate.rs:262:13: replace - with / in <impl Coordinate for f32>::width

Coordinate::width with / instead of - likely causes an infinite loop in geometry traversal. This is an acceptable outcome — the test would need an explicit timeout guard to kill it.


Summary Assessment

The 93.2% kill rate clears the 80% bar and this is no longer a blocker. The remaining 72 fall into two categories:

  1. Verifier self-consistency (diagnostic.rs, graph_plateau.rs checker functions) — the invariant-checking code has no independent tests that construct invalid states. These would be best addressed with a small set of unit tests that deliberately build a broken invariant and assert the checker fires.
  2. Boundary arithmetic in plateau consolidation, decay stepping, Config validation, and Arena alloc — these need tests at the exact boundary values or property-based tests that cover the boundary region.

Neither category is a blocker for merge, but the diagnostic/verifier gap is worth noting for long-term maintainability.

josecelano added a commit to josecelano/torrust-index that referenced this pull request Mar 26, 2026
…t#832)

- Re-run 2026-03-26 with Cameron's 1,311 tests (was 868)
- 1004 caught / 72 missed / 1 timeout / 627 unviable / 1077 viable
- Kill rate: 60.1% → 93.2% (well above 80% target)
- Update Phase 5 verdict, author responses, measurement snapshots, open items
- Also include the mutants-results/mutants.out output from the run
- PR comment posted: torrust#832 (comment)
@josecelano
Copy link
Copy Markdown
Member

josecelano commented Mar 26, 2026

Isolated-Branch Experiments — Review Supplement

Hi @da2ce7 — alongside the main review I've been running a parallel set of
experiments on an isolated copy of the mudlark package:

Branch: josecelano/torrust-index @ review/pr-832-mudlark-isolated

The idea: extract only the Rust source (no docs, no comments, no tests) into a
clean sandbox and use it to answer questions that would have added noise to the
main review. Full log: docs/pr-reviews/832/MUDLARK_ISOLATED_EXPERIMENTS.md

Experiments at a glance

# Experiment Key result
E-1 Code isolation + self-explanation ✅ Code is self-explanatory from types/naming alone
E-2 Architecture diagrams ✅ Component + call-flow SVGs generated
E-3 Module-restructure proposal ✅ Proposed grouped layout (nodes/spatial/tree/graph/diagnostics)
E-4 Integration-test safety net ✅ 10 end-to-end tests using invariant checker as oracle
E-5 Coverage baseline ✅ 60% lines before any change
E-6 IP-range ban-detection use case ✅ Runnable example + design doc
E-7 TUI visualiser ✅ Live G-tree / V-tree terminal UI
E-8 ANSI heat-map example ✅ /24 subnet density grid
E-9 Snapshot tests (insta) ✅ Regression guard for refactoring
E-10 Module-restructure execution ✅ Refactor in 5 sequential commits, all tests green
E-11 Unit-test coverage improvement ✅ 60% → 86.96% lines (96.67% functions)
E-12 Cyclomatic complexity analysis plateau_after_evict: CC=39, cognitive=95 — top hotspot
E-13 Tree-snapshot + Graphviz DOT export ✅ Visual diffing of tree structure
E-14 G-I5 bijection invariant ✅ New diagnostic: G-tree ↔ V-tree bijection check
E-15 Type-extraction refactor (P1–P8) Config<V>, GNodeChildren, ViolationSources, … into own files

Things that might interest you

  • E-1 confirms the code quality: stripped of all comments, an AI agent
    could reconstruct the dual-tree design, the contracts on observe/decay/
    sample, and the role of each module purely from types and names.
  • E-3/E-10/E-15 are proposed improvements rather than required fixes —
    happy to share them as separate PRs if they would be useful.
  • E-12 corroborates the complexity concerns in Finding 9 and 12 from the
    main review: plateau_after_evict (CC=39, cognitive complexity 95) is the
    complexity hot-spot.
  • E-11 pushed line coverage to ~87%; the remaining gap is almost entirely
    branches in invariants.rs and plateau.rs that can only fire when the data
    structure is in an invalid state — which connects directly to Finding 12.

The G-node arena reuses slots via a free list. A handle obtained
before an eviction could silently bind to a new, unrelated node
after the slot is reallocated (ABA problem). This is not
hypothetical — the sentinel stores handles across observation
cycles where evict-then-split sequences can recycle slots.

Introduce a two-tier handle design:

- GNodeId (pub, 8 bytes): slot index + generation counter.
  Used on all public API surfaces. Consuming methods
  (decay, gnode_info, is_ancestor_of) validate the generation
  and panic on mismatch.

- GSlotPointer (pub(crate), 4 bytes): generation-free handle
  for internal tree walks where no interleaving eviction can
  occur. Renamed from the former GNodeId.

- VSlotPointer (pub(crate), 4 bytes): renamed from VNodeId
  for consistency with the new naming convention.

Arena changes:
- alloc() returns (index, generation) tuple
- dealloc() increments the per-slot generation counter
- New generation() accessor for handle validation

The generation Vec is cold data — touched only on alloc/dealloc,
never on hot read/write paths. Internal hot paths continue to use
4-byte GSlotPointer; the 8-byte GNodeId exists only at the API
boundary.
@josecelano
Copy link
Copy Markdown
Member

Hi @da2ce7, I'm still working on the review. I'm refactoring the package to see how far I can get in reducing accidental complexity. I'm working in this orphan branch only with the package.

https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated

I think there are already a couple og good abstractions like:

  • VTree
  • GTree
  • CoordinateRange
  • ...

@josecelano
Copy link
Copy Markdown
Member

Hi @da2ce7, I'm still working on a big refactor on the following branch:

https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated

That branch has diverged a lot (you have also made some changes after I forked your PR branch). I don't know if it makes sense to continue working on it unless you think those changes are valuable and could potentially be merged into your branch.

I have mainly been working on grouping free functions around new abstractions. The long-term vision for this megarefactor is described here:

https://github.com/josecelano-bot/torrust-index/blob/review/pr-832-mudlark-isolated/docs/refactoring-plans/oo-method-migration/oo-method-migration.md

I would also like to review how the package handles errors. I see many panics, asserts, debug_ssserts, etc, and in many cases I think it would be better to return a "Result" type. One example, the get function in the arena;

https://github.com/torrust/torrust-index/pull/832/changes#diff-740895d2aae80f3c35d9854eea374f1d02cac113dd0499bda803957605e82243R99-R109

@josecelano
Copy link
Copy Markdown
Member

Hi @da2ce7, I'm still working on a big refactor on the following branch:

https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated

That branch has diverged a lot (you have also made some changes after I forked your PR branch). I don't know if it makes sense to continue working on it unless you think those changes are valuable and could potentially be merged into your branch.

I have mainly been working on grouping free functions around new abstractions. The long-term vision for this megarefactor is described here:

https://github.com/josecelano-bot/torrust-index/blob/review/pr-832-mudlark-isolated/docs/refactoring-plans/oo-method-migration/oo-method-migration.md

I would also like to review how the package handles errors. I see many panics, asserts, debug_ssserts, etc, and in many cases I think it would be better to return a "Result" type. One example, the get function in the arena;

https://github.com/torrust/torrust-index/pull/832/changes#diff-740895d2aae80f3c35d9854eea374f1d02cac113dd0499bda803957605e82243R99-R109

I've added a new doc with the main differences between your package and my fork:

https://github.com/josecelano-bot/torrust-index/blob/review/pr-832-mudlark-isolated/docs/mudlark-fork-differences.md

@da2ce7
Copy link
Copy Markdown
Contributor Author

da2ce7 commented Apr 1, 2026

Hello @josecelano

Thank you for your great work in refactoring and reorganizing the Mudlark package. However as far as I can see; this work dose not need to be included in the 1.0.0 release of the package.

There are only a few small points of polish (a small revision to the https://github.com/da2ce7/torrust-index/blob/20260305_mudlark/packages/mudlark/LINKING-EXCEPTION , stating that it should only apply for this version of the API document, and if the document is expanded, the linking exception should be removed...)

and other polish things to do. Otherwise I would like the current form to merged if no other problems are found. :)

@josecelano
Copy link
Copy Markdown
Member

Hello @josecelano

Thank you for your great work in refactoring and reorganizing the Mudlark package. However as far as I can see; this work dose not need to be included in the 1.0.0 release of the package.

There are only a few small points of polish (a small revision to the https://github.com/da2ce7/torrust-index/blob/20260305_mudlark/packages/mudlark/LINKING-EXCEPTION , stating that it should only apply for this version of the API document, and if the document is expanded, the linking exception should be removed...)

and other polish things to do. Otherwise I would like the current form to merged if no other problems are found. :)

OK @da2ce7, most of them are internal changes, the only concern would be about how the package exposes the errors in the public API. I would review panics, asserts, and unwraps, at least the ones that can be triggered but unintentionally bad use by the end users. Maybe you can publish the package with version 0.1.0 while we are still the only end users and move it to 1.0.0 once we have tested it in production in the index/tracker/etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants