Introduce Mudlark — an adaptive dual-tree spatial index#832
Introduce Mudlark — an adaptive dual-tree spatial index#832da2ce7 wants to merge 4 commits intotorrust:developfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #832 +/- ##
============================================
+ Coverage 52.20% 67.64% +15.44%
============================================
Files 116 153 +37
Lines 6166 12148 +5982
Branches 6166 12148 +5982
============================================
+ Hits 3219 8218 +4999
- Misses 2866 3681 +815
- Partials 81 249 +168 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@da2ce7 why |
…on testing results - MUDLARK_EXPLAINED.md: accessible 14-section explainer of the dual-tree design - GLOSSARY.md: authoritative term definitions with naming rationale - USE_CASES.md: 5 concrete operational problems with fit assessments - COORDINATE_ENGINEERING.md: 6 encoding techniques for non-integer domains - DESIGN_ALTERNATIVES.md: 8 alternative data structures compared to mudlark - VERIFICATION_STRATEGY.md: black-box verification plan for AI-assisted PR - REVIEW_PR_832.md: phased review notes (phases 1–4 complete) - mutants-results/: cargo-mutants run over torrust-mudlark - 1631 mutants tested, 621 caught, 413 missed, kill rate 60.1% - missed.txt contains all 413 uncaught mutations for author to review - reference_comparator.rs: naive O(N) reference impl to compare against mudlark
Review notes — @josecelanoHi @da2ce7, I've been reviewing this PR. Given the size (~64k lines, AI-assisted), I took a different approach from a traditional line-by-line review. Here's what I've done and what I found. What I've producedAll review documents are in a branch on my fork:
Known issues from the code review (Phases 1–4)These are concrete, actionable items regardless of correctness of the implementation:
Mutation testingI ran
A 60% kill rate means ~40% of deliberate code breaks went undetected by the 833 tests. This is below what I'd expect for a library with formal invariant documentation. The full list of 413 missed mutations is in:
Notable patterns in the missed list:
This doesn't necessarily mean the implementation is wrong, but it does mean the tests are not verifying the numeric outputs of these functions precisely enough. I'd recommend reviewing the missed mutations, particularly in the non-test modules, and adding targeted assertions. Next steps (my side)I'm working on a naive O(N) reference implementation ( I'll update this comment when that's complete. The design intent of mudlark is clear and well-documented. The dual-tree architecture is well-motivated, the public API is well-structured, and |
7e087d5 to
a39965d
Compare
|
Hello @josecelano , Thank You for your time in reviewing this pull request. I did address the Non-Mudlark issues in: Now merged, and this pull request has been rebased. Now I will first focus on the non-substantive issues, i.e. test counts being out of sync, and the api.md not being complete. Then after I will spend some time on test coverage that the mutation-probing has shown to be insufficient. Kind regards, Cam. |
a39965d to
bada211
Compare
Hi @da2ce7 , I haven't finished the review. I will continue next week. It's being very interesting (I've learn many new things) and I still have many questions and I want to get a deeper understanding. |
|
Hello @josecelano upon analysis of your comment:
I did a propper audit of the Public Facing API for v1.0, the existing ADR-032 couldn't answer the questions that actually mattered. The problem was concrete. I needed to decide: should Each of those questions had a defensible answer, but each one felt ad hoc. The taxonomy named the surfaces; it didn't say how to assign things to them. So instead of making five judgment calls, I went back to ADR-032 and asked what the assignment criteria actually are. Three tests fell out:
Once the tests exist, the API decisions become mechanical:
The analogy section got similar scrutiny: every mapping now carries an explicit fidelity tag (load-bearing / illustrative / decorative / breaks) so future decisions can lean on the parts of the metaphor that do real design work and leave the rest alone. Separately, and found during this pass: the decay factor table was sized assuming tree depth ≤ N, which doesn't hold for Now the API document is correct for v1 (at least I hope so!), and the choices are both much more defensible and maintainable. |
2dc4aa8 to
5d8e052
Compare
|
Hello @josecelano , I took the time to overhaul the test suite after your mutation testing found the package lacking — the results were genuinely useful. The surviving mutants clustered into clear categories (value-correctness, boundary-precision, type-coverage, and invariant checker self-consistency), so I wrote up the full analysis as ADR-M-039 and then closed the gaps. Test count: 868 → 1 311 (+443 tests across all three surfaces). Two new tests are worth reading end-to-end — they're designed as teaching material, not just assertions:
Both use The README has also been rewritten to take advantage of these tests. The Quick Start and Advanced Usage sections are now runnable doc-tests (compiled via I'll be interested to see how the mutation testing results change with this revision. |
5d8e052 to
c0a8d85
Compare
c0a8d85 to
0aad12a
Compare
Introduce the new sub-package called "mudlark", named after the Magpie-Lark. Path: "packages/mudlark"
BREAKING CHANGES: - Replace `GNodeInfo<C, V>` with `Node<C, V>` — `gnode_info()` now returns the same Surface 1 view type used by `layers()`. A new `#[doc(hidden)] GNodeChildren` covers the child-linkage use case. - Demote `VNodeId` to `pub(crate)` — no public consuming method exists (ADR-M-032 handle test). - Demote `v_root()` to `pub(crate)`. - Mark `GNodeId::from_index()` and `GNodeId::index()` as `#[doc(hidden)]`. - Split `Observation::scale()` (panicking default) into a separate `ScalableObservation<V>` sub-trait. Cross-type impls (f64→uint, f32→uint, f64→f32) implement both; same-type impls no longer carry an unusable `scale()`. - Relax `select_plateaus` bound from `V: Proratable + Inspectable` to `V: Inspectable`. - Demote `serde` from a default feature to opt-in. - `Cell` now represents the uncovered interval (terminal or vacated semi-internal half), not always the full G-node range. `sample()` returns the narrowed interval for semi-internals. Bug fixes: - Fix decay factor-table index-out-of-bounds panic for f64 coordinates where G-tree depth can exceed N (ADR-M-038). The factor table is now sized from the actual max depth within the subtree, not from `N - d_root`. - Allow `attenuation = 0.0` in `decay()` (annihilation, §THEORY M-7.3) and handle `attenuation = ∞` (infinite amplification) without NaN from `ln(0) * 0` or `ln(∞) * 0`. - Guard `Attenuatable::attenuate()` for f32/f64 against IEEE 754 `0 * ∞ = NaN` — if either operand is zero, return zero. - Fix invariant checkers and debug assertions to short-circuit `∞ == ∞` before computing `∞ - ∞ = NaN`. Improvements: - Add `Node::parent` field and `Node::is_root()` method. - Add `debug_assert!` P2 guard in `observe()` for negative accumulations (ADR-M-033). - Move `select_plateaus` from graph_query.rs to graph_plateau.rs. - Relax `Pewei` method bounds: `layer_count()`, `node_count()`, `reconstruct_all()` require only `Accumulator`, not `Proratable`. - Correct sampling cost from O(1.44 H + 1.67) to O(1.44 H). - Mark `build_plateaus()` and `build_plateau_basis()` as `#[doc(hidden)]`. Documentation: - Rewrite ADR-M-032 with testable surface-assignment criteria and expanded analogy tables. - Add ADR-M-038 (decay factor-table depth bound). - Significantly expand docs/api.md with Cell/Node/contour-range semantics, cross-references, and worked examples. - Update idea.md, architecture.md, performance.md, testing.md. Tests: - Add `tests/decay_infinite.rs` — infinite amplification, ADR-M-038 regression, and annihilation edge cases (480 lines). - Add `tests/negative_f64.rs` — P2 violation guard and downstream semantic breakage tests (290 lines). - Update test count: 833 → 868 (346 unit, 415 integration, 107 doc-tests). - Update CHANGELOG release date to 2026-03-20.
…examples
A cargo-mutants run produced ~305 surviving mutants, revealing
systematic blind spots in value-correctness, boundary-precision,
and type-coverage testing. ADR-M-039 documents the analysis and
classifies survivors into seven mutant classes; this commit closes
the gaps it identified.
Test count: 868 → 1 311 (+443 tests across all three surfaces).
New crate-level test modules:
- config_validation: rejects invalid Config combinations
- decompose_basis: value-exact basis-element assertions
- invariant_self: checker self-consistency (kills class-1 mutants)
New integration tests:
- pedagogy: 10-step narrative walkthrough of the worked example
from §IDEA M-16 — bootstrap split, catalytic split, skip-
promote, uncle shield, PEWEI extraction, proportional sampling,
subtree decay, and post-decay rebalancing. Every assertion
maps to a spec claim; every step prints a teaching narrative.
- pedagogy_advanced: 18-step companion covering the read surface
— construction alternatives (from_observations, Extend, Clone),
gnode_info / is_ancestor_of introspection, range_sum pro-ration
subtleties, full contour-range query pipeline (endpoint lattice,
basis consolidation, energy-field algebraic identity,
energy-only variant, invalid-endpoint rejection), plateau
selection with outward snap, progressive PEWEI reconstruction
with truncation, and budget-limited eviction under a tight
budget=10 ceiling.
- eviction_diagnostic: step-by-step diagnostic replay (replaces
eviction_debug)
- eviction_plateau: plateau integrity through eviction cycles
(replaces eviction_p_i2)
Both pedagogy tests use `harness = false` (custom main) so the
steps execute in sequence on a shared graph, printing a continuous
narrative that can be read alongside the formal spec.
Removed (superseded):
- eviction_debug → eviction_diagnostic
- eviction_p_i2 → eviction_plateau
Every existing test file receives a doc-header index table per the
new AGENTS.md "Test Doc-Headers" convention, plus targeted new tests
to kill mutation-testing survivors.
README rewrite:
- New "Choose GvGraph" section (when to use, when not to)
- Quick Start rebuilt around a minimal N=3 graph with five
canonical observations; each section is a runnable doc-test
(compiled via `include_str!` in lib.rs)
- New sections: adaptive splitting with ASCII contour diagrams,
entropy-adaptive sampling, targeted subtree decay, advanced
usage (checkpoint/fork, gnode_info, contour-range decomposition,
progressive PEWEI reconstruction, budget-constrained operation)
- "Limitations" section (1D only, f64 cost, single-writer, no
signed accumulators)
- Performance table softened to "typical latencies" with caveat
that absolute numbers vary by hardware
- Cross-reference conventions reformatted as a tag→document table
- License section updated for the new LINKING-EXCEPTION
LINKING-EXCEPTION:
AGPL-3.0-only linking exception (v1.0) permitting unmodified use
through Surfaces 1 and 2 without copyleft obligations on the
consumer's code. Surface 3 (pub(crate) / #[doc(hidden)]) remains
fully AGPL. Explicitly notes the Legacy Exception does not apply
to this package.
Other changes:
- Cell / api.md docs expanded: sample() can land on internal
G-nodes whose V-entry carries frozen pre-split intensity
(ADR-M-019, §IDEA M-6.5)
- GNode::Internal doc-comment expanded with the same detail
- Config<V> derives PartialEq (needed by config_validation and
the pedagogy test's README_CONFIG == worked_example_config guard)
- Testing presets: no_split_config, f64_no_split_config,
decay_config, contour_range_config, plan_multi_plateau
- Testing runners: run_diagnostic, run_diagnostic_budgeted
(step-by-step invariant-checking replay with full diagnostic
dump on first violation)
- ADR tables reformatted for consistent column alignment
- AGENTS.md: "Test Doc-Headers" convention added, table and
prose refinements
- CHANGELOG date corrected to 2026-03-21; test counts updated
- rand 0.10 added to dev-dependencies
- updated `project-words.txt`
0aad12a to
5227ea5
Compare
…on testing results - MUDLARK_EXPLAINED.md: accessible 14-section explainer of the dual-tree design - GLOSSARY.md: authoritative term definitions with naming rationale - USE_CASES.md: 5 concrete operational problems with fit assessments - COORDINATE_ENGINEERING.md: 6 encoding techniques for non-integer domains - DESIGN_ALTERNATIVES.md: 8 alternative data structures compared to mudlark - VERIFICATION_STRATEGY.md: black-box verification plan for AI-assisted PR - REVIEW_PR_832.md: phased review notes (phases 1–4 complete) - mutants-results/: cargo-mutants run over torrust-mudlark - 1631 mutants tested, 621 caught, 413 missed, kill rate 60.1% - missed.txt contains all 413 uncaught mutations for author to review - reference_comparator.rs: naive O(N) reference impl to compare against mudlark
Component Architecture (module dependency layers)Execution / Call Flowcc @da2ce7 NOTE: In the Brave browser, the diagram lines are not displayed when you click the SVG file. Working fine on Chrome and Firefox. |
Finding 10 — Arena Stale-Handle ABA Problem@da2ce7 While reviewing The Concrete scenario:
This is the classic ABA problem. No memory-safety issue (Rust prevents that), but it can silently corrupt a caller's view of the tree. The core question is: is caching a
Full write-up with fix options and a comparison table (custom arena vs
|
torrust#832) - Add packages/mudlark/tests/proptest_harness.rs: 7 property-based tests (P1 total_sum conservation, P2 range_sum near-additivity, P3 range_sum monotonicity, P4 structural invariants, P5 plateau coverage, P6 annihilation, P7 budget enforcement). All 7 pass. - Add proptest = "1" dev-dependency and [[test]] entry to mudlark Cargo.toml. - Add docs/pr-reviews/832/PROPTEST_STRATEGY.md explaining the oracle problem, reference model, properties checked, and development findings. - Update docs/pr-reviews/832/VERIFICATION_STRATEGY.md: add Step 6 row. - Add docs/pr-reviews/832/proptest-results/run-2026-03-25.txt and regression seeds: two proptest counterexamples discovered and resolved during harness development (F-P2 range_sum approximation, F-P7 budget headroom).
Finding #11 —
|
…rust#11 (torrust#832) - Fill in REVIEW_PR_832.md Phase 5 with findings from isolated branch analysis (review/pr-832-mudlark-isolated): - Reviewer-written integration tests (10) and insta snapshot tests (4) - Coverage results: 86.96% lines / 96.67% fn (436 tests) - Finding torrust#11: post-evict debug_assert fires in debug builds, blocks all eviction-gated test paths (root cause of ~87% coverage ceiling) - Cyclomatic complexity analysis via rust-code-analysis-cli - Mutation kill rate: 60.1% (621/1034), re-run needed after torrust#11 fixed - Module structure observation (flat vs grouped layout) - Add findings/finding-11-post-evict-assert.md - Update INDEX.md: Phase 5 done, Finding torrust#11 in table, expanded Open Items, updated Measurement Snapshots
… approximation is torrust#11 (torrust#832) - Finding torrust#11 = range_sum approximation not clearly documented (proptest P2, reported in PR comment torrust#832 (comment)) - Finding torrust#12 = post-evict debug_assert blocks eviction tests (was torrust#11) - Rename finding-11-post-evict-assert.md -> finding-12-post-evict-assert.md - Create finding-11-range-sum-approximation.md - Update INDEX.md: findings table, open items, measurement snapshots reference - Update REVIEW_PR_832.md Phase 5: all torrust#11 post-evict references -> torrust#12
Mutation Testing Re-run — 2026-03-26@da2ce7 — here are the results of the re-run of Result
The new test suite brings the kill rate well above the 80% threshold. Well done — that's a significant improvement. Remaining 72 missed mutantsThe surviving mutants are concentrated in five areas. I've grouped them below.
|
…t#832) - Re-run 2026-03-26 with Cameron's 1,311 tests (was 868) - 1004 caught / 72 missed / 1 timeout / 627 unviable / 1077 viable - Kill rate: 60.1% → 93.2% (well above 80% target) - Update Phase 5 verdict, author responses, measurement snapshots, open items - Also include the mutants-results/mutants.out output from the run - PR comment posted: torrust#832 (comment)
Isolated-Branch Experiments — Review SupplementHi @da2ce7 — alongside the main review I've been running a parallel set of Branch: The idea: extract only the Rust source (no docs, no comments, no tests) into a Experiments at a glance
Things that might interest you
|
The G-node arena reuses slots via a free list. A handle obtained before an eviction could silently bind to a new, unrelated node after the slot is reallocated (ABA problem). This is not hypothetical — the sentinel stores handles across observation cycles where evict-then-split sequences can recycle slots. Introduce a two-tier handle design: - GNodeId (pub, 8 bytes): slot index + generation counter. Used on all public API surfaces. Consuming methods (decay, gnode_info, is_ancestor_of) validate the generation and panic on mismatch. - GSlotPointer (pub(crate), 4 bytes): generation-free handle for internal tree walks where no interleaving eviction can occur. Renamed from the former GNodeId. - VSlotPointer (pub(crate), 4 bytes): renamed from VNodeId for consistency with the new naming convention. Arena changes: - alloc() returns (index, generation) tuple - dealloc() increments the per-slot generation counter - New generation() accessor for handle validation The generation Vec is cold data — touched only on alloc/dealloc, never on hot read/write paths. Internal hot paths continue to use 4-byte GSlotPointer; the 8-byte GNodeId exists only at the API boundary.
d773dd6 to
b358f6a
Compare
|
Hi @da2ce7, I'm still working on the review. I'm refactoring the package to see how far I can get in reducing accidental complexity. I'm working in this orphan branch only with the package. https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated I think there are already a couple og good abstractions like:
|
|
Hi @da2ce7, I'm still working on a big refactor on the following branch: https://github.com/josecelano-bot/torrust-index/tree/review/pr-832-mudlark-isolated That branch has diverged a lot (you have also made some changes after I forked your PR branch). I don't know if it makes sense to continue working on it unless you think those changes are valuable and could potentially be merged into your branch. I have mainly been working on grouping free functions around new abstractions. The long-term vision for this megarefactor is described here: I would also like to review how the package handles errors. I see many panics, asserts, debug_ssserts, etc, and in many cases I think it would be better to return a "Result" type. One example, the |
I've added a new doc with the main differences between your package and my fork: |
|
Hello @josecelano Thank you for your great work in refactoring and reorganizing the Mudlark package. However as far as I can see; this work dose not need to be included in the 1.0.0 release of the package. There are only a few small points of polish (a small revision to the https://github.com/da2ce7/torrust-index/blob/20260305_mudlark/packages/mudlark/LINKING-EXCEPTION , stating that it should only apply for this version of the API document, and if the document is expanded, the linking exception should be removed...) and other polish things to do. Otherwise I would like the current form to merged if no other problems are found. :) |
OK @da2ce7, most of them are internal changes, the only concern would be about how the package exposes the errors in the public API. I would review panics, asserts, and unwraps, at least the ones that can be triggered but unintentionally bad use by the end users. Maybe you can publish the package with version 0.1.0 while we are still the only end users and move it to 1.0.0 once we have tested it in production in the index/tracker/etc. |
The Magpie-lark is a small Australian bird. Common, unassuming, often spotted fossicking in the mud — often affectionately called a "mudlark". What makes it distinctive is that mated pairs sing together in precisely coordinated duets, each bird contributing its part to a shared song.
That's the metaphor behind this crate's name. Mudlark is built around two trees that work in concert: one handles spatial geometry, the other handles significance ranking. They each have their own form and uses, and when they co-operate, they duet.
Summary
This PR adds mudlark to the workspace: a library crate shipping a single core type,
GvGraph, that acts as an adaptive spatial index over a bounded domain. You feed it observations — a coordinate and a value — and it figures out where to pay attention. Regions that receive lots of activity earn finer resolution. Quiet regions stay coarse. As the distribution shifts, the structure re-evaluates, promoting what's become important and shedding what's gone cold.The simplest way to think about it: a histogram that decides its own bin widths, continuously, based on what it's seen.
The spec calls it a φ-Bounded Geometric-Value Graph — φ (the golden ratio) because the depth bound follows a Fibonacci recurrence, bounded because the node budget is hard-capped, geometric for the spatial partition tree, value for the significance tournament.
GvGraphin the code.Most adaptive structures make one tree do everything — spatial partitioning, importance ranking, rebalancing — all tangled together. Mudlark separates those concerns cleanly. The G-Tree (Geometric) is a binary partition of the coordinate space. It knows about intervals, accumulated values, and routing. The V-Tree (Value) is a competitive tournament that ranks entries by significance. The two share the same underlying nodes but see them from completely different angles — one spatial, one evaluative. Neither alone is sufficient. Together they produce an index whose resolution tracks the data's actual structure, with formal efficiency guarantees rooted in information theory.
The everyday surface is straightforward: observe into it, query it (point lookups, range sums, O(1) contour projection), sample from it proportionally, extract significance-ordered snapshots, and apply temporal decay so it tracks the present rather than accumulating the past forever. The sampling cost adapts to the distribution — hotspots are cheap, uniform distributions cost more — and the bound is provably within 1.44× of the information-theoretic optimum.
This data structure is general and should find use in future developments.
Contents
The addition is substantial (~79k lines) but almost entirely self-contained within mudlark.
pub(crate)for internals, flat re-exports for the public API) and#![forbid(unsafe_code)]idea.md) developing the structure from first principles in coding theory — the implementation references it throughout via§IDEA NannotationsChanges outside mudlark
Reviewing this
The public API is defined in
docs/api.mdand surfaced through the flat re-exports in lib.rs — that's the best starting point. The formal spec (docs/idea.md) is reference material; you don't need to read it to review the code, but the source cross-references it via§IDEA Nannotations if you want to trace a design choice back to its rationale.For a focused pass: lib.rs (crate root and re-exports) →
src/graph.rs(the main type) →src/traits/(the public trait surface) → tests (integration tests exercise only the public API).The two pedagogy tests (
tests/pedagogy.rsandtests/pedagogy_advanced.rs) are designed to be read end-to-end — they walk through the worked example from the spec, printing a teaching narrative at each step. Runningcargo test -p torrust-mudlark --test pedagogy -- --nocaptureproduces a readable tutorial.Notes
ConfigandGvGraphhave noDefaultimpl because the parameters are genuinely domain-dependent. The API surface is intentionally narrow and documented indocs/api.md.unsafecode. Forbidden at the crate root.