fix: link-check FP reduction round 2 — short labels, ALL_CAPS, version-adjacent #77

Merged
ohjonathan merged 3 commits into main from
fix/link-check-fp-reduction-round2
Feb 13, 2026

@ohjonathan (Owner) commented Feb 12, 2026

Summary

This PR implements Option A from the second external review response plan: add more pattern exclusions to _looks_like_doc_id() to further reduce link-check false positives. This is the hotfix approach agreed upon in the plan; Option B (require multi-segment snake_case) was explicitly deferred as a more fundamental redesign.

Result: broken references reduced from ~92 → 45 (51% further reduction; ~89% total from the original 408 baseline).
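The percentages above can be checked with a few lines of arithmetic. This is only a worked check using the counts quoted in this description; the 92 figure is itself approximate.

```python
# Counts quoted in the PR description (92 is an approximate figure).
baseline = 408      # broken references before any FP reduction
before_round2 = 92  # broken references after PR #76
after_round2 = 45   # broken references after this PR

# Reduction achieved by this PR alone, relative to its starting point.
round_reduction = (before_round2 - after_round2) / before_round2

# Cumulative reduction relative to the original baseline.
total_reduction = (baseline - after_round2) / baseline

print(f"this round: {round_reduction:.0%}, total: {total_reduction:.0%}")
```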

Plan Reference

This PR executes the "Implementation Plan: Further FP Reduction (Option A)" section of the second external review response plan. The plan was developed after the original reviewer re-evaluated the codebase post-PR #76 and verified 7 of 9 items as fixed.

Context from the plan

The remaining ~92 broken references decomposed into 25 unique values across these categories:

| Category | Examples | Unique Count |
|---|---|---|
| Track/finding labels | A1, A3, B2, X-H1, NB-1, M-2 | 14 |
| Short alphanumeric (curation levels) | L0, L1, L2, L3 | 4 |
| Snake_case prose tokens | logs_dir, warn_legacy, snake_case | 3 |
| Version-adjacent | v3.2.1b, v2.x | 2 |
| ALL_CAPS config | AUTO_CONSOLIDATE | 1 |
| Other | 1-safe | 1 |

This PR addresses the first, second, fourth, and fifth categories. The snake_case prose tokens (e.g. logs_dir, warn_legacy) are intentionally not filtered — they genuinely look like doc IDs and are the hardest to distinguish without the multi-segment redesign (Option B).

Reviewer corrections from the plan (not in scope for this PR, but for context)

The plan also documented two factual corrections to the reviewer's claims:

  1. "Dual CHANGELOGs — no cross-reference" — Incorrect. CHANGELOG.md line 5 already contains a cross-reference to Ontos_CHANGELOG.md, added in commit 6305302.
  2. "Legacy scripts in CI — still runs 152 tests" — Incorrect. pyproject.toml line 58 sets testpaths = ["tests"]. Legacy tests only run when explicitly invoked. The FutureWarning is a Python import-time side effect, not test execution.

Changes

ontos/core/body_refs.py

Three new regex constants, one updated constant, and the corresponding filter checks in _looks_like_doc_id():

  1. _SHORT_LABEL_RE: ^[A-Z]{1,2}-?[A-Z]?\d{1,2}$

    • Rejects short alphanumeric labels: A1, B2, X-H1, NB-1, L0, M-2
    • This was the single largest remaining FP category (~18 unique values, ~60+ occurrences)
  2. _ALL_CAPS_RE: ^[A-Z][A-Z_]+[A-Z]$

    • Rejects SCREAMING_SNAKE_CASE config constants: AUTO_CONSOLIDATE
    • Matches tokens that are entirely uppercase letters + underscores, minimum 3 chars
  3. _VERSION_WILDCARD_RE: ^v?\d+(\.\d+)*\.x$

    • Rejects version wildcards: v2.x, v3.2.x
  4. Updated _VERSION_RE: ^v?\d+(\.\d+)+[a-z]?$ (was ^v?\d+(\.\d+)+$)

    • Now also catches trailing pre-release letters: v3.2.1b, 3.2.1a

All four checks are inserted after the file-extension filter and before the if "_" in token or "." in token catch-all, ensuring they intercept tokens that would otherwise be misclassified as doc IDs.
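As a rough illustration of how these four patterns behave, here is a minimal sketch. The regexes are copied from the list above; the function name looks_like_doc_id_sketch and its simplified catch-all are assumptions made for this example and are not the actual code in ontos/core/body_refs.py, which applies other filters as well.

```python
import re

# Patterns as quoted in this PR description.
_SHORT_LABEL_RE = re.compile(r"^[A-Z]{1,2}-?[A-Z]?\d{1,2}$")
_ALL_CAPS_RE = re.compile(r"^[A-Z][A-Z_]+[A-Z]$")
_VERSION_WILDCARD_RE = re.compile(r"^v?\d+(\.\d+)*\.x$")
_VERSION_RE = re.compile(r"^v?\d+(\.\d+)+[a-z]?$")

_REJECT_PATTERNS = (
    _SHORT_LABEL_RE,
    _ALL_CAPS_RE,
    _VERSION_WILDCARD_RE,
    _VERSION_RE,
)


def looks_like_doc_id_sketch(token: str) -> bool:
    """Hypothetical stand-in for _looks_like_doc_id() (sketch only)."""
    # Reject anything matching one of the exclusion patterns first.
    if any(pattern.match(token) for pattern in _REJECT_PATTERNS):
        return False
    # Assumed simplification of the '"_" in token or "." in token' catch-all.
    return "_" in token or "." in token


rejected = ["A1", "X-H1", "NB-1", "L0", "M-2", "AUTO_CONSOLIDATE",
            "v2.x", "v3.2.x", "v3.2.1b", "3.2.1a"]
still_accepted = ["logs_dir", "warn_legacy"]

for token in rejected + still_accepted:
    print(token, looks_like_doc_id_sketch(token))
```

In this sketch the snake_case prose tokens still pass, which matches the statement that they are intentionally left unfiltered.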

tests/core/test_body_refs.py

24 new test cases added across existing test classes:

TestLooksLikeDocIdFilters (unit tests on _looks_like_doc_id):

  • test_short_labels_rejected — 14 parametrized cases (A1, A3, B2, L0–L3, M-2, B-2, NB-1–NB-3, X-H1, X-H2)
  • test_all_caps_constants_rejected — 3 parametrized cases (AUTO_CONSOLIDATE, MY_CONFIG, SOME_SETTING)
  • test_version_adjacent_rejected — 4 parametrized cases (v3.2.1b, v2.x, v3.2.x, 3.2.1a)

TestFalsePositiveScanning (integration tests via full scan_body_references):

  • test_short_labels_not_in_scan — verifies A1, NB-1 not in scan output
  • test_all_caps_constants_not_in_scan — verifies AUTO_CONSOLIDATE not in scan output
  • test_version_wildcards_not_in_scan — verifies v2.x, v3.2.1b not in scan output

tests/commands/test_link_check.py

2 new integration tests documenting the precision/recall tradeoff:

  • test_link_check_broken_ref_matching_filtered_pattern_not_detected_in_generic_scan — Documents the known gap: a broken bare-token reference whose ID matches a filtered pattern (e.g., A2) is NOT detected by the generic scan. This is the accepted tradeoff for eliminating ~60+ FPs from short labels.
  • test_link_check_short_label_doc_id_detected_when_exists — Confirms the known-ID scan (Pass 1) correctly detects short-label doc IDs when the referenced document exists.

Safety and known tradeoffs

All new filters only affect Pass 2 (generic unknown-ID scan, _iter_generic_id_candidates). The known-ID scan (Pass 1, _iter_known_id_candidates) always finds existing doc IDs by exact match regardless of naming pattern, so references to existing documents are never missed.

However, broken references to non-existent documents whose IDs match filtered patterns (e.g., a typo A2 when A1 exists) will not be detected by the generic scan. This is an intentional precision/recall tradeoff — the same class of gap exists for all pre-existing filters (_VERSION_RE, _BARE_NUMBER_RE, _FILE_EXTENSION_RE, _KNOWN_FIELD_NAMES) and is inherent to heuristic-based generic scanning. The gap is narrow: it only affects documents with IDs matching short labels / ALL_CAPS / version wildcards that (a) don't exist and (b) are referenced only as bare tokens in body text (not in frontmatter depends_on, which is always validated against the full ID set).

The existing TestKnownIdsBypassFilters test class validates the Pass 1 safety invariant, and the new test_link_check_broken_ref_matching_filtered_pattern_not_detected_in_generic_scan test explicitly documents the accepted tradeoff.
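The two-pass behavior and the resulting gap can be sketched as follows. This is a toy model under stated assumptions: the tokenizer, the _id_like heuristic, and the scan_sketch function are hypothetical and only stand in for _iter_known_id_candidates and _iter_generic_id_candidates; only the short-label regex is taken from this PR.

```python
import re

_SHORT_LABEL_RE = re.compile(r"^[A-Z]{1,2}-?[A-Z]?\d{1,2}$")  # from this PR
_TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")  # assumed tokenizer, not the real one


def _id_like(token):
    # Assumed generic heuristic: underscores or digits make a token ID-like.
    return "_" in token or any(ch.isdigit() for ch in token)


def scan_sketch(body, known_ids, apply_filters=True):
    """Return (resolved, broken) reference sets for a body of text."""
    tokens = set(_TOKEN_RE.findall(body))
    # Pass 1: exact match against known IDs, no pattern filters involved.
    resolved = tokens & set(known_ids)
    # Pass 2: heuristic scan for unknown IDs, where the filters apply.
    broken = set()
    for token in tokens - resolved:
        if not _id_like(token):
            continue
        if apply_filters and _SHORT_LABEL_RE.match(token):
            continue  # short labels are dropped, even if truly broken
        broken.add(token)
    return resolved, broken


body = "Roadmap references A2. See A1 and missing_doc for details."
resolved, broken = scan_sketch(body, {"A1"})
unfiltered = scan_sketch(body, {"A1"}, apply_filters=False)[1]
print(resolved, broken, unfiltered)
```

With the filter on, A1 is still resolved by Pass 1 and missing_doc is still reported, but the typo A2 is silently dropped; with the filter off, A2 is reported along with every other short label in prose, which is the false-positive class this PR removes.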

Verification

$ python3 -m pytest tests/core/test_body_refs.py -v    # 88 passed
$ python3 -m pytest tests/commands/test_link_check.py -v  # 14 passed
$ python3 -m pytest tests/ --tb=short                  # 918 passed, 2 skipped
$ ontos link-check --json | python3 -c "..."           # broken_references: 45

Test plan

  • All 88 body_refs tests pass (including 24 new)
  • All 14 link-check integration tests pass (including 2 new)
  • Full suite: 918 passed, 2 skipped, 0 failures
  • Broken reference count verified: 92 → 45
  • Known tradeoff documented with explicit test case
  • Review board: verify filter patterns don't reject legitimate doc ID naming conventions
  • Review board: verify two-pass safety claim by checking TestKnownIdsBypassFilters

🤖 Generated with Claude Code

…n-adjacent

Reduce link-check false positives from ~92 → 45 by adding three new
pattern exclusions to _looks_like_doc_id() in the generic body scan:

1. Short label pattern (^[A-Z]{1,2}-?[A-Z]?\d{1,2}$)
   Rejects track/finding labels: A1, B2, X-H1, NB-1, L0, M-2, etc.
   ~18 unique values, the largest remaining FP category.

2. ALL_CAPS token pattern (^[A-Z][A-Z_]+[A-Z]$)
   Rejects SCREAMING_SNAKE_CASE config constants: AUTO_CONSOLIDATE.

3. Version-adjacent patterns:
   - Extended _VERSION_RE to catch trailing letters (v3.2.1b)
   - Added _VERSION_WILDCARD_RE for .x wildcards (v2.x, v3.2.x)

Safety: All filters only affect Pass 2 (generic unknown-ID scan).
Pass 1 (known-ID scan) always finds existing doc IDs regardless of
naming pattern, so no false negatives are introduced.

24 new tests added (14 short label, 3 ALL_CAPS, 4 version-adjacent,
3 integration scan tests). Full suite: 916 passed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ohjonathan
Owner Author

Adversarial review (single comprehensive pass)

Findings

  1. High: PR #77 expands a real false-negative blind spot for broken bare-token references.

Evidence:

  • Link-check runs a two-pass body scan: known IDs exact-match pass (ontos/core/link_diagnostics.py:327) + generic unknown pass (ontos/core/link_diagnostics.py:336).
  • PR #77 adds additional generic rejections in _looks_like_doc_id() (ontos/core/body_refs.py:676, ontos/core/body_refs.py:680, ontos/core/body_refs.py:683).
  • This means missing refs that fit those shapes can be silently dropped in generic mode.

Concrete repro I ran:

  • Temp repo with one doc: id: A1
  • Body content: Roadmap references A2.
  • ontos link-check --json result: broken_references: 0, exit_code: 0

Impact:

  • Broken bare references like A2, AUTO_CONSOLIDATE, v2.x can be missed if they don’t already exist in the graph.

Nuance:

  • The PR claim that “known IDs are safe” is true for existing IDs (pass 1).
  • The gap is unknown/missing IDs (the actual broken-reference class), which rely on pass 2 heuristics.
  2. Medium: Test coverage does not protect the above failure mode.

Evidence:

  • Existing tests validate generic filtering (tests/core/test_body_refs.py:278) and known-id bypass (tests/core/test_body_refs.py:264).
  • There is no command-level test showing link-check still reports broken bare refs for the newly filtered token classes.

Impact:

  • Regression can pass all current tests while weakening broken-reference detection recall.

Open questions

  1. Is the intended product behavior now “these token classes should never be treated as bare ID candidates,” even if true broken refs are missed?
  2. If yes, should this be an explicit mode/config tradeoff (precision vs recall) rather than a silent heuristic tightening?

Validation run

  1. pytest -q tests/core/test_body_refs.py -> 88 passed
  2. pytest -q tests/commands/test_link_check.py tests/commands/test_rename.py -> 37 passed
  3. pytest -q -> 916 passed, 2 skipped
  4. ontos link-check --json -> broken_references: 45 (matches PR claim)

Overall: this is a strong incremental FP reduction, but I would treat Finding #1 as blocking unless reduced recall for broken bare-token references is an explicit, accepted tradeoff.

ohjonathan and others added 2 commits February 12, 2026 19:23
Two new tests for PR #77 adversarial review response:
- Known gap: broken ref matching filtered pattern not detected in generic scan
- Happy path: short-label doc ID detected by known-ID scan when it exists

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ohjonathan ohjonathan merged commit 81481b1 into main Feb 13, 2026
@ohjonathan ohjonathan deleted the fix/link-check-fp-reduction-round2 branch February 13, 2026 00:50