Skip to content

Comments

Prefetching (non-slop)#40

Open
markovejnovic wants to merge 53 commits intomainfrom
marko/mes-696-lookup-pre-fetch-post-readdir-2
Open

Prefetching (non-slop)#40
markovejnovic wants to merge 53 commits intomainfrom
marko/mes-696-lookup-pre-fetch-post-readdir-2

Conversation

@markovejnovic
Copy link
Collaborator

No description provided.

@blacksmith-sh

This comment has been minimized.

@mesa-dot-dev
Copy link

mesa-dot-dev bot commented Feb 18, 2026

Mesa Description

This PR introduces a major architectural refactoring that extracts the filesystem and caching logic into a new, generic, reusable library (lib/). The previous bespoke filesystem implementation in src/fs/ has been replaced by this new, asynchronous, and high-performance layer, significantly improving modularity, performance, and testability.

The new git-fs library provides a composable framework for building virtual filesystems. Key components include:

  • AsyncFs: A core asynchronous filesystem cache designed for concurrent access, managing inode lifecycles and enabling directory prefetching.
  • DCache: A highly concurrent, in-memory directory entry cache to accelerate lookups.
  • FutureBackedCache: A new caching primitive to deduplicate expensive, concurrent asynchronous operations.
  • CompositeFs: A component that aggregates multiple filesystems under a single virtual root, creating a unified view of disparate data sources.

The Mesa-specific filesystem logic (src/fs/mescloud/) has been refactored to become a client of this new library. The old MesaFS and OrgFs implementations have been replaced by a new virtual filesystem structure built on CompositeRoot. The legacy icache and other custom FS components have been removed, simplifying the codebase.

Finally, this refactoring is supported by a comprehensive new suite of correctness and concurrency tests for all new filesystem and caching components.

Description generated by Mesa. Update settings

Copy link

@mesa-dot-dev mesa-dot-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performed full review of e2d3738...b4fa0dc

Analysis

  1. Incomplete architecture for new asynchronous filesystem caching - the AsyncFs component lacks critical interfaces for table population and write-through to the directory cache, making these data structures currently unusable.

  2. Background task orchestration framework is added but not connected to the rest of the system, leaving the foundational infrastructure without clear integration paths.

  3. Significant regression in telemetry configuration - removal of Trc::with_telemetry and hard-coding of OTLP exporter means application-level telemetry settings are now ignored.

  4. The PR attempts to introduce multiple significant architectural changes simultaneously (async filesystem caching, background tasks, telemetry rework) without complete implementation paths for each.

Tip

Help

Slash Commands:

  • /review - Request a full code review
  • /review latest - Review only changes since the last review
  • /describe - Generate PR description. This will update the PR body or issue comment depending on your configuration
  • /help - Get help with Mesa commands and configuration options

0 files reviewed | 9 comments | Edit Agent SettingsRead Docs

@markovejnovic markovejnovic force-pushed the marko/mes-696-lookup-pre-fetch-post-readdir-2 branch 3 times, most recently from 176d4d3 to c8b0bd4 Compare February 19, 2026 00:24
@blacksmith-sh

This comment has been minimized.

@markovejnovic markovejnovic force-pushed the marko/mes-696-lookup-pre-fetch-post-readdir-2 branch 2 times, most recently from 0a4da08 to 5e303e4 Compare February 20, 2026 20:13
@markovejnovic markovejnovic force-pushed the marko/mes-696-lookup-pre-fetch-post-readdir-2 branch from 5e303e4 to 745f25a Compare February 20, 2026 20:15
…BridgeInner, ChildInner

Remove the ouroboros dependency entirely. All self-referencing structs
now use Arc<FutureBackedCache> for shared ownership, which is simpler
and enables spawning background tasks that reference the inode table.
After readdir populates a directory via the Claimed CAS path, spawn
background tokio tasks to prefetch each child directory. This makes
subsequent navigation into subdirectories instant since the dcache
and inode table are already populated.

The prefetch uses the same CAS gate (try_claim_populate) so duplicate
work is impossible, and errors are silently ignored since prefetch is
best-effort.
Two concurrency fixes identified during review:

1. CompositeFs::forget could prematurely remove a slot when a concurrent
   backward_or_insert was mid-insert. The remove_if_sync predicate called
   bridge.is_empty() without the coordination mutex, so it could observe
   fwd as empty while backward_or_insert had inserted into bwd but not
   yet fwd. Added ConcurrentBridge::is_empty_locked() that acquires the
   mutex, and use it in the predicate. No deadlock risk — lock ordering
   is always slots bucket lock → bridge mutex.

2. spawn_prefetch_children spawned one tokio::spawn per child directory
   with no bound, creating a thundering herd on the API backend for
   directories with many subdirs. Added a per-AsyncFs semaphore capping
   concurrent prefetch tasks at 8.
- Force ZST assertion evaluation in DropWard::new() (was dead code)
- Implement opendir/readdir/releasedir snapshot pattern in FuserAdapter,
  eliminating offset instability from BTreeMap position indexing and the
  intermediate Vec double-buffer on every readdir call
- Propagate forget from CompositeFs to child inode tables via new
  AsyncFs::evict(), preventing inner inode leaks
- Add DCache::remove_child() and remove_parent() for cache invalidation
- Document path_map race (MES-697) and await_shared O(N) contention
…SS race

Address code review findings:
- insert_sync silently discards duplicates; upsert_sync correctly
  overwrites when a child is re-inserted under a different parent.
- Add comment documenting the intentional populate-flag clobber in evict.
- Avoid nested lock (scc bucket → bridge mutex) in CompositeFs::forget
  by checking bridge emptiness outside the scc predicate (Alice C1)
- Use compare_exchange(DONE→UNCLAIMED) in DCache::evict instead of a
  blind store to avoid clobbering an in-flight populate (Alice C2)
- Document AssertUnwindSafe panic-safety contract on get_or_init and
  get_or_try_init (Bob C1)
- Document why two clones in await_shared are structurally necessary and
  zero-cost for Copy types (Bob C2)
- Handle closed-semaphore case in prefetch spawn with let-else (Bob M6)
…n limitations

- Rewrite evict to read (not remove) the reverse index first, acquire
  the parent's children write lock, then re-check ownership before
  removing — closing the TOCTOU race with concurrent insert.
- Document unbounded memory growth in inode_table/lookup_cache (AsyncFs).
- Expand MES-697 TODO to note the cache-eviction interaction.
- Add two new DCache tests for evict-reinsert and concurrent reparent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant