Lazy trees v2 #13225

edolstra · 2025-05-17T21:33:54Z

Motivation

This adds a setting lazy-trees that causes flake inputs to be "mounted" as virtual filesystems on top of /nix/store as random "virtual" store paths. Only when the store path is actually used as a dependency of a store derivation do we materialize ("devirtualize") the input by copying it to its content-addressed location in the store.

String contexts determine when devirtualization happens. One wrinkle is that there are cases where we had store paths without proper contexts, in particular when the user does toString <path> (where <path> is a source tree in the Nix store) and passes the result to a derivation. This usage was always broken, since it can result in derivations that lack correct references. But to ensure that we don't change evaluation results, we introduce a new type of context that results in devirtualization but not in store references. We also now print a warning about this.

Supersedes/incorporates #12432 and #12915. It replaces #6530. The big difference with #6530 is that the latter treated each source tree as its own root filesystem, which caused a lot of backward compatibility issues that required ugly hacks. With the new approach, lazy source trees appear under /nix/store, just using a virtual store path that doesn't exist in the real filesystem.

TODO: run the test suite with lazy-trees = true.

Context

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

nixos-discourse · 2025-05-18T13:10:08Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/determinate-nix-3-5-introducing-lazy-trees/64350/7

This adds a setting 'lazy-trees' that causes flake inputs to be "mounted" as virtual filesystems on top of /nix/store as random "virtual" store paths. Only when the store path is actually used as a dependency of a store derivation do we materialize ("devirtualize") the input by copying it to its content-addressed location in the store. String contexts determine when devirtualization happens. One wrinkle is that there are cases where we had store paths without proper contexts, in particular when the user does `toString <path>` (where <path> is a source tree in the Nix store) and passes the result to a derivation. This usage was always broken, since it can result in derivations that lack correct references. But to ensure that we don't change evaluation results, we introduce a new type of context that results in devirtualization but not in store references. We also now print a warning about this.

Mic92 · 2025-05-20T07:12:06Z

Running the test suite uncovered a couple of failures: https://github.com/NixOS/nix/actions/runs/15130602489/job/42530714771?pr=13235

Mic92 · 2025-05-20T07:13:06Z

Particularly nixpkgsLibTests

roberth

Using randomness or even fingerprints for placeholders makes the language non-deterministic and impure.

Some things we can do, in approximate order of increasing cost/benefit:

copy to the store on demand (and concurrently)
have lazy path values
have rope-structured strings with lazily computed fixed-length substrings to represent lazy hashing
extend the language with weaker interfaces that perform better, e.g. as described in #10689 (comment)

With those solutions in mind, I don't think we have to ruin the language semantics with any kind of "entropy", including fingerprints.
Path equality and ordering (as observable to users) must remain identical, and we have plenty of options for making it lazier without sacrificing determinism or compatibility.

Mic92 · 2025-05-23T08:53:20Z

@roberth I understand your concerns but I think it would be hard to make improvments if changes are not allowed incrementally. I think having large pull requests lingering also makes impacts negatively other pull requests where we are too afraid to apply large scale refactorings. Would you be fine if the settings would be replaced with a compile-time flag?

Aleksanaa · 2025-05-23T08:57:23Z

How about guard behind an experimental flag and let more people use it first?

Mic92 · 2025-05-23T09:05:00Z

How about guard behind an experimental flag and let more people use it first?

It's already a flag not enabled by default, so that should be covered. Commonly we use experimental flags for new features that might change over time. This flag should not introduce any new features so, but potentially bugs, so I don't think it needs to be experimental in this case. The documentation of the flag should potentially mention the current randomness however.

roberth · 2025-05-23T09:27:50Z

@Mic92 Incremental improvements are allowed, but this is not an improvement.

If any contributor, including Nix team members, wants to make breaking changes to the language, they should discuss that with the team first, and it is unlikely to result in an actual change to the language, as Nix promises reproducible behavior on old expressions.

Large PRs are ok if they're refactors, especially if they're previously agreed upon. However, this PR is not a refactor. It changes the semantics of the Nix language significantly, and for worse.

compile-time flag

What would be the purpose, except enabling a bug? Any kind of flag, compile time or otherwise, has a real cost in terms of maintenance. Also we shouldn't be seducing users into bad workarounds. We've seen how that ends - or, I guess, doesn't end.

I appreciate that you try to find a middle ground @Mic92, but this is not a negotiation. We don't balance out insufficient work with personal authority or upvotes. Good work gets merged.

Mic92 · 2025-05-23T09:39:16Z

src/libexpr/paths.cc

+    fetchers::Input & input, const fetchers::Input & originalInput, ref<SourceAccessor> accessor, bool requireLockable)
+{
+    auto storePath = settings.lazyTrees
+                         ? StorePath::random(input.getName())


For context this is the line the concern is about.

Path equality and ordering (as observable to users) must remain identical, and we have plenty of options for making it lazier without sacrificing determinism or compatibility.

Concretely, an easy incremental solution would be to just hash the source without copying, and eat the cost of that. It may still avoid the more expensive copying operation.
By having the correct string in the first place, we're not exposing the wrong string in the first place.

More optimizations can be applied later, with the usual condition that they don't regress the language semantics.
(e.g. the list I commented in my review, or something else)

For debugging, behind a flag, it might be interesting to put an X instead of the last Nix32 hash character, indicating that some code path didn't realise its string context as it should have, but other than that, I see no use for string replacements because of the whole language semantics thing. We can't have an observable wrong string, so there won't be anything to replace beyond what we're doing for ca-derivations.

Would it be beneficial to use a "dummy" hash like all a's here, instead of something random? The names are all coming from the flake inputs, right? So we would end up with them being unique anyway, unless I am misunderstanding this section, which is well possible. We would simply have to keep the dummy hash stable to maintain compatibility, then.

Concretely, an easy incremental solution would be to just hash the source without copying, and eat the cost of that. It may still avoid the more expensive copying operation. By having the correct string in the first place, we're not exposing the wrong string in the first place.

Seems still quite slow to the 2s I get with the current implementation:

% time find $HOME/git/nixpkgs -type f | xargs cat >/dev/null find $HOME/git/nixpkgs -type f 0.18s user 1.36s system 8% cpu 18.286 total xargs cat > /dev/null 0.37s user 8.25s system 43% cpu 19.634 total

And nixpkgs is not even the biggest repository. For people doing data-science or game development, asset sizes can be way beyond what we have in nixpkgs (i.e. 40G for datasets/game assets).

Hey @nh2, thank you for the stat stats!

The hash cache you propose

hash(abspath) -> ( inode, mtime, hash(contents) )

seems like a good fit for making sure the stat cache is virtually always accurate.

I'd like to stress though that since normal, non-git-hashing store path hashes are not Merkle hashes, we don't get to use those hashes as part of the store path.
Nonetheless, they're useful!
Step 1. Determine more reliably whether or not to NAR-hash the entire thing. This only pays off if no files have changed, and slows the other cases a bit due to the extra bookkeeping.
Step 2. Produce an internal Merkle hash (e.g. using git-hashing), which can serve as an alternate cache key for the evaluation cache, so that we can avoid the NAR hash in cases where we don't really need it for proper evaluation semantics, e.g. nixpkgs#legacyPackages, but not (for now) NixOS, which needs paths for module deduplication. In that case a copy will still be triggered by toString m.file or some similar code. This requires that flakes are loaded from lazy path values, not store path strings as in this PR.

Also we may need this and/or something like this to avoid unnecessarily repeating Git clean filter calls, but we aren't doing yet (#13464).

adding stat() will not make lazy trees slower,

This was already true from a design / requirements perspective. That's assuming the current evaluation caching approach, which requires that we know what's in the entire Git workdir, similar to git status, which also needs to perform all the stat calls.
A failure to call stat more than exactly once per file should be considered an implementation detail.
It's good to know the cost of adding extra stat calls, in case we need to temporarily implement it with extra calls. Doing it with fewer than 1 stat per Git workdir file is unfortunately not an option. (unless we want to get some inotify daemon involved, but I don't think we want those extra moving parts?)

The proper solution is to see how we can either make libgit2's git status equivalent faster and/or more useful, or if the complexity of a persistent cache is an issue for libgit2 upstream, we could look to replace it in our libfetchers by using lower level libgit2 operations in ways we see fit.

I'd like to stress though that since normal, non-git-hashing store path hashes are not Merkle hashes, we don't get to use those hashes as part of the store path.

Could StorePath::random(input.getName()) not just be replaced with a merkle-based path? This would effectively resolve the issue of non-determinism while still avoiding the need to compute the nar-hash of the path on each evaluation

(dislaimer: I'm a noob and might not know what I'm talking about)

@WeetHet merkle hash doesn't change if an empty directory is added (or removed), while nix evaluation result can absolutely change. So, if we were to use it directly it would easy create cache key collision, and eval result would likely be invalid.

However, iterating on this idea: to prevent the empty dir problem, perhaps construct a sidecar structure specifically locating all empty dirs in a tree, and use it in final hash like this: hash(merkle_hash(path) + empty_dirs_hash(path))?

Yeah, I'm sure there are differences but they can be worked out eventually and working on adjusting merkle hash behaviour seems realistic to me (you can even use random fs paths and use a data structure mapping from the path to the required information so it can be updated dynamically)

Given that no one seemingly even uses path comparisons, couldn't we just de-virtualise both paths if they are compared?

nixos-discourse · 2025-06-04T22:03:26Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2025-05-21-nix-team-meeting-minutes-228/65204/1

nixos-discourse · 2025-06-25T08:08:16Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/changelog-lazy-trees-and-security-improvements/66033/8

nixos-discourse · 2025-07-14T18:34:11Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/determinate-nix-3-8-0-changelog-a-faster-nix-flake-check-improved-flake-locks-and-lazy-trees-rolled-out-to-20-of-users/66499/2

nixos-discourse · 2025-07-14T18:35:59Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/determinate-nix-3-8-0-changelog-a-faster-nix-flake-check-improved-flake-locks-and-lazy-trees-rolled-out-to-20-of-users/66499/5

nixos-discourse · 2025-09-10T17:09:36Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/dropping-upstream-nix-from-determinate-nix-installer/69181/13

nixos-discourse · 2025-09-10T17:43:06Z

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/dropping-upstream-nix-from-determinate-nix-installer/69181/16

nyabinary · 2025-10-09T00:50:47Z

Using randomness or even fingerprints for placeholders makes the language non-deterministic and impure.

Some things we can do, in approximate order of increasing cost/benefit:
* copy to the store on demand (and concurrently)

* have lazy path values

* have rope-structured strings with lazily computed fixed-length substrings to represent lazy hashing

* extend the language with weaker interfaces that perform better, e.g. as described in [#10689 (comment)](https://github.com/NixOS/nix/issues/10689#issuecomment-2835688644)
With those solutions in mind, I don't think we have to ruin the language semantics with any kind of "entropy", including fingerprints. Path equality and ordering (as observable to users) must remain identical, and we have plenty of options for making it lazier without sacrificing determinism or compatibility.

I assume this is why lazy tree won't get merged anytime in the near future since the approach needs to be revisited, correct?

xokdvium · 2025-10-09T01:05:30Z

I assume this is why lazy tree won't get merged anytime in the near future since the approach needs to be revisited, correct?

Correct. The most promising approach in my and @roberth's opinion on the matter is:

have rope-structured strings with lazily computed fixed-length substrings to represent lazy hashing

This needs some work on the internal representation of strings, but is certainly doable and intersects somewhat with the pascal strings rework that has been in the works for Lix and for lesser degree in CppNix. I hope to get around to tackling that at some point.

edolstra requested review from Ericson2314 and roberth as code owners May 17, 2025 21:33

github-actions bot added new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store fetching Networking with the outside (non-Nix) world, input locking labels May 17, 2025

edolstra mentioned this pull request May 17, 2025

Lazy trees #6530

Closed

9 tasks

Mic92 added this to Nix team May 18, 2025

github-project-automation bot moved this to To triage in Nix team May 18, 2025

This comment was marked as resolved.

Sign in to view

edolstra force-pushed the lazy-trees-v2 branch from c47b105 to c98026e Compare May 19, 2025 14:47

Mic92 mentioned this pull request May 20, 2025

Test Lazy trees with integration tests #13235

Open

roberth suggested changes May 21, 2025

View reviewed changes

Mic92 reviewed May 23, 2025

View reviewed changes

tomberek moved this from To triage to 🏁 Review in Nix team Jun 4, 2025

edolstra mentioned this pull request Jun 11, 2025

Run nixpkgsLibTests against lazy trees DeterminateSystems/nix-src#100

Merged

Aleksanaa modified the milestone: auto-allocate-uids stabilisation Jun 25, 2025

tomberek added the idea approved The given proposal has been discussed and approved by the Nix team. An implementation is welcome. label Jul 7, 2025

tomberek added this to the fetch-tree stabilisation milestone Jul 7, 2025

edolstra mentioned this pull request Sep 22, 2025

Fix fetchToStore caching #14050

Merged

Uh oh!

Uh oh!

Lazy trees v2 #13225

Are you sure you want to change the base?

Lazy trees v2 #13225

Uh oh!

Conversation

edolstra commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Context

Uh oh!

nixos-discourse commented May 18, 2025

Uh oh!

This comment was marked as resolved.

Mic92 commented May 20, 2025

Uh oh!

Mic92 commented May 20, 2025

Uh oh!

roberth left a comment

Choose a reason for hiding this comment

Uh oh!

Mic92 commented May 23, 2025

Uh oh!

Aleksanaa commented May 23, 2025

Uh oh!

Mic92 commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roberth commented May 23, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rszyma Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WeetHet Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nixos-discourse commented Jun 4, 2025

Uh oh!

nixos-discourse commented Jun 25, 2025

Uh oh!

nixos-discourse commented Jul 14, 2025

Uh oh!

nixos-discourse commented Jul 14, 2025

Uh oh!

nixos-discourse commented Sep 10, 2025

Uh oh!

nixos-discourse commented Sep 10, 2025

Uh oh!

nyabinary commented Oct 9, 2025

Uh oh!

xokdvium commented Oct 9, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

14 participants

edolstra commented May 17, 2025 •

edited

Loading

Mic92 commented May 23, 2025 •

edited

Loading

rszyma Jul 31, 2025 •

edited

Loading

WeetHet Aug 1, 2025 •

edited

Loading