-
Notifications
You must be signed in to change notification settings - Fork 52
Description
What / Why
Currently the cleanupCached() method in lib/fetcher.js seen here: https://github.com/npm/pacote/blob/latest/lib/fetcher.js#L323
Deletes content directly from this.cache which is the cache directory configured by npm and is shared across other modules, this is especially relevant for make-fetch-happen. When we remove content that could've been written by someone else, we're putting the other module in a position that it can have an up to date and correct index entry that directs to content that isn't there. Granted, make-fetch-happen should be more resilient to this, but I think the correct fix here is to be a little smarter about how pacote handles the cache.
Using the example of make-fetch-happen, data is already in cacache and will already be read from there. Right now pacote will read from the cache, create a new integrity stream to verify the data again (cacache already did this when it read the original content), and write back to the same cache while also returning the data to the user. This is an extra write and hash that we absolutely do not need to do, and it occurs for every single file that is retrieved through the RemoteFetcher and RegistryFetcher classes.
This module should be modified such that it only uses cacache for subclasses that are not already cached by their own means, which today is file, dir, and git. Arguably we shouldn't bother caching any of these things at the pacote level since it's reasonably likely git state will change without us knowning, and copying a file (or packing a directory and copying the result) to cacache isn't really of much benefit.
This is related to npm/arborist#297