There are a few ways to deallocate a block on an NVMe device, which the NVM Command Set Specification talks a bit about:
> A logical block may be marked deallocated as the result of:
> - a Dataset Management command (refer to section 3.2.3); or
> - a Write Zeroes command addressing the logical block (refer to section 3.2.3.2.1); or
> - a sanitize operation.
In the limit it would be nice to plumb this through to Crucible and to the OS for local volumes, since we're unaware of the filesystem-level activity happening in the guest's volume. But a starting point would be for the emulated NVMe controller to at least support whatever mechanism OSes commonly use to deallocate blocks; I assume Dataset Management with the AD attribute set is more common than Write Zeroes with DEAC, but I haven't looked around. That probably comes with setting the bits and limits across ONCS/WZSL/DM{RS,SL,RSL} as appropriate. I doubt Sanitize is too interesting right now.
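For reference, the Dataset Management path mentioned above hands the controller a list of 16-byte range entries (context attributes, a block count, and a starting LBA, all little-endian), with the deallocate request signaled by the AD bit (bit 2) of CDW11. A minimal sketch of decoding that, with hypothetical type names (not anything from the actual codebase):

```rust
/// One 16-byte Dataset Management range entry, per the NVM Command Set
/// spec: DWORDs 0 = context attributes, 1 = length in logical blocks,
/// 2-3 = starting LBA. Names here are illustrative, not Propolis's.
#[derive(Debug, PartialEq)]
struct DsmRange {
    context_attrs: u32,
    nlb: u32,  // length of the range, in logical blocks
    slba: u64, // starting LBA of the range
}

fn parse_dsm_range(buf: &[u8; 16]) -> DsmRange {
    DsmRange {
        context_attrs: u32::from_le_bytes(buf[0..4].try_into().unwrap()),
        nlb: u32::from_le_bytes(buf[4..8].try_into().unwrap()),
        slba: u64::from_le_bytes(buf[8..16].try_into().unwrap()),
    }
}

/// CDW11 bit 2 (AD) is what asks the controller to deallocate the
/// ranges, as opposed to treating them as read/write frequency hints.
fn dsm_wants_deallocate(cdw11: u32) -> bool {
    cdw11 & (1 << 2) != 0
}
```

Advertising support would then be a matter of setting ONCS bit 2 (Dataset Management supported) in Identify Controller, plus the DM{RS,SL,RSL} limits.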
The NVM Command Set Specification also talks about the semantics of deallocated blocks in section 3.2.3.2.1, "Deallocated or Unwritten Logical Blocks":
> The value read from a deallocated logical block shall be deterministic; specifically, the value returned by subsequent reads of that logical block shall be the same until a write operation occurs to that logical block.
Over in namespace features, DLFEAT describes support for Write Zeroes' Deallocate (DEAC) bit and the read behavior of deallocated logical blocks. It seems like we could reasonably deallocate blocks in general by setting DLFEAT=000b and ignoring deallocation for now, or DLFEAT=001b and writing zeroes to the deallocated blocks. Plumbing this through to Crucible seems a little more interesting, and plumbing it through to file backends (specifically for local volumes) has the interesting consequence of exposing the guest to the underlying hardware's value of DLFEAT. In the local-volume case we're already not thinking about migration, so maybe that's OK.
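The DLFEAT values above map to read behavior roughly as follows: bits 2:0 = 000b means the behavior is "not reported" (any deterministic value is fine), 001b means all bytes read as 00h, 010b means all bytes read as FFh. A sketch of the read path under that interpretation, assuming a hypothetical per-block deallocation flag tracked by the backend:

```rust
/// DLFEAT bits 2:0 values for deallocated-block read behavior,
/// per the Identify Namespace data structure.
const DLFEAT_NOT_REPORTED: u8 = 0b000;
const DLFEAT_READ_ZEROES: u8 = 0b001;

/// Sketch: what a read of one logical block returns. `deallocated`
/// stands in for whatever per-block state the backend would track.
fn read_block(dlfeat: u8, deallocated: bool, backing: &[u8]) -> Vec<u8> {
    if deallocated && (dlfeat & 0b111) == DLFEAT_READ_ZEROES {
        // 001b: deallocated blocks must read as all zeroes.
        vec![0u8; backing.len()]
    } else {
        // 000b: any value is allowed as long as it is deterministic;
        // returning the backing store's stable contents satisfies that.
        backing.to_vec()
    }
}
```

With DLFEAT=001b the controller takes on the obligation in the quoted 3.2.3.2.1 text itself; with 000b the obligation is only determinism, which a plain file backend already provides.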
In theory this could help with performance and wear-leveling, especially when local volumes are in use, but I haven't gotten anywhere near testing any of that. A no-op Dataset Management implementation might make sense even without numbers, just so adding real support for the behavior later isn't as much of a surprise to guests.
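The no-op is conformant because the spec treats the deallocate attribute as a hint the controller may ignore. A minimal sketch, with a hypothetical status type rather than anything from the actual codebase:

```rust
/// Hypothetical completion status; a real controller would map this
/// onto the NVMe status code fields in the completion queue entry.
#[derive(Debug, PartialEq)]
enum NvmeStatus {
    Success,
}

/// Sketch of a conformant no-op Dataset Management handler: accept the
/// command, ignore the ranges, and complete successfully. Deallocate
/// is advisory, so no data needs to change.
fn handle_dataset_mgmt(cdw10: u32, _cdw11: u32) -> NvmeStatus {
    // NR (CDW10 bits 7:0) is a 0-based count of 16-byte range entries;
    // a no-op handler need not even read the range list.
    let _num_ranges = (cdw10 & 0xff) + 1;
    NvmeStatus::Success
}
```

The guest-visible effect is just that discard/TRIM stops failing, which is exactly the "less of a surprise later" property described above.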