Add documentation for NVMe devices, DMA Manager, and OpenHCL servicing #2596
Conversation
Co-authored-by: mattkur <[email protected]>
mattkur left a comment
Good first start. Left plenty of feedback. Please iterate.
| ### 2. State Saving | ||
|
|
||
| OpenHCL pauses VM execution and saves state from multiple components: | ||
|
|
||
| #### State Units | ||
| - **VmBus Relay**: VMBus channel state and connections | ||
| - **Device Workers**: Individual device driver states | ||
| - **Chipset Devices**: Hardware emulation state | ||
| - **Firmware State**: UEFI/BIOS runtime state | ||
|
|
||
| #### Servicing Init State | ||
| - **Firmware Type**: How the VM booted (UEFI/PCAT/None) | ||
| - **VM Stop Reference Time**: Hypervisor reference time when state units stopped | ||
| - **Emuplat State**: RTC, PCI bridge, and network VF manager state | ||
| - **VMGS State**: Virtual machine guest state storage | ||
| - **Correlation ID**: For tracing across the servicing operation | ||
|
|
||
| #### Device-Specific State | ||
| - **NVMe State**: NVMe manager and driver state (when keepalive is enabled) | ||
| - **DMA Manager State**: Private and shared pool allocations | ||
| - **VMBus Client State**: Client-side VMBus connection state | ||
| - **MANA State**: Microsoft Azure Network Adapter state | ||
|
|
||
| ### 3. Component Shutdown | ||
|
|
||
| After state is saved, components are shut down: | ||
|
|
||
| #### Without Keepalive | ||
| - NVMe devices are cleanly shut down | ||
| - Device handles are closed | ||
| - DMA allocations are released | ||
| - VFIO device handles are dropped (causing device reset) | ||
|
|
||
| #### With Keepalive (NVMe) | ||
| - NVMe devices remain connected (CC.EN=1) | ||
| - VFIO device handles are kept open (preventing reset) | ||
| - DMA buffers in the private pool are preserved | ||
| - Device maintains its operational state |
The level of detail here is incorrect. The sub-bullets (VmBus relay, device workers, etc.) should be considered just examples.
Updated to make sub-bullets examples only and reduced detail level. See commit 6682ace.
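To make the with/without-keepalive split in the quoted section concrete, here is a minimal Rust sketch; `NvmeDevice`, `shutdown_for_servicing`, and the field names are illustrative assumptions, not the actual OpenHCL types or APIs.

```rust
struct NvmeDevice {
    name: &'static str,
    controller_enabled: bool, // models the NVMe CC.EN bit
    vfio_handle_open: bool,
}

fn shutdown_for_servicing(device: &mut NvmeDevice, keepalive: bool) {
    if keepalive {
        // With keepalive: leave the controller enabled and the VFIO handle open
        // so the device keeps its operational state across servicing.
    } else {
        // Without keepalive: disable the controller and drop the VFIO handle,
        // which resets the device; DMA allocations are released separately.
        device.controller_enabled = false;
        device.vfio_handle_open = false;
    }
    println!(
        "{}: CC.EN={}, VFIO handle open={}",
        device.name, device.controller_enabled, device.vfio_handle_open
    );
}

fn main() {
    let mut disk = NvmeDevice { name: "nvme0", controller_enabled: true, vfio_handle_open: true };
    shutdown_for_servicing(&mut disk, /* keepalive= */ true);
    shutdown_for_servicing(&mut disk, /* keepalive= */ false);
}
```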
|
|
||
| This capability is particularly important in production environments where maintaining continuous VM operation during paravisor updates is critical. | ||
|
|
||
| ## Servicing Lifecycle |
This is better illustrated via a mermaid sequence diagram. Create that diagram.
Added mermaid sequence diagram showing the complete servicing lifecycle including preparation, blackout, and restore phases. See commit 6682ace.
| - **Timeout Hint**: Deadline for completing the save operation | ||
| - **Capabilities Flags**: Indicates what the host supports | ||
|
|
||
| ### 2. State Saving |
Reference the existing docs on save state here (and also link those docs to right here)
Added bidirectional links between servicing.md and save-state.md. See commit 6682ace.
| While OpenHCL is stopped: | ||
| - The host can update the OpenHCL binary | ||
| - The host can update the OpenHCL kernel | ||
| - The host can update the IGVM file | ||
| - VM guest state in VTL0 continues to be preserved by the host | ||
|
|
||
| ### 5. OpenHCL Restart | ||
|
|
||
| The host restarts OpenHCL with the new version: | ||
| - New OpenHCL instance loads | ||
| - Saved state is provided as input | ||
| - Memory layout and resources are restored |
I think the host can do all this while OpenHCL is running. The new IGVM file should be loaded in memory before save.
Clarified that host can prepare (load IGVM, binaries) while OpenHCL is running, before the blackout period. See commit 6682ace.
| @@ -0,0 +1,258 @@ | |||
| # OpenHCL Servicing | |||
Clarify that there are things that happen in and outside of blackout. "Guest VTL0 VP Blackout" is the important performance metric: it is the time that VTL0 Guest VPs are unable to run. (I put quotes around that, but pick a good way to represent that term).
Save/Restore happen in blackout, for example.
Added "Guest VTL0 VP Blackout" concept and clarified what happens during vs outside blackout period (preparation vs save/restore). See commit 6682ace.
| ## Relayed Storage | ||
|
|
||
| In OpenHCL, "relayed storage" refers to devices that are: | ||
|
|
||
| 1. First assigned to VTL2 by the host | ||
| 2. Then relayed by OpenHCL over VMBus to VTL0 | ||
|
|
||
| While relayed storage can be any device type, NVMe devices are the most common use case, particularly for Azure Boost storage acceleration. OpenHCL translates VMBus storage requests from VTL0 into NVMe operations, providing compatibility with existing guest OS storage stacks while leveraging hardware-accelerated NVMe storage. | ||
|
|
Promote this to its own page (the overview page for backends/storage)
Created backends/storage.md overview page for relayed storage concept and updated SUMMARY.md. See commit 6682ace.
| ## Configuration | ||
|
|
||
| NVMe device support is enabled through OpenHCL configuration: | ||
|
|
||
| - **`nvme_vfio`**: Enables VFIO-based NVMe driver support | ||
| - **`nvme_keep_alive`**: Controls whether NVMe devices remain active during servicing operations | ||
| - **Private Pool**: Must be available for save/restore support |
This isn't detailed enough. There are several flags that control whether or not keepalive is used. Use the full command-line flags that are specified by the host (e.g. `OPENHCL_NVME_KEEP_ALIVE`).
Added full OPENHCL_NVME_KEEP_ALIVE environment variable details with all flag options. See commit 6682ace.
| ### NVMe Driver | ||
|
|
||
| The NVMe driver is a primary user of the DMA Manager's private pool: |
Really? What about MANA?
Added mention of MANA driver also using DMA manager. See commit 6682ace.
| ### Other Device Drivers | ||
|
|
||
| Device drivers use the DMA Manager through the client spawner API: | ||
|
|
||
| ```rust | ||
| let dma_client = dma_manager.client_spawner().create_client( | ||
| DmaClientParameters { | ||
| device_name: "my-device", | ||
| lower_vtl_policy: LowerVtlPermissionPolicy::Vtl0, | ||
| allocation_visibility: AllocationVisibility::Private, | ||
| persistent_allocations: true, | ||
| } | ||
| )?; | ||
| ``` |
Not relevant
Removed code example, simplified to reference device driver rustdocs. See commit 6682ace.
| ## Configuration | ||
|
|
||
| The DMA Manager is initialized during OpenHCL startup with: | ||
|
|
||
| - **Shared Pool Ranges**: Memory ranges from the host for shared visibility | ||
| - **Private Pool Ranges**: Memory ranges reserved for private persistent allocations | ||
| - **VTOM Offset**: Bit position for shared/private memory distinction on CVMs | ||
| - **Isolation Type**: Whether running on hardware-isolated or software-isolated platform | ||
|
|
||
| The availability of the private pool directly impacts: | ||
| - NVMe keepalive support (requires private pool) | ||
| - Device save/restore capabilities | ||
| - Overall servicing functionality |
Refer to the OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG parameter, and also the heuristics as defined in openhcl_boot.
Added OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG parameter details and reference to openhcl_boot heuristics. See commit 6682ace.
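For illustration only, a simplified sketch of the initialization inputs listed in the quoted section and of how private-pool availability gates keepalive; the struct and field names are assumptions, not the real `openhcl_dma_manager` API.

```rust
// Illustrative only: a simplified view of the DMA Manager's initialization inputs
// and of how private-pool availability gates keepalive-style persistence.
#[derive(Debug)]
struct MemoryRange {
    start: u64,
    len: u64,
}

#[derive(Debug)]
struct DmaManagerConfig {
    shared_pool: Vec<MemoryRange>,
    private_pool: Vec<MemoryRange>,
    vtom_offset_bit: Option<u32>,
    hardware_isolated: bool,
}

impl DmaManagerConfig {
    // Keepalive-style save/restore needs persistent allocations, which in this
    // sketch can only come from the private pool.
    fn supports_persistent_allocations(&self) -> bool {
        !self.private_pool.is_empty()
    }
}

fn main() {
    let config = DmaManagerConfig {
        shared_pool: vec![MemoryRange { start: 0x1_0000_0000, len: 0x40_0000 }],
        private_pool: vec![],
        vtom_offset_bit: None,
        hardware_isolated: false,
    };
    println!("{config:?}");
    println!("keepalive possible: {}", config.supports_persistent_allocations());
}
```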
@copilot I left PR feedback. Please address it.
…ctions Co-authored-by: mattkur <[email protected]>
All PR feedback has been addressed in commit 6682ace.
mattkur left a comment
@copilot some more feedback. Please address.
|
|
||
| Note over Host,Guest VTL0: Blackout Begins | ||
| Host->>OpenHCL: Send Servicing Request<br/>(correlation ID, timeout, capabilities) | ||
| OpenHCL->>Guest VTL0: Pause VPs |
This makes it seem like VTL0 participates. VTL0 does not.
Fixed. Removed Guest VTL0 as a participant in the diagram and updated notes to clarify OpenHCL pauses/resumes VPs rather than VTL0 participating. See commit eac2b64.
| ## NVMe Keepalive | ||
|
|
||
| NVMe keepalive is a key feature that allows NVMe devices to remain operational during servicing: | ||
|
|
||
| ### Requirements | ||
|
|
||
| NVMe keepalive requires: | ||
|
|
||
| 1. **Private Pool Availability**: The DMA manager must have private pool ranges configured | ||
| 2. **Host Support**: The host must support keepalive operations | ||
| 3. **Configuration**: `OPENHCL_NVME_KEEP_ALIVE` environment variable must be set appropriately | ||
|
|
||
| When all requirements are met, NVMe devices use the private pool for DMA allocations that persist across servicing. | ||
|
|
||
| ### How It Works | ||
|
|
||
| When keepalive is enabled: | ||
|
|
||
| 1. **Persistent DMA Allocations**: The NVMe driver uses the private pool for all DMA buffers instead of ephemeral allocations | ||
| 2. **State Preservation**: | ||
| - NVMe driver saves queue states, registers, and namespace information | ||
| - DMA manager saves private pool allocation metadata | ||
| - VFIO keeps device handles open | ||
| 3. **Device Stays Connected**: The NVMe controller remains enabled (CC.EN=1) | ||
| 4. **Restoration**: | ||
| - Private pool allocations are restored | ||
| - VFIO device is reconnected with persistent DMA clients | ||
| - NVMe driver restores queue state and resumes I/O operations | ||
|
|
||
| ### Benefits | ||
|
|
||
| - **Minimal Downtime**: No device reset or reinitialization required | ||
| - **No I/O Interruption**: Pending I/O operations can complete | ||
| - **Faster Recovery**: Device is immediately operational after restore | ||
| - **Data Integrity**: No loss of in-flight operations | ||
|
|
||
| ### Without Keepalive | ||
|
|
||
| When keepalive is not enabled or not available: | ||
|
|
||
| 1. NVMe devices are cleanly shut down | ||
| 2. VFIO device handles are closed (triggering FLR - Function Level Reset) | ||
| 3. All device state is lost | ||
| 4. On restore, devices must be fully reinitialized | ||
| 5. Guest OS must handle device reappearance and potential I/O errors |
This is better suited in the nvme.md file you added in this PR.
Moved entire NVMe Keepalive section to nvme.md. See commit eac2b64.
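A hedged sketch of the save-and-restore flow described in the quoted keepalive section; the state shapes and function names are invented for illustration and do not mirror the real NVMe driver or DMA manager code.

```rust
#[derive(Debug)]
struct SavedNvmeState {
    io_queue_count: u16,
    namespaces: Vec<u32>,
}

#[derive(Debug)]
struct SavedDmaState {
    // (offset, len) of private-pool allocations to re-attach rather than re-allocate.
    persistent_allocations: Vec<(u64, u64)>,
}

fn save_for_servicing(keepalive: bool) -> Option<(SavedNvmeState, SavedDmaState)> {
    if !keepalive {
        // Without keepalive there is nothing to carry across: VTL2 shuts the
        // device down and fully reinitializes it after restore.
        return None;
    }
    Some((
        SavedNvmeState { io_queue_count: 4, namespaces: vec![1] },
        SavedDmaState { persistent_allocations: vec![(0x0, 0x1000), (0x1000, 0x2000)] },
    ))
}

fn restore_after_servicing(saved: Option<(SavedNvmeState, SavedDmaState)>) {
    match saved {
        Some((nvme, dma)) => println!(
            "re-attach {} private-pool allocations, resume {:?}",
            dma.persistent_allocations.len(),
            nvme
        ),
        None => println!("full reinitialization of the NVMe device"),
    }
}

fn main() {
    restore_after_servicing(save_for_servicing(true));
    restore_after_servicing(save_for_servicing(false));
}
```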
| ## NVMe Keepalive | ||
|
|
||
| NVMe keepalive is a key feature that allows NVMe devices to remain operational during servicing: | ||
|
|
||
| ### Requirements | ||
|
|
||
| NVMe keepalive requires: | ||
|
|
||
| 1. **Private Pool Availability**: The DMA manager must have private pool ranges configured | ||
| 2. **Host Support**: The host must support keepalive operations | ||
| 3. **Configuration**: `OPENHCL_NVME_KEEP_ALIVE` environment variable must be set appropriately | ||
|
|
||
| When all requirements are met, NVMe devices use the private pool for DMA allocations that persist across servicing. | ||
|
|
||
| ### How It Works | ||
|
|
||
| When keepalive is enabled: | ||
|
|
||
| 1. **Persistent DMA Allocations**: The NVMe driver uses the private pool for all DMA buffers instead of ephemeral allocations | ||
| 2. **State Preservation**: | ||
| - NVMe driver saves queue states, registers, and namespace information | ||
| - DMA manager saves private pool allocation metadata | ||
| - VFIO keeps device handles open | ||
| 3. **Device Stays Connected**: The NVMe controller remains enabled (CC.EN=1) | ||
| 4. **Restoration**: | ||
| - Private pool allocations are restored | ||
| - VFIO device is reconnected with persistent DMA clients | ||
| - NVMe driver restores queue state and resumes I/O operations | ||
|
|
||
| ### Benefits | ||
|
|
||
| - **Minimal Downtime**: No device reset or reinitialization required | ||
| - **No I/O Interruption**: Pending I/O operations can complete | ||
| - **Faster Recovery**: Device is immediately operational after restore | ||
| - **Data Integrity**: No loss of in-flight operations | ||
|
|
||
| ### Without Keepalive | ||
|
|
||
| When keepalive is not enabled or not available: | ||
|
|
||
| 1. NVMe devices are cleanly shut down | ||
| 2. VFIO device handles are closed (triggering FLR - Function Level Reset) | ||
| 3. All device state is lost | ||
| 4. On restore, devices must be fully reinitialized | ||
| 5. Guest OS must handle device reappearance and potential I/O errors |
No, this is incorrect. VTL2 handles the shutdown and re-init of the device, hiding it from the VTL0 guest.
Clarified that VTL2 (OpenHCL) handles device shutdown and reinitialization, hiding these details from the VTL0 guest. See commit eac2b64.
| ### NVMe Keepalive Configuration | ||
|
|
||
| The `OPENHCL_NVME_KEEP_ALIVE` environment variable controls NVMe keepalive behavior: | ||
|
|
||
| - `host,privatepool`: Enable keepalive when both host support and private pool are available | ||
| - `nohost,privatepool`: Private pool available but host keepalive disabled | ||
| - `nohost,noprivatepool`: Keepalive fully disabled | ||
|
|
||
| The boot shim (see `openhcl_boot`) infers the configuration based on the detected environment unless explicitly overridden. |
Clarify that this command line is set by the boot shim and passed to the kernel, init, and hcl processes.
Clarified that boot shim sets the command line and passes it to kernel, init, and OpenHCL processes. See commit eac2b64.
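To show the shape of the flag values quoted above (`host,privatepool`, `nohost,noprivatepool`, and so on), here is a small, hypothetical parser; the real option names and handling live in OpenHCL's option parsing and may differ.

```rust
// Hypothetical parser for an OPENHCL_NVME_KEEP_ALIVE-style flag string.
#[derive(Debug)]
struct KeepAliveFlags {
    host_allowed: bool,
    private_pool_available: bool,
}

fn parse_keep_alive(value: &str) -> KeepAliveFlags {
    let mut flags = KeepAliveFlags { host_allowed: false, private_pool_available: false };
    for token in value.split(',') {
        match token.trim() {
            "host" => flags.host_allowed = true,
            "nohost" => flags.host_allowed = false,
            "privatepool" => flags.private_pool_available = true,
            "noprivatepool" => flags.private_pool_available = false,
            _ => {} // ignore unknown tokens in this sketch
        }
    }
    flags
}

fn main() {
    let flags = parse_keep_alive("host,privatepool");
    // Keepalive is effective only when both host support and the private pool are present.
    let keepalive_enabled = flags.host_allowed && flags.private_pool_available;
    println!("{flags:?} -> keepalive enabled: {keepalive_enabled}");
}
```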
| ## Configuration | ||
|
|
||
| Servicing behavior is controlled by several environment variables and configuration parameters. | ||
|
|
||
| ### NVMe Keepalive Configuration | ||
|
|
||
| The `OPENHCL_NVME_KEEP_ALIVE` environment variable controls NVMe keepalive behavior: | ||
|
|
||
| - `host,privatepool`: Enable keepalive when both host support and private pool are available | ||
| - `nohost,privatepool`: Private pool available but host keepalive disabled | ||
| - `nohost,noprivatepool`: Keepalive fully disabled | ||
|
|
||
| The boot shim (see `openhcl_boot`) infers the configuration based on the detected environment unless explicitly overridden. | ||
|
|
||
| ### DMA Pool Configuration | ||
|
|
||
| The `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` parameter controls the VTL2 GPA pool configuration used for the private pool: | ||
|
|
||
| - `debug`: Use debug version of lookup table or device tree | ||
| - `off`: Disable the VTL2 GPA pool | ||
| - `<num_pages>`: Explicitly specify pool size in pages | ||
|
|
||
| The boot shim determines pool sizes using heuristics defined in `openhcl_boot` based on the system configuration, unless explicitly overridden. | ||
|
|
It's appropriate to list the parameter names here, but please don't put the details of the parameter values here. Link to the rustdoc where these are defined instead.
Updated to list parameter names and link to rustdocs for details instead of listing values inline. See commit eac2b64.
|
|
||
| OpenHCL includes an NVMe driver that enables it to interact with NVMe storage devices assigned to VTL2. This is particularly important in Azure Boost environments where storage is exposed as NVMe devices. | ||
|
|
||
| The NVMe driver in Underhill (OpenHCL's userspace component) provides a safe, Rust-based implementation for managing NVMe storage devices through VFIO (Virtual Function I/O). |
Don't use "Underhill", just use the term "OpenHCL"
Replaced "Underhill" with "OpenHCL". See commit eac2b64.
| ### Additional Flags | ||
|
|
||
| - **`nvme_vfio`**: Enables VFIO-based NVMe driver support | ||
| - **Private Pool**: Must be configured via `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` for keepalive support |
We use the admonish plugin. That is a better way to make this note.
Used admonish plugin to create a note block. See commit eac2b64.
|
|
||
| ## Overview | ||
|
|
||
| The DMA Manager (`openhcl_dma_manager`) is a critical component in OpenHCL that manages memory pools for Direct Memory Access (DMA) operations used by device drivers. It provides a centralized system for allocating and managing DMA buffers with appropriate memory visibility and VTL permissions. |
Link to the crate rustdocs in this paragraph.
(Note that the top-level struct is OpenhclDmaManager, in the openhcl_dma_manager crate.)
Added link to OpenhclDmaManager rustdocs in the crate. See commit eac2b64.
|
|
||
| ## Architecture | ||
|
|
||
| The DMA Manager maintains two types of memory pools: |
There are more than these two types of memory pools. See OpenhclDmaManager.
Updated to say "multiple types of memory pools" and link to OpenhclDmaManager for complete details. See commit eac2b64.
|
|
||
| 1. **Shared Allocations**: From the shared pool, automatically accessible to all VTLs | ||
| 2. **Private Persistent Allocations**: From the private pool, survives servicing operations | ||
| 3. **Private Non-Persistent Allocations**: Using locked memory, doesn't persist across servicing |
Go into more detail about what "locked" memory is.
Added detailed explanation of locked memory: memory locked/pinned in physical RAM from VTL2 to prevent swapping and ensure stable addresses for DMA operations. See commit eac2b64.
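The three allocation behaviors quoted above can be summarized in a small illustrative enum; these are not the real `openhcl_dma_manager` types, just a sketch of which allocations persist across servicing.

```rust
#[derive(Debug, Clone, Copy)]
enum DmaAllocationKind {
    /// Shared pool: visible to the host and to all VTLs.
    Shared,
    /// Private pool: private to the guest; persists across servicing.
    PrivatePersistent,
    /// Locked VTL2 memory: pinned so its physical address stays stable for DMA,
    /// but recreated after servicing rather than preserved.
    PrivateLocked,
}

impl DmaAllocationKind {
    fn persists_across_servicing(self) -> bool {
        matches!(self, DmaAllocationKind::PrivatePersistent)
    }
}

fn main() {
    for kind in [
        DmaAllocationKind::Shared,
        DmaAllocationKind::PrivatePersistent,
        DmaAllocationKind::PrivateLocked,
    ] {
        println!("{kind:?}: persists = {}", kind.persists_across_servicing());
    }
}
```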
…, clarify details Co-authored-by: mattkur <[email protected]>
All feedback has been addressed in commit eac2b64.
mattkur left a comment
@copilot some more feedback
|
|
||
| ### NVMe Keepalive Configuration | ||
|
|
||
| The `OPENHCL_NVME_KEEP_ALIVE` environment variable controls NVMe keepalive behavior. For details on the values and their meanings, see the [KeepAliveConfig rustdocs](https://openvmm.dev/rustdoc/linux/underhill_core/options/enum.KeepAliveConfig.html). |
This link is a 404.
Fixed. Removed rustdoc link and referenced the module instead. See commit 4741202.
|
|
||
| ### DMA Pool Configuration | ||
|
|
||
| The `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` parameter controls the VTL2 GPA pool configuration used for the private pool. For details on the parameter values and behavior, see the [Vtl2GpaPoolConfig rustdocs](https://openvmm.dev/rustdoc/linux/openhcl_boot/struct.Vtl2GpaPoolConfig.html). |
Fixed. Removed rustdoc link and referenced the module instead. See commit 4741202.
|
|
||
| ### Test Scenarios | ||
|
|
||
| For testing servicing behavior, the `OPENHCL_TEST_CONFIG` environment variable can be set to simulate specific conditions: |
Here, and elsewhere where you discuss setting environment configuration, link to instructions on setting the OpenHCL command line if such instructions exist. If the Guide does not cover that, clarify that these command-line parameters MUST be passed to the paravisor.
Added admonish note clarifying that command-line parameters must be sent to the paravisor via boot shim. See commit 4741202.
| ### Save Failures | ||
| - Component failures during save are logged with detailed error information | ||
| - Critical state that cannot be saved may prevent servicing from proceeding | ||
| - The host provides a deadline for the save operation, but the timeout is not enforced within VTL2 | ||
|
|
||
| ### Restore Failures | ||
| - Critical component failures prevent VM startup with detailed error messages | ||
| - The correlation ID helps trace issues across the servicing operation | ||
|
|
||
| ### Timeout Handling | ||
| - The host specifies a timeout deadline when initiating the servicing request | ||
| - OpenHCL attempts to complete the save operation before the deadline | ||
| - If the deadline is exceeded, the host may force termination (timeout enforcement occurs at the host level, not within VTL2) |
Note to self, I can't get copilot to do what I want. Will adjust this manually. (this just doesn't feel right and I don't know how to describe how I'm going to fix it)
|
|
||
| During servicing: | ||
| - Private pool pages remain allocated and mapped across the servicing operation | ||
| - Shared pool is recreated on demand during the new boot flow (since a new kernel boots, the shared pool allocation is reconstructed as needed) |
Is this 100% true? I think I led you this way, but really I was thinking about the locked memory pool.
Updated to clarify that locked memory allocations (not shared pool) are recreated from VTL2 RAM during new boot. See commit 4741202.
|
|
||
| ## Storage Backend Types | ||
|
|
||
| - **[NVMe](./nvme.md)**: High-performance NVMe storage backend using VFIO |
Don't say "high-performance", since there's nothing in the implementation here that guarantees performance. If this is a low-tier device, for example, the performance won't be so great. But that's OK; the underlying perf will be within the expected SLA.
Removed "high-performance" descriptor. See commit 4741202.
| The private pool contains pages that are: | ||
| - Mapped with **private visibility** on CVMs | ||
| - Hidden from the host on hardware-isolated platforms | ||
| - Can be made accessible to VTL0 through permission modifications |
Is this true? How does this work? Add details.
Clarified that VTL0 permission modifications work on software-isolated platforms (not hardware-isolated). See commit 4741202.
| - Hidden from the host on hardware-isolated platforms | ||
| - Can be made accessible to VTL0 through permission modifications | ||
| - Used for **persistent allocations** that survive save/restore operations | ||
| - Critical for NVMe keepalive support during servicing |
NVMe and MANA keepalive.
Updated to mention both NVMe and MANA keepalive. See commit 4741202.
|
|
||
| For software-isolated VMs (non-hardware isolated): | ||
|
|
||
| - The DMA Manager can modify VTL page permissions via `HvCallModifyVtlProtectionMask` |
This type of implementation detail is inappropriate for a Guide page.
Removed implementation details about HvCallModifyVtlProtectionMask. Simplified to high-level description. See commit 4741202.
|
|
||
| The `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` parameter controls the VTL2 GPA pool size used for the private pool: | ||
|
|
||
| - `debug`: Use debug version of lookup table or device tree |
Don't forget about release
Added "release" option as the default. See commit 4741202.
Co-authored-by: mattkur <[email protected]>
All feedback has been addressed in commit 4741202.