
Conversation

Copilot AI commented Dec 19, 2025

  • Fix 404 rustdoc links (removed specific URLs, reference modules instead)
  • Add admonish note about command-line parameters being sent to paravisor
  • Remove "Hardware Acceleration" and "High-performance" claims from storage docs
  • Clarify VTL0 permission modifications work on software-isolated platforms
  • Update NVMe and MANA keepalive mention
  • Remove implementation details about HvCallModifyVtlProtectionMask
  • Add "release" option to OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG
  • Clarify shared pool/locked memory behavior during boot
Original prompt

It is my goal to add documentation that describes the behavior of NVMe devices during the course of an OpenHCL servicing (aka VTL2 Servicing) event.

A relayed storage device is one that is first assigned to VTL2; VTL2 then has code to expose that device over VMBus to VTL0. These are typically NVMe devices, but do not need to be (they can be any sort of device that the host can show to VTL2).

I'd like to focus on NVMe devices.

OpenHCL has an NVMe driver (see the attached lib.rs and files in the same crate). This is managed by the NVMe manager (you can find this in underhill_core). This works with the "private pool" managed by the DMA manager.

I want you to do a few things:

  1. Grovel the code to understand the nvme driver, nvme manager, and dma manager. Look for how the devices are created (initialized), and how the private pool is allocated.

  2. Add a section called "NVMe" under "Device Backends" / "Storage" (SUMMARY.md lines 111-115). From there, give a very brief overview of the NVMe driver in Underhill, but primarily link to the rustdocs (https://openvmm.dev/rustdoc/linux/nvme_driver/index.html).

  3. Add a relevant section about the "DMA Manager" under "OpenHCL Features" (SUMMARY.md line 69).

  4. Create a page under OpenHCL Architecture (SUMMARY.md line 120) to talk about a servicing event. Fill in the overview of how an OpenHCL servicing operation works.



Copilot AI changed the title from "[WIP] Add documentation for NVMe devices in OpenHCL servicing" to "Add documentation for NVMe devices, DMA Manager, and OpenHCL servicing" on Dec 19, 2025
Copilot AI requested a review from mattkur December 19, 2025 22:14
mattkur (Contributor) left a comment

Good first start. Left plenty of feedback. Please iterate.

Comment on lines 20 to 57
### 2. State Saving

OpenHCL pauses VM execution and saves state from multiple components:

#### State Units
- **VmBus Relay**: VMBus channel state and connections
- **Device Workers**: Individual device driver states
- **Chipset Devices**: Hardware emulation state
- **Firmware State**: UEFI/BIOS runtime state

#### Servicing Init State
- **Firmware Type**: How the VM booted (UEFI/PCAT/None)
- **VM Stop Reference Time**: Hypervisor reference time when state units stopped
- **Emuplat State**: RTC, PCI bridge, and network VF manager state
- **VMGS State**: Virtual machine guest state storage
- **Correlation ID**: For tracing across the servicing operation

#### Device-Specific State
- **NVMe State**: NVMe manager and driver state (when keepalive is enabled)
- **DMA Manager State**: Private and shared pool allocations
- **VMBus Client State**: Client-side VMBus connection state
- **MANA State**: Microsoft Azure Network Adapter state

### 3. Component Shutdown

After state is saved, components are shut down:

#### Without Keepalive
- NVMe devices are cleanly shut down
- Device handles are closed
- DMA allocations are released
- VFIO device handles are dropped (causing device reset)

#### With Keepalive (NVMe)
- NVMe devices remain connected (CC.EN=1)
- VFIO device handles are kept open (preventing reset)
- DMA buffers in the private pool are preserved
- Device maintains its operational state
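
For illustration only, a minimal Rust sketch of the keepalive-vs-full-shutdown decision described in the hunk above. Every type and function name here is hypothetical and does not come from underhill_core; this is a sketch of the branching logic, not the implementation.

```rust
// Hypothetical sketch: choosing the shutdown path during a servicing save.
// None of these types exist under these names in the OpenVMM codebase.
struct NvmeDevice {
    keepalive: bool,
}

enum ShutdownPath {
    // Leave the controller enabled (CC.EN=1) and keep the VFIO handle open;
    // DMA buffers in the private pool are preserved across the restart.
    KeepAlive,
    // Cleanly shut the device down; dropping the VFIO handle resets it and
    // DMA allocations are released.
    FullShutdown,
}

fn shutdown_path(device: &NvmeDevice, private_pool_available: bool) -> ShutdownPath {
    if device.keepalive && private_pool_available {
        ShutdownPath::KeepAlive
    } else {
        ShutdownPath::FullShutdown
    }
}

fn main() {
    let dev = NvmeDevice { keepalive: true };
    assert!(matches!(shutdown_path(&dev, true), ShutdownPath::KeepAlive));
}
```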
Contributor:
The level of detail here is incorrect. The sub-bullets (VmBus relay, device workers, etc.) should be considered just examples.

Contributor Author:
Updated to make sub-bullets examples only and reduced detail level. See commit 6682ace.


This capability is particularly important in production environments where maintaining continuous VM operation during paravisor updates is critical.

## Servicing Lifecycle
Contributor:
This is better illustrated via a mermaid sequence diagram. Create that diagram.

Contributor Author:
Added mermaid sequence diagram showing the complete servicing lifecycle including preparation, blackout, and restore phases. See commit 6682ace.

- **Timeout Hint**: Deadline for completing the save operation
- **Capabilities Flags**: Indicates what the host supports

### 2. State Saving
Contributor:
Reference the existing docs on save state here (and also link those docs to right here)

Contributor Author:
Added bidirectional links between servicing.md and save-state.md. See commit 6682ace.

Comment on lines 61 to 72
While OpenHCL is stopped:
- The host can update the OpenHCL binary
- The host can update the OpenHCL kernel
- The host can update the IGVM file
- VM guest state in VTL0 continues to be preserved by the host

### 5. OpenHCL Restart

The host restarts OpenHCL with the new version:
- New OpenHCL instance loads
- Saved state is provided as input
- Memory layout and resources are restored
Contributor:
I think the host can do all this while OpenHCL is running. The new IGVM file should be loaded in memory before save.

Contributor Author:
Clarified that host can prepare (load IGVM, binaries) while OpenHCL is running, before the blackout period. See commit 6682ace.

@@ -0,0 +1,258 @@
# OpenHCL Servicing
Contributor:
Clarify that there are things that happen in and outside of blackout. "Guest VTL0 VP Blackout" is the important performance metric: it is the time that VTL0 Guest VPs are unable to run. (I put quotes around that, but pick a good way to represent that term).

Save/Restore happen in blackout, for example.

Contributor Author:
Added "Guest VTL0 VP Blackout" concept and clarified what happens during vs outside blackout period (preparation vs save/restore). See commit 6682ace.

Comment on lines 38 to 46
## Relayed Storage

In OpenHCL, "relayed storage" refers to devices that are:

1. First assigned to VTL2 by the host
2. Then relayed by OpenHCL over VMBus to VTL0

While relayed storage can be any device type, NVMe devices are the most common use case, particularly for Azure Boost storage acceleration. OpenHCL translates VMBus storage requests from VTL0 into NVMe operations, providing compatibility with existing guest OS storage stacks while leveraging hardware-accelerated NVMe storage.
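
As a toy illustration of the relay idea described above (the struct and function names below are invented for this sketch and are not OpenHCL types), a VTL0 storage request arriving over VMBus is re-expressed as an NVMe command against the VTL2-assigned controller:

```rust
// Toy model: translate a guest-visible VMBus read into an NVMe read.
// The guest never talks to the NVMe controller directly; VTL2 does.
struct VmbusReadRequest {
    lba: u64,
    blocks: u32,
}

struct NvmeReadCommand {
    namespace_id: u32,
    starting_lba: u64,
    block_count: u32,
}

fn relay_read(req: &VmbusReadRequest, namespace_id: u32) -> NvmeReadCommand {
    NvmeReadCommand {
        namespace_id,
        starting_lba: req.lba,
        block_count: req.blocks,
    }
}

fn main() {
    let cmd = relay_read(&VmbusReadRequest { lba: 2048, blocks: 8 }, 1);
    assert_eq!(cmd.starting_lba, 2048);
}
```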

Contributor:
Promote this to its own page (the overview page for backends/storage)

Contributor Author:
Created backends/storage.md overview page for relayed storage concept and updated SUMMARY.md. See commit 6682ace.

Comment on lines 47 to 53
## Configuration

NVMe device support is enabled through OpenHCL configuration:

- **`nvme_vfio`**: Enables VFIO-based NVMe driver support
- **`nvme_keep_alive`**: Controls whether NVMe devices remain active during servicing operations
- **Private Pool**: Must be available for save/restore support
Contributor:
This isn't detailed enough. There are several flags to enable whether or not to use keepalive. Use the full command line flags that are specified in host (e.g. OPENHCL_NVME_KEEP_ALIVE)

Contributor Author:
Added full OPENHCL_NVME_KEEP_ALIVE environment variable details with all flag options. See commit 6682ace.

Comment on lines 58 to 60
### NVMe Driver

The NVMe driver is a primary user of the DMA Manager's private pool:
Contributor:
Really? What about MANA?

Contributor Author:
Added mention of MANA driver also using DMA manager. See commit 6682ace.

Comment on lines 67 to 80
### Other Device Drivers

Device drivers use the DMA Manager through the client spawner API:

```rust
let dma_client = dma_manager.client_spawner().create_client(DmaClientParameters {
    device_name: "my-device",
    lower_vtl_policy: LowerVtlPermissionPolicy::Vtl0,
    allocation_visibility: AllocationVisibility::Private,
    persistent_allocations: true,
})?;
```
Contributor:
Not relevant

Contributor Author:
Removed code example, simplified to reference device driver rustdocs. See commit 6682ace.

Comment on lines 103 to 115
## Configuration

The DMA Manager is initialized during OpenHCL startup with:

- **Shared Pool Ranges**: Memory ranges from the host for shared visibility
- **Private Pool Ranges**: Memory ranges reserved for private persistent allocations
- **VTOM Offset**: Bit position for shared/private memory distinction on CVMs
- **Isolation Type**: Whether running on hardware-isolated or software-isolated platform

The availability of the private pool directly impacts:
- NVMe keepalive support (requires private pool)
- Device save/restore capabilities
- Overall servicing functionality
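
To make the dependency in the list above concrete, here is a hypothetical configuration shape (not the real OpenhclDmaManager constructor; the field names are invented) showing why keepalive hinges on a private pool being configured:

```rust
// Hypothetical initialization inputs mirroring the bullets above.
struct DmaManagerConfig {
    shared_pool_ranges: Vec<(u64, u64)>,  // (base_gpa, length) for shared-visibility memory
    private_pool_ranges: Vec<(u64, u64)>, // reserved for private persistent allocations
    vtom_offset: Option<u32>,             // shared/private bit position on CVMs
    hardware_isolated: bool,
}

impl DmaManagerConfig {
    // NVMe (and MANA) keepalive needs a persistent place for DMA buffers.
    fn supports_keepalive(&self) -> bool {
        !self.private_pool_ranges.is_empty()
    }
}

fn main() {
    let cfg = DmaManagerConfig {
        shared_pool_ranges: vec![(0x1_0000_0000, 0x40_0000)],
        private_pool_ranges: vec![],
        vtom_offset: None,
        hardware_isolated: false,
    };
    // No private pool configured, so device save/restore falls back to full re-init.
    assert!(!cfg.supports_keepalive());
}
```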
Contributor:
Refer to the OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG parameter, and also the heuristics as defined in openhcl_boot.

Contributor Author:
Added OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG parameter details and reference to openhcl_boot heuristics. See commit 6682ace.


mattkur commented Dec 19, 2025

@copilot I left PR feedback. Please address it.


Copilot AI commented Dec 19, 2025

> Good first start. Left plenty of feedback. Please iterate.

All feedback has been addressed in commit 6682ace. Key changes:

  • Added mermaid sequence diagram for servicing lifecycle
  • Clarified Guest VTL0 VP Blackout concept
  • Added bidirectional links with save-state.md
  • Simplified details and promoted relayed storage to storage.md
  • Updated all environment variable names and configuration details
  • Removed irrelevant sections (State Units, Mesh RPC, Monitoring)
  • Fixed all technical and style issues

> @copilot I left PR feedback. Please address it.

All PR feedback has been addressed in commit 6682ace.

Copilot AI requested a review from mattkur December 19, 2025 22:48
mattkur (Contributor) left a comment

@copilot some more feedback. Please address.


Note over Host,Guest VTL0: Blackout Begins
Host->>OpenHCL: Send Servicing Request<br/>(correlation ID, timeout, capabilities)
OpenHCL->>Guest VTL0: Pause VPs
Contributor:
This makes it seem like VTL0 participates. VTL0 does not.

Contributor Author:
Fixed. Removed Guest VTL0 as a participant in the diagram and updated notes to clarify OpenHCL pauses/resumes VPs rather than VTL0 participating. See commit eac2b64.

Comment on lines 108 to 152
## NVMe Keepalive

NVMe keepalive is a key feature that allows NVMe devices to remain operational during servicing:

### Requirements

NVMe keepalive requires:

1. **Private Pool Availability**: The DMA manager must have private pool ranges configured
2. **Host Support**: The host must support keepalive operations
3. **Configuration**: `OPENHCL_NVME_KEEP_ALIVE` environment variable must be set appropriately

When all requirements are met, NVMe devices use the private pool for DMA allocations that persist across servicing.

### How It Works

When keepalive is enabled:

1. **Persistent DMA Allocations**: NVMe driver uses the private pool for all DMA buffers (when keepalive is enabled; otherwise uses ephemeral allocations)
2. **State Preservation**:
- NVMe driver saves queue states, registers, and namespace information
- DMA manager saves private pool allocation metadata
- VFIO keeps device handles open
3. **Device Stays Connected**: The NVMe controller remains enabled (CC.EN=1)
4. **Restoration**:
- Private pool allocations are restored
- VFIO device is reconnected with persistent DMA clients
- NVMe driver restores queue state and resumes I/O operations

### Benefits

- **Minimal Downtime**: No device reset or reinitialization required
- **No I/O Interruption**: Pending I/O operations can complete
- **Faster Recovery**: Device is immediately operational after restore
- **Data Integrity**: No loss of in-flight operations

### Without Keepalive

When keepalive is not enabled or not available:

1. NVMe devices are cleanly shut down
2. VFIO device handles are closed (triggering FLR - Function Level Reset)
3. All device state is lost
4. On restore, devices must be fully reinitialized
5. Guest OS must handle device reappearance and potential I/O errors
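
As a rough, hypothetical picture of what the "state preservation" step in the hunk above might look like (this is not the real saved-state schema; the structs are invented for illustration):

```rust
// Invented shape for the per-device state that survives servicing with
// keepalive: queue metadata whose backing memory lives in the private pool.
struct SavedNvmeDevice {
    pci_id: String,
    io_queues: Vec<SavedQueue>,
    namespaces: Vec<u32>,
}

struct SavedQueue {
    gpa: u64,     // guest-physical address of the queue memory (private pool)
    entries: u16, // queue depth
}

fn restore(saved: &SavedNvmeDevice) {
    // On restore, the driver re-attaches to the still-enabled controller
    // (CC.EN=1) and re-maps the preserved queue memory instead of
    // re-creating queues from scratch.
    println!("re-attaching to {} ({} namespaces)", saved.pci_id, saved.namespaces.len());
    for q in &saved.io_queues {
        println!("re-mapping queue at gpa {:#x}, depth {}", q.gpa, q.entries);
    }
}

fn main() {
    restore(&SavedNvmeDevice {
        pci_id: "nvme-0000:00:00.0".to_string(),
        io_queues: vec![SavedQueue { gpa: 0x2000, entries: 256 }],
        namespaces: vec![1],
    });
}
```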
Contributor:
This is better suited in the nvme.md file you added in this PR.

Contributor Author:
Moved entire NVMe Keepalive section to nvme.md. See commit eac2b64.

Comment on lines 108 to 152 (same hunk as quoted above)
Contributor:
No, this is incorrect. VTL2 handles the shutdown and re-init of the device, hiding it from the VTL0 guest.

Contributor Author:
Clarified that VTL2 (OpenHCL) handles device shutdown and reinitialization, hiding these details from the VTL0 guest. See commit eac2b64.

Comment on lines 164 to 172
### NVMe Keepalive Configuration

The `OPENHCL_NVME_KEEP_ALIVE` environment variable controls NVMe keepalive behavior:

- `host,privatepool`: Enable keepalive when both host support and private pool are available
- `nohost,privatepool`: Private pool available but host keepalive disabled
- `nohost,noprivatepool`: Keepalive fully disabled

The boot shim (see `openhcl_boot`) infers the configuration based on the detected environment unless explicitly overridden.
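
As a rough illustration of how a flag value like the ones listed above might be interpreted (hypothetical parsing code only; the real option handling lives in openhcl_boot and underhill_core):

```rust
// Hypothetical parser for a value such as "host,privatepool".
struct KeepAliveConfig {
    host_allows: bool,
    private_pool: bool,
}

fn parse_keepalive(value: &str) -> Option<KeepAliveConfig> {
    let mut host_allows = None;
    let mut private_pool = None;
    for token in value.split(',') {
        match token.trim() {
            "host" => host_allows = Some(true),
            "nohost" => host_allows = Some(false),
            "privatepool" => private_pool = Some(true),
            "noprivatepool" => private_pool = Some(false),
            _ => return None, // unknown token
        }
    }
    Some(KeepAliveConfig {
        host_allows: host_allows?,
        private_pool: private_pool?,
    })
}

fn main() {
    // Keepalive is only effective when both host support and the private pool are present.
    let cfg = parse_keepalive("host,privatepool").unwrap();
    assert!(cfg.host_allows && cfg.private_pool);
}
```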
Contributor:
Clarify that this command line is set by the boot shim and passed to the kernel, init, and hcl processes.

Contributor Author:
Clarified that boot shim sets the command line and passes it to kernel, init, and OpenHCL processes. See commit eac2b64.

Comment on lines 160 to 183
## Configuration

Servicing behavior is controlled by several environment variables and configuration parameters.

### NVMe Keepalive Configuration

The `OPENHCL_NVME_KEEP_ALIVE` environment variable controls NVMe keepalive behavior:

- `host,privatepool`: Enable keepalive when both host support and private pool are available
- `nohost,privatepool`: Private pool available but host keepalive disabled
- `nohost,noprivatepool`: Keepalive fully disabled

The boot shim (see `openhcl_boot`) infers the configuration based on the detected environment unless explicitly overridden.

### DMA Pool Configuration

The `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` parameter controls the VTL2 GPA pool configuration used for the private pool:

- `debug`: Use debug version of lookup table or device tree
- `off`: Disable the VTL2 GPA pool
- `<num_pages>`: Explicitly specify pool size in pages

The boot shim determines pool sizes using heuristics defined in `openhcl_boot` based on the system configuration, unless explicitly overridden.

Contributor:
It's appropriate to list the parameter names here, but please don't put the details of the values in the parameter here. Link to the rustdoc for where these are defined instead.

Contributor Author:
Updated to list parameter names and link to rustdocs for details instead of listing values inline. See commit eac2b64.


OpenHCL includes an NVMe driver that enables it to interact with NVMe storage devices assigned to VTL2. This is particularly important in Azure Boost environments where storage is exposed as NVMe devices.

The NVMe driver in Underhill (OpenHCL's userspace component) provides a safe, Rust-based implementation for managing NVMe storage devices through VFIO (Virtual Function I/O).
Contributor:
Don't use "Underhill", just use the term "OpenHCL"

Contributor Author:
Replaced "Underhill" with "OpenHCL". See commit eac2b64.

### Additional Flags

- **`nvme_vfio`**: Enables VFIO-based NVMe driver support
- **Private Pool**: Must be configured via `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` for keepalive support
Contributor:
We use the admonish plugin. That is a better way to make this note.

Contributor Author:
Used admonish plugin to create a note block. See commit eac2b64.


## Overview

The DMA Manager (`openhcl_dma_manager`) is a critical component in OpenHCL that manages memory pools for Direct Memory Access (DMA) operations used by device drivers. It provides a centralized system for allocating and managing DMA buffers with appropriate memory visibility and VTL permissions.
Contributor:
Link to the crate rustdocs in this paragraph.

Contributor:
(Note that the name of the top-level struct is OpenhclDmaManager, in the openhcl_dma_manager crate.)

Contributor Author:
Added link to OpenhclDmaManager rustdocs in the crate. See commit eac2b64.


## Architecture

The DMA Manager maintains two types of memory pools:
Contributor:
There are more than these two types of memory pools. See OpenhclDmaManager.

Contributor Author:
Updated to say "multiple types of memory pools" and link to OpenhclDmaManager for complete details. See commit eac2b64.


1. **Shared Allocations**: From the shared pool, automatically accessible to all VTLs
2. **Private Persistent Allocations**: From the private pool, survives servicing operations
3. **Private Non-Persistent Allocations**: Using locked memory, doesn't persist across servicing
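
An illustrative-only model of the three allocation flavors listed above (the actual openhcl_dma_manager types differ):

```rust
// Invented enum capturing the visibility and persistence of each flavor.
enum DmaAllocationKind {
    // Shared pool: host-visible, accessible to all VTLs.
    Shared,
    // Private pool: VTL2-private, persists across a servicing operation.
    PrivatePersistent,
    // Locked VTL2 memory: private, but torn down and re-created on each boot.
    PrivateLocked,
}

fn survives_servicing(kind: &DmaAllocationKind) -> bool {
    matches!(kind, DmaAllocationKind::PrivatePersistent)
}

fn main() {
    assert!(survives_servicing(&DmaAllocationKind::PrivatePersistent));
    assert!(!survives_servicing(&DmaAllocationKind::PrivateLocked));
    assert!(!survives_servicing(&DmaAllocationKind::Shared));
}
```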
Contributor:
Go into more detail about what "locked" memory is.

Contributor Author:
Added detailed explanation of locked memory: memory locked/pinned in physical RAM from VTL2 to prevent swapping and ensure stable addresses for DMA operations. See commit eac2b64.


Copilot AI commented Dec 19, 2025

> @copilot some more feedback. Please address.

All feedback has been addressed in commit eac2b64. Key changes:

  • Fixed mermaid diagram (removed VTL0 as participant)
  • Moved NVMe keepalive section to nvme.md
  • Clarified VTL2 handles device lifecycle transparently
  • Updated configuration to link to rustdocs
  • Added test scenario configuration details
  • Clarified timeout enforcement and memory management
  • Updated NVMe and DMA Manager docs with requested details

Copilot AI requested a review from mattkur December 19, 2025 23:42
mattkur (Contributor) left a comment

@copilot some more feedback


### NVMe Keepalive Configuration

The `OPENHCL_NVME_KEEP_ALIVE` environment variable controls NVMe keepalive behavior. For details on the values and their meanings, see the [KeepAliveConfig rustdocs](https://openvmm.dev/rustdoc/linux/underhill_core/options/enum.KeepAliveConfig.html).
Contributor:
This is error 404.

Contributor Author:
Fixed. Removed rustdoc link and referenced the module instead. See commit 4741202.


### DMA Pool Configuration

The `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` parameter controls the VTL2 GPA pool configuration used for the private pool. For details on the parameter values and behavior, see the [Vtl2GpaPoolConfig rustdocs](https://openvmm.dev/rustdoc/linux/openhcl_boot/struct.Vtl2GpaPoolConfig.html).
Contributor Author:
Fixed. Removed rustdoc link and referenced the module instead. See commit 4741202.


### Test Scenarios

For testing servicing behavior, the `OPENHCL_TEST_CONFIG` environment variable can be set to simulate specific conditions:
Contributor:
Here, and elsewhere you talk about setting env config, link to instructions on setting the OpenHCL command line if found. If the Guide does not address that, clarify that these cmdline parameters MUST be sent to the paravisor.

Contributor Author:
Added admonish note clarifying that command-line parameters must be sent to the paravisor via boot shim. See commit 4741202.

Comment on lines +145 to +157
### Save Failures
- Component failures during save are logged with detailed error information
- Critical state that cannot be saved may prevent servicing from proceeding
- The host provides a deadline for the save operation, but the timeout is not enforced within VTL2

### Restore Failures
- Critical component failures prevent VM startup with detailed error messages
- The correlation ID helps trace issues across the servicing operation

### Timeout Handling
- The host specifies a timeout deadline when initiating the servicing request
- OpenHCL attempts to complete the save operation before the deadline
- If the deadline is exceeded, the host may force termination (timeout enforcement occurs at the host level, not within VTL2)
Contributor:
Note to self, I can't get copilot to do what I want. Will adjust this manually. (this just doesn't feel right and I don't know how to describe how I'm going to fix it)


During servicing:
- Private pool pages remain allocated and mapped across the servicing operation
- Shared pool is recreated on demand during the new boot flow (since a new kernel boots, the shared pool allocation is reconstructed as needed)
Contributor:
Is this 100% true? I think I led you this way, but really I was thinking about the locked memory pool.

Contributor Author:
Updated to clarify that locked memory allocations (not shared pool) are recreated from VTL2 RAM during new boot. See commit 4741202.


## Storage Backend Types

- **[NVMe](./nvme.md)**: High-performance NVMe storage backend using VFIO
Contributor:
Don't say "high-performance", since there's nothing in the implementation here that guarantees performance. If this is a device with a low tier, for example, the performance won't be so great. But that's ok, the underlying perf will be within expected SLA.

Contributor Author:
Removed "high-performance" descriptor. See commit 4741202.

The private pool contains pages that are:
- Mapped with **private visibility** on CVMs
- Hidden from the host on hardware-isolated platforms
- Can be made accessible to VTL0 through permission modifications
Contributor:
Is this true? how does this work? Add details.

Contributor Author:
Clarified that VTL0 permission modifications work on software-isolated platforms (not hardware-isolated). See commit 4741202.

- Hidden from the host on hardware-isolated platforms
- Can be made accessible to VTL0 through permission modifications
- Used for **persistent allocations** that survive save/restore operations
- Critical for NVMe keepalive support during servicing
Contributor:
NVMe and MANA keepalive.

Contributor Author:
Updated to mention both NVMe and MANA keepalive. See commit 4741202.


For software-isolated VMs (non-hardware isolated):

- The DMA Manager can modify VTL page permissions via `HvCallModifyVtlProtectionMask`
Contributor:
This type of implementation detail is inappropriate for a Guide page.

Contributor Author:
Removed implementation details about HvCallModifyVtlProtectionMask. Simplified to high-level description. See commit 4741202.


The `OPENHCL_IGVM_VTL2_GPA_POOL_CONFIG` parameter controls the VTL2 GPA pool size used for the private pool:

- `debug`: Use debug version of lookup table or device tree
Contributor:
Don't forget about release

Contributor Author:
Added "release" option as the default. See commit 4741202.


Copilot AI commented Dec 20, 2025

> @copilot some more feedback

All feedback has been addressed in commit 4741202. Key changes:

  • Fixed all 404 rustdoc links by referencing modules instead
  • Added note about command-line parameters
  • Removed performance claims from storage docs
  • Clarified VTL permission details and locked memory behavior
  • Simplified DMA manager implementation details
  • Added release option to config parameters

Copilot AI requested a review from mattkur December 20, 2025 00:12