Skip to content

Conversation

@ricardon
Copy link
Contributor

Hi,

This pull request implements microsoft/openvmm#406. The issue describes the problem statement and proposed solution.

This code enables the changes made here: microsoft/openvmm#384.

Thank you for your consideration.

@ricardon
Copy link
Contributor Author

ricardon commented May 22, 2025 via email

@ricardon
Copy link
Contributor Author

@microsoft-github-policy-service agree [company="Intel Corporation"]

@ricardon
Copy link
Contributor Author

@microsoft-github-policy-service agree company="Intel Corporation"

/* For x86-64 TDX only */
#define MSHV_VTL_TDCALL _IOWR(MSHV_IOCTL, 0x32, struct mshv_tdcall)
#define MSHV_VTL_READ_VMX_CR4_FIXED1 _IOR(MSHV_IOCTL, 0x33, __u64)
#define MSHV_VTL_MAP_REDIRECTED_DEVICE_INTERRUPT _IOWR(MSHV_IOCTL, 0x38, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was taken recently by the KICK_CPUS ioctl. If you've rebased, please change to a free code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I didn't realize. I will update it.

* vector is mapped.
*/
intr_data.vector = ret;
ret = copy_to_user(user_arg, &intr_data, sizeof(intr_data));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might fail returning the number of bytes that weren't copied. Is that good to return from this ioctl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This does not look correct. I'll fix the error handling.

return (long)-EFAULT;

if (!intr_data.create_mapping)
/* User space provides the hardware vector to unmap. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move the comment above the if statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will do.


static void do_assert_single_proxy_intr(const u32 vector, struct mshv_vtl_run *run)
{
const u32 bank = vector / 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compiler should optimize this, but maybe >>5 would be good here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. This is a good improvement. I will update it.

static void do_assert_single_proxy_intr(const u32 vector, struct mshv_vtl_run *run)
{
const u32 bank = vector / 32;
const u32 masked_irr = BIT(vector % 32) & ~READ_ONCE(run->proxy_irr_blocked[bank]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

& 0x1f instead of the modulo? Thinkling if the compiler doesn't optimize, that's the hot path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Good suggestion. I will add it.

@ricardon ricardon force-pushed the rneri/redirec-posted-intr branch 2 times, most recently from 76e36f5 to 9fe424d Compare May 31, 2025 00:46
@ricardon
Copy link
Contributor Author

I updated the patches to incorporate feedback from @romank-msft and use a seqcount to serialize access to the redirected proxy interrupts data structures.

@romank-msft
Copy link
Contributor

LGTM. Someone more familiar with the intent might want to take a look.

The computation of the proxy interrupt number needs to be performed each
time is asserted. Optimize the computation using bit shifts and masks.

While here, fix a minor issue on a comment.

Suggested-by: Roman Kisel <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * None

Changes since v1:
 * Introduced this patch.
mshv_tdx_set_cpumask_from_apicid() does two things: it traverses the APICID
hashtable and sets a CPU mask. Traversing the hastable can be carved out
into a helper function. This makes the code more modular and other
functions can benefit from such helper. Subsequent patches will make use
of it.

Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
The statically assigned interrupt vector of the VMBus synthetic interrupt
is used to also assert proxy interrupts. These can be asserted in group or
individually. Proxy interrupts can also be asserted individually if a
redirection is in place to use a dedicated Linux IRQ.

Add a helper function to assert individual proxy interrupts.

No functional change intended.

Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * None

Changes since v1:
 * Rebased on the previous changeset that optimizes the calculation of
   the proxy interrupt bit.
The proxy interrupts that the VTL2 kernel relays to the VTL0 guest are
handled via the single interrupt vector of the VMBus synthetic interrupt
controller.

On TDX guests in particular, Hyper-V does not use IOMMU posted interrupts
for these interrupts. This results in a TDEXIT for every interrupt and
consequent performance degradation.

To recoup performance, newer versions of Hyper-V feature support the
posting of VTL0 interrupts directly to the VTL2 guest. In such setup, the
VTL2 guest still relays interrupts to the VTL0 guest, but uses a dedicated
VTL2 interrupt vector for each VTL0 interrupt. The dedicated vector must be
reserved only in one CPU that the user space specifies. Hyper-V shall post
the interrupt only to the designated VTL2 vCPU. The VTL2 kernel must not
migrate the dedicated vector to another VTL2 vCPU.

User space in the VTL2 guest is responsible of requesting such redirection.
Implement the new MAP_REDIRECTED_DEVICE_INTERRUPT IOCTL. User space
provides the vector number of the VTL0 guest that wants the VTL2 kernel to
redirect as well as the APIC ID of the VTL2 vCPU on which the interrupt
redirection will take place. On success, return in the payload of the IOCTL
the VTL2 CPU vector number reserved for this purpose. This is the vector
number that Hyper-V must use when setting up the posted interrupt.

Reuse the same IOCTL to also request the removal of the redirection when
needed. When removing a redirection user space must provide the number of
the VTL2 interrupt vector number that the kernel provided when the
redirection was established.

Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * Modified the payload of the IOCTL to also include the APIC ID of the
   VTL2 CPU on which the interrupt redirection will take place.
 * Removed weak stubs that are no longer needed.

Changes since v1:
 * Assigned 0x39 to MAP_REDIRECTED_DEVICE_INTERRUPT as 0x38 was already
   taken. (Roman)
 * Fixed error handling for copy_to_user(). (Roman)
 * Moved comment about unmapping above the conditional statement. (Roman)
Add a thin irq_domain for redirected interrupts. The kernel IRQ
infrastructure will use this domain to manage the resources and
configuration of Linux IRQs to meet the requirements of redirected
proxy interrupts.

The redirected_intr_domain irq_domain defines the .alloc() operation to
request the allocation of IRQs in the IRQ hierarchy all the way up to
x86_vector_domain. Use dummy_irq_chip as the irq_chip for the domain since
no other hardware configuration is needed. Importantly, this irq_chip does
not define the .irq_set_affinity() operation. Redirected proxy interrupts
do not allow changes in affinity.

Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
Implement the setup portion of the MAP_REDIRECTED_DEVICE_INTERRUPT
IOCTL. To redirect a proxy interrupt, allocate and request a Linux IRQ.

Set the CPU affinity of the IRQ to that user space specified. Also, specify
to request_irq() that the interrupt should be excluded from interrupt
balancing as such action would change the CPU affinity of the interrupt.
The contract with user space restricts the kernel from changing it.

Upon getting an interrupt, the handler will assert the proxy interrupt
vector that user space specified.

Remember which hardware vector (i.e., the CPU vector number) that was
assigned to the IRQ. This information is needed to tear down the
redirection of the proxy interrupt (it will be implemented in a subsequent
changset).

Keep a list of the redirected interrupts. Use it to avoid redirecting
the same proxy interrupt more than once.

Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
User space requests the tear down of a redirected interrupt using the
APIC ID and the CPU interrupt vector number associated with it.

Traverse the list of redirected interrupt and release the resources of
the IRQ being teared down.

Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
@ricardon ricardon force-pushed the rneri/redirec-posted-intr branch from 9fe424d to 9f0c09e Compare October 31, 2025 19:53
@ricardon
Copy link
Contributor Author

I have updated this pull request to not touch the arch/x86 directory and instead use the existing Linux IRQ infrastructure.

@ricardon
Copy link
Contributor Author

ricardon commented Nov 6, 2025

@romank-msft @chris-oo would you mind taking a look at these changes when you get a chance? These are the kernel changes that microsoft/openvmm#384 from @balajimc55

Copy link
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly looks good, just had a question on when we should enable this functionality

@namancse namancse self-requested a review November 13, 2025 16:09
@namancse
Copy link
Contributor

Changes look good to me. They are well separated into different commits, and the commit messages make it easy to understand the purpose and implementation.

Copy link
Contributor

@namancse namancse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@ricardon
Copy link
Contributor Author

Looks good to me.

Thank you @namancse !

@hargar19
Copy link
Collaborator

@ricardon Please merge this PR when you're ready

@ricardon ricardon closed this Nov 15, 2025
@ricardon ricardon reopened this Nov 15, 2025
@ricardon
Copy link
Contributor Author

@hargar19 I don't think I have permissions to merge PRs.

@hargar19 hargar19 merged commit 4e114da into microsoft:product/hcl-main/6.12 Nov 17, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants