Conversation


@scholzp commented Nov 21, 2025

Configurable BDF Design

Draft, because the CHV side must be merged first!

Background

tl;dr: Same motivation as in cyberus-technology/cloud-hypervisor#32

Configurable PCI BDF (Bus, Device, Function) allows you to choose the address a guest sees for a PCI device. QEMU, for example, implements the addr option for the device argument. The addr argument allows defining the device and function parts of a device's BDF in the form addr=DD.F, where DD denotes a device ID in the range [0..31] and F a function ID in the range [0..7]. QEMU denotes the bus number with the option busnr, where a valid expression is busnr=pci.2 to denote the second PCI bus, for example. [0]

An example of a BDF for Bus=1, Device=3, and Function=5 is 1:03.5. [1]
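
As a minimal illustration (a standalone C sketch, not libvirt code), the three parts pack into a 16-bit value: bus in bits 15..8, device in bits 7..3, function in bits 2..0. Printing it back yields the notation above:

    #include <stdint.h>
    #include <stdio.h>

    /* Pack bus/device/function into the classic 16-bit BDF layout:
     * bits 15..8 = bus, bits 7..3 = device, bits 2..0 = function. */
    static uint16_t bdf_pack(uint8_t bus, uint8_t dev, uint8_t fn)
    {
        return (uint16_t)(bus << 8 | (dev & 0x1f) << 3 | (fn & 0x7));
    }

    int main(void)
    {
        uint16_t bdf = bdf_pack(1, 3, 5);
        /* Prints "01:03.5" for Bus=1, Device=3, Function=5. */
        printf("%02x:%02x.%x\n", bdf >> 8, (bdf >> 3) & 0x1f, bdf & 0x7);
        return 0;
    }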

Current status

Currently, the CHV driver generates BDFs on the fly while hotplugging net devices. This is unfortunate, as the resulting PCI slot IDs depend on the order in which network devices are hotplugged. Disk devices are currently not considered for BDF generation at all. As a result, a device's BDF may change when a VM is restarted or migrated, and BDFs are not kept in sync between CHV and libvirt.

Proposed changes

We use the existing libvirt infrastructure to assign a BDF to a device whenever one is added. Moreover, with the right patches merged into CHV, we now transmit the correct BDFs when communicating with CHV. This solves both problems mentioned above: first, BDFs stay in sync between CHV and libvirt, because libvirt issues them whenever necessary; second, once libvirt has assigned the BDFs, both sides respect them when initializing devices, yielding stable addresses for use cases such as live migration.
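
To make the intended flow concrete, here is a toy C sketch of the reservation idea, restricted to a single bus and function 0 (names like AddressSet and reserve_next are illustrative, not libvirt's actual address-set API):

    #include <stdio.h>

    /* Toy model: one flag per (bus, slot); libvirt's real address sets
     * also track functions, multiple buses and controller types. */
    typedef struct { unsigned int bus, slot, function; } PCIAddress;
    typedef struct { unsigned char used[256][32]; } AddressSet;

    /* Reserve the next free slot on bus 0 so the same BDF can never be
     * handed out twice; the result is persisted with the device. */
    static int reserve_next(AddressSet *set, PCIAddress *out)
    {
        for (unsigned int slot = 0; slot < 32; slot++) {
            if (!set->used[0][slot]) {
                set->used[0][slot] = 1;
                out->bus = 0;
                out->slot = slot;
                out->function = 0;
                return 0;
            }
        }
        return -1; /* bus is full */
    }

    int main(void)
    {
        AddressSet set = {0};
        set.used[0][0] = 1; /* slot 0 usually hosts the host bridge */

        PCIAddress a;
        if (reserve_next(&set, &a) == 0) /* first device gets 00:01.0 */
            printf("%02x:%02x.%x\n", a.bus, a.slot, a.function);
        return 0;
    }

Because the reserved address is stored with the device definition, restarting or migrating the domain replays the same BDFs instead of depending on hotplug order.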

References

[0] https://qemu-project.gitlab.io/qemu/system/device-emulation.html#device-buses

[1] https://wiki.xenproject.org/wiki/Bus:Device.Function_(BDF)_Notation

This commit introduces infrastructure for automatically assigning
PCI BDFs to libvirt devices. We assign a fixed BDF to every device
to guarantee that a guest finds its devices at the same bus addresses
after live migration and in similar use cases.

Signed-off-by: Pascal Scholz <[email protected]>
On-behalf-of: SAP [email protected]

We need a way to store PCI BDFs for all buses of a domain.

Signed-off-by: Pascal Scholz <[email protected]>
On-behalf-of: SAP [email protected]

We need to mark BDFs for all devices as reserved whenever we create a
domain.

Signed-off-by: Pascal Scholz <[email protected]>
On-behalf-of: SAP [email protected]

When creating device JSON for transmission, we need to use
the BDFs assigned by internal libvirt logic to stay in sync.

Signed-off-by: Pascal Scholz <[email protected]>
On-behalf-of: SAP [email protected]

If we hotplug a device, we need to assign a static PCI BDF to it,
so we can ensure that changes to the PCI device tree are reflected
properly in libvirt. The same applies to detaching devices.

Signed-off-by: Pascal Scholz <[email protected]>
On-behalf-of: SAP [email protected]
@scholzp marked this pull request as draft November 21, 2025 15:59
@scholzp closed this Nov 24, 2025
cyberus-renovate pushed a commit that referenced this pull request Nov 25, 2025
Variable 'master' needs to be freed because it will be reassigned in
virNetDevOpenvswitchInterfaceGetMaster().

The stack of the leaked allocation:
Direct leak of 11 byte(s) in 1 object(s) allocated from:
    #0 0x7f7dad8ba6df in __interceptor_malloc (/lib64/libasan.so.8+0xba6df)
    #1 0x7f7dad715728 in g_malloc (/lib64/libglib-2.0.so.0+0x60728)
    #2 0x7f7dad72d8b2 in g_strdup (/lib64/libglib-2.0.so.0+0x788b2)
    #3 0x7f7dacb63088 in g_strdup_inline /usr/include/glib-2.0/glib/gstrfuncs.h:321
    #4 0x7f7dacb63088 in virNetDevGetName ../src/util/virnetdev.c:823
    #5 0x7f7dacb63886 in virNetDevGetMaster ../src/util/virnetdev.c:909
    #6 0x7f7dacb90288 in virNetDevTapReattachBridge ../src/util/virnetdevtap.c:527
    #7 0x7f7dacd5cd67 in virDomainNetNotifyActualDevice ../src/conf/domain_conf.c:30505
    #8 0x7f7da3a10bc3 in qemuProcessNotifyNets ../src/qemu/qemu_process.c:3290
    #9 0x7f7da3a375c6 in qemuProcessReconnect ../src/qemu/qemu_process.c:9211
    #10 0x7f7dacc0cc53 in virThreadHelper ../src/util/virthread.c:256
    #11 0x7f7dac2875d4 in start_thread (/lib64/libc.so.6+0x875d4)
    #12 0x7f7dac3091bb in __GI___clone3 (/lib64/libc.so.6+0x1091bb)

Fixes: de938b9
Signed-off-by: QiangWei Zhang <[email protected]>
Reviewed-by: Peter Krempa <[email protected]>
cyberus-renovate pushed a commit that referenced this pull request Nov 25, 2025
…ereference it

The accel->cpuModels field might be NULL if QEMU does not return CPU models.
The following backtrace is observed in such cases:
0  virQEMUCapsProbeQMPCPUDefinitions (qemuCaps=qemuCaps@entry=0x7f1890003ae0, accel=accel@entry=0x7f1890003c10, mon=mon@entry=0x7f1890005270)
   at ../src/qemu/qemu_capabilities.c:3091
1  0x00007f18b42fa7b1 in virQEMUCapsInitQMPMonitor (qemuCaps=qemuCaps@entry=0x7f1890003ae0, mon=0x7f1890005270) at ../src/qemu/qemu_capabilities.c:5746
2  0x00007f18b42fafaf in virQEMUCapsInitQMPSingle (qemuCaps=qemuCaps@entry=0x7f1890003ae0, libDir=libDir@entry=0x7f186c1e70f0 "/var/lib/libvirt/qemu",
   runUid=runUid@entry=955, runGid=runGid@entry=955, onlyTCG=onlyTCG@entry=false) at ../src/qemu/qemu_capabilities.c:5832
3  0x00007f18b42fb1a5 in virQEMUCapsInitQMP (qemuCaps=0x7f1890003ae0, libDir=0x7f186c1e70f0 "/var/lib/libvirt/qemu", runUid=955, runGid=955)
   at ../src/qemu/qemu_capabilities.c:5848
4  virQEMUCapsNewForBinaryInternal (hostArch=VIR_ARCH_X86_64, binary=binary@entry=0x7f1868002fc0 "/usr/bin/qemu-system-alpha",
   libDir=0x7f186c1e70f0 "/var/lib/libvirt/qemu", runUid=955, runGid=955,
   hostCPUSignature=0x7f186c1e9f20 "AuthenticAMD, AMD Ryzen 9 7950X 16-Core Processor, family: 25, model: 97, stepping: 2", microcodeVersion=174068233,
   kernelVersion=0x7f186c194200 "6.14.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 29 May 2025 21:42:15 +0000", cpuData=0x7f186c1ea490)
   at ../src/qemu/qemu_capabilities.c:5907
5  0x00007f18b42fb4c9 in virQEMUCapsNewData (binary=0x7f1868002fc0 "/usr/bin/qemu-system-alpha", privData=0x7f186c194280)
   at ../src/qemu/qemu_capabilities.c:5942
6  0x00007f18bd42d302 in virFileCacheNewData (cache=0x7f186c193730, name=0x7f1868002fc0 "/usr/bin/qemu-system-alpha") at ../src/util/virfilecache.c:206
7  virFileCacheValidate (cache=cache@entry=0x7f186c193730, name=name@entry=0x7f1868002fc0 "/usr/bin/qemu-system-alpha", data=data@entry=0x7f18b67c37c0)
   at ../src/util/virfilecache.c:269
8  0x00007f18bd42d5b8 in virFileCacheLookup (cache=cache@entry=0x7f186c193730, name=name@entry=0x7f1868002fc0 "/usr/bin/qemu-system-alpha")
   at ../src/util/virfilecache.c:301
9  0x00007f18b42fb679 in virQEMUCapsCacheLookup (cache=cache@entry=0x7f186c193730, binary=binary@entry=0x7f1868002fc0 "/usr/bin/qemu-system-alpha")
   at ../src/qemu/qemu_capabilities.c:6036
10 0x00007f18b42fb785 in virQEMUCapsInitGuest (caps=<optimized out>, cache=<optimized out>, hostarch=VIR_ARCH_X86_64, guestarch=VIR_ARCH_ALPHA)
   at ../src/qemu/qemu_capabilities.c:1037
11 virQEMUCapsInit (cache=0x7f186c193730) at ../src/qemu/qemu_capabilities.c:1229
12 0x00007f18b431d311 in virQEMUDriverCreateCapabilities (driver=driver@entry=0x7f186c01f410) at ../src/qemu/qemu_conf.c:1553
13 0x00007f18b431d663 in virQEMUDriverGetCapabilities (driver=0x7f186c01f410, refresh=<optimized out>) at ../src/qemu/qemu_conf.c:1623
14 0x00007f18b435e3e4 in qemuConnectGetVersion (conn=<optimized out>, version=0x7f18b67c39b0) at ../src/qemu/qemu_driver.c:1492
15 0x00007f18bd69c5e8 in virConnectGetVersion (conn=0x55bc5f4cda20, hvVer=hvVer@entry=0x7f18b67c39b0) at ../src/libvirt-host.c:201
16 0x000055bc34ef3627 in remoteDispatchConnectGetVersion (server=0x55bc5f4b93f0, msg=0x55bc5f4cdf60, client=0x55bc5f4c66d0, rerr=0x7f18b67c3a80,
   ret=0x55bc5f4b8670) at src/remote/remote_daemon_dispatch_stubs.h:1265
17 remoteDispatchConnectGetVersionHelper (server=0x55bc5f4b93f0, client=0x55bc5f4c66d0, msg=0x55bc5f4cdf60, rerr=0x7f18b67c3a80, args=0x0, ret=0x55bc5f4b8670)
   at src/remote/remote_daemon_dispatch_stubs.h:1247
18 0x00007f18bd5506da in virNetServerProgramDispatchCall (prog=0x55bc5f4cae90, server=0x55bc5f4b93f0, client=0x55bc5f4c66d0, msg=0x55bc5f4cdf60)
   at ../src/rpc/virnetserverprogram.c:423
19 virNetServerProgramDispatch (prog=0x55bc5f4cae90, server=server@entry=0x55bc5f4b93f0, client=0x55bc5f4c66d0, msg=0x55bc5f4cdf60)
   at ../src/rpc/virnetserverprogram.c:299
20 0x00007f18bd556c32 in virNetServerProcessMsg (srv=srv@entry=0x55bc5f4b93f0, client=<optimized out>, prog=<optimized out>, msg=<optimized out>)
   at ../src/rpc/virnetserver.c:135
21 0x00007f18bd556f77 in virNetServerHandleJob (jobOpaque=0x55bc5f4d2bb0, opaque=0x55bc5f4b93f0) at ../src/rpc/virnetserver.c:155
22 0x00007f18bd47dd19 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
23 0x00007f18bd47d253 in virThreadHelper (data=0x55bc5f4b7810) at ../src/util/virthread.c:256
24 0x00007f18bce117eb in start_thread (arg=<optimized out>) at pthread_create.c:448
25 0x00007f18bce9518c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Signed-off-by: anonymix007 <[email protected]>
cyberus-renovate pushed a commit that referenced this pull request Nov 25, 2025
…returning

Successful return from 'virConnectDomainEventDeregisterAny' does not
guarantee that there aren't still in-progress events being handled by
the callbacks. Since 'cmdEvent' passes in a slice from an array as the
private data of the callbacks, we must ensure that the array stays in
scope (it's auto-freed) for the whole time there are possible callbacks
being executed.

While in practice this doesn't happen, as the callbacks are usually quick
enough to finish while being unregistered, placing a 'sleep(1)' into
e.g. 'virshEventLifecyclePrint' and starting a domain results in a crash
of virsh with the following backtrace:

 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00005557b5cfd343 in virshEventPrintf (data=data@entry=0x5557db9619b0, fmt=fmt@entry=0x5557b5d5e527 "%s") at ../../../libvirt/tools/virsh-domain-event.c:252

 Thread 2 (Thread 0x7f59a54b7d00 (LWP 2097121)):
 #0  0x00007f59a6cadbf9 in __futex_abstimed_wait_common () at /lib64/libc.so.6
 #1  0x00007f59a6cb2cf3 in __pthread_clockjoin_ex () at /lib64/libc.so.6
 #2  0x00005557b5cd57f6 in virshDeinit (ctl=0x7ffc7b615140) at ../../../libvirt/tools/virsh.c:408
 #3  0x00005557b5cd5391 in main (argc=<optimized out>, argv=<optimized out>) at ../../../libvirt/tools/virsh.c:932

 Thread 1 (Thread 0x7f59a51a66c0 (LWP 2097122)):
 #0  0x00005557b5cfd343 in virshEventPrintf (data=data@entry=0x5557db9619b0, fmt=fmt@entry=0x5557b5d5e527 "%s") at ../../../libvirt/tools/virsh-domain-event.c:252
 #1  0x00005557b5cffa10 in virshEventPrint (data=0x5557db9619b0, buf=0x7f59a51a55c0) at ../../../libvirt/tools/virsh-domain-event.c:290
 #2  virshEventLifecyclePrint (conn=<optimized out>, dom=<optimized out>, event=<optimized out>, detail=<optimized out>, opaque=0x5557db9619b0) at ../../../libvirt/
 [snipped]

From the backtrace you can see that the 'main()' thread is already
shutting down virsh, which means that 'cmdEvent' has terminated and the
private data was freed. The event loop thread is still executing the
callback, which accesses the data.

To fix this, add a condition and wait for all of the callbacks to be
unregistered first (their private data freeing function will be called).
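
A generic C sketch of that pattern (plain pthreads rather than virsh's actual helpers): the unregistering thread blocks on a condition until the event loop has run every callback's free function:

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int registered; /* callbacks whose private data is still live */

    /* Invoked by the event loop when a callback's private data is freed. */
    static void cb_freed(void *opaque)
    {
        (void)opaque;
        pthread_mutex_lock(&lock);
        registered--;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    /* Called after deregistering; returns only when no callback can
     * touch the shared data any more, so it is safe to free it. */
    static void wait_for_callbacks(void)
    {
        pthread_mutex_lock(&lock);
        while (registered > 0)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }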

This bug was observed when I copied the event code for a new virsh
command whose callbacks were a bit more involved.

Fixes: 99fa96c
Signed-off-by: Peter Krempa <[email protected]>
Reviewed-by: Ján Tomko <[email protected]>