Re: [PATCH v2 2/3] vfio-pci/zdev: Add VFIO FMB device feature

From: Omar Elghoul

Date: Wed Jun 03 2026 - 08:39:36 EST


On 6/2/26 6:24 PM, Alex Williamson wrote:

On Tue, 19 May 2026 18:42:03 -0400
Omar Elghoul <oelghoul@xxxxxxxxxxxxx> wrote:
diff --git a/drivers/vfio/pci/vfio_pci_zdev.c b/drivers/vfio/pci/vfio_pci_zdev.c
index 0990fdb146b7..1e9efe2bee69 100644
--- a/drivers/vfio/pci/vfio_pci_zdev.c
+++ b/drivers/vfio/pci/vfio_pci_zdev.c
@@ -167,3 +167,80 @@ void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev)
if (zpci_kvm_hook.kvm_unregister)
zpci_kvm_hook.kvm_unregister(zdev);
}
+
+int vfio_pci_zdev_feature_fmb(struct vfio_pci_core_device *vdev, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct zpci_dev *zdev;
+ struct vfio_device_feature_zpci_fmb fmb = {0};
+ u32 ops = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_SET;
Somewhat gratuitous variable usage.
Acked, will remove in v3.

+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, ops, sizeof(fmb));
+ if (ret != 1)
+ return ret;
+
+ zdev = to_zpci(vdev->pdev);
+ if (!zdev)
+ return -ENODEV;
+
+ mutex_lock(&zdev->fmb_lock);
Use a guard and avoid the release_lock gotos.
Acked as well

+ if (flags & VFIO_DEVICE_FEATURE_SET) {
+ if (copy_from_user(&fmb, arg, sizeof(fmb))) {
+ ret = -EFAULT;
+ goto release_lock;
+ }
+
+ if (fmb.flags & VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED)
+ ret = zpci_fmb_reenable_device(zdev);
+ else
+ ret = zpci_fmb_disable_device(zdev);
+ goto release_lock;
Remaining flag bits are not tested, breaks any future expanded use of
flags.
Oversight on my part, I will fix this.

Why does the user need to be able to control these?
We want the user (e.g. QEMU) to be able to control these so that when a
guest enables or disables the FMB, this state gets cascaded to the host and
all the way to the firmware.

Doesn't allowing the user to disable FMB remove guaranteed host-based
monitoring?
Yes it does, but this one isn't an oversight and is intentional behavior
to achieve the functionality mentioned above. The host-based monitoring is
not necessarily guaranteed and is treated as a device-specific state, so it
makes sense in the case of passthrough to have that state reflect the state
of the guest that is actually using the device.

Since this is already provided via debugfs, why not make this a
userspace problem to interact with the existing interface?
It might be possible but it would undoubtedly be really ugly and harder to
maintain. I think what we'd dislike most about using debugfs is parsing
text data into the FMB structure. If any of the text representations of the
fields were to change, we would need to update them anywhere that uses them
(e.g., in QEMU or any other user driver). The ABI would be super fragile.

Alternatively, couldn't the existing zpci mediation be extended to
support the guest registering a fmb buffer to be written at regular
intervals (the interface here seems to drop the reporting interval).
The firmware only writes the FMB into one buffer every firmware-specified
interval. If we wanted to write the FMB directly into guest memory, we
would either 1) lose host access to the FMB or 2) have to run a periodic
worker in the kernel to copy the host FMB into the guest-provided buffer
every time the firmware does an update. I don't believe either of these
approaches are favorable.

WRT reporting interval, I intentionally dropped that one as it is already
provided by VFIO_DEVICE_INFO_CAP_ZPCI_GROUP.

+ }
+
+ ret = 0;
+ if (zdev->fmb) {
+ fmb.flags |= VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED;
+ } else {
+ fmb.flags &= ~VFIO_DEVICE_FEATURE_ZPCI_FMB_FLAGS_ENABLED;
+ goto release_lock;
+ }
Flag bit is cleared, goto skips copy-to-user, returns success...
I don't know how I missed this one... I'll fix it or remove it entirely by
using the guard like you suggested earlier.

Regardless of what's documented in the header, the buffer should be
assumed to be userspace garbage. Failing to set or clear the entire
flags field precludes any future use.
Acked.

Why do we need to use flags to indicate the enable state? Couldn't we
just as easily have success indicate enabled and -ENOMSG indicate
disabled? Thanks,
Good point, I had not previously considered using -ENOMSG. It makes sense
and would for sure simplify this feature.

Thanks.

Alex