Re: [PATCH v3 2/2] PCI/IOV: Fix race between SR-IOV enable/disable and hotplug
From: Guenter Roeck
Date: Mon Mar 16 2026 - 21:58:13 EST
Hi,
On Tue, Dec 16, 2025 at 11:14:03PM +0100, Niklas Schnelle wrote:
> Commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when
> enabling/disabling SR-IOV") tried to fix a race between the VF removal
> inside sriov_del_vfs() and concurrent hot unplug by taking the PCI
> rescan/remove lock in sriov_del_vfs(). Similarly the PCI rescan/remove
> lock was also taken in sriov_add_vfs() to protect addition of VFs.
>
> This approach however causes deadlock on trying to remove PFs with
> SR-IOV enabled because PFs disable SR-IOV during removal and this
> removal happens under the PCI rescan/remove lock. So the original fix
> had to be reverted.
>
> Instead of taking the PCI rescan/remove lock in sriov_add_vfs() and
> sriov_del_vfs(), fix the race that occurs with SR-IOV enable and disable
> vs hotplug higher up in the callchain by taking the lock in
> sriov_numvfs_store() before calling into the driver's sriov_configure()
> callback.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV")
> Reported-by: Benjamin Block <bblock@xxxxxxxxxxxxx>
> Reviewed-by: Benjamin Block <bblock@xxxxxxxxxxxxx>
> Reviewed-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx>
> Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> ---
> drivers/pci/iov.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 7de5b18647beb69127ba11234fb9f1dec9b50540..4a659c34935e116dd6d0b4ce42ed12a1ba9418d1 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -495,7 +495,9 @@ static ssize_t sriov_numvfs_store(struct device *dev,
>
> if (num_vfs == 0) {
> /* disable VFs */
> + pci_lock_rescan_remove();
> ret = pdev->driver->sriov_configure(pdev, 0);
> + pci_unlock_rescan_remove();
> goto exit;
> }
>
> @@ -507,7 +509,9 @@ static ssize_t sriov_numvfs_store(struct device *dev,
> goto exit;
> }
>
> + pci_lock_rescan_remove();
> ret = pdev->driver->sriov_configure(pdev, num_vfs);
> + pci_unlock_rescan_remove();
> if (ret < 0)
> goto exit;
>
Google's experimental AI review agent provided the following feedback
on this patch.
Could this introduce an AB-BA deadlock between the device lock and the
rescan/remove lock?
Earlier in sriov_numvfs_store(), device_lock(&pdev->dev) is acquired. The
patch then attempts to acquire pci_lock_rescan_remove() while holding the
device lock.
However, during a hotplug removal of the PF (for example, via sysfs),
remove_store() first acquires pci_lock_rescan_remove() and subsequently
calls pci_stop_and_remove_bus_device_locked(). That path eventually calls
device_release_driver(), which attempts to acquire device_lock(&pdev->dev).
If sriov_numvfs_store() and a concurrent removal of the PF race, it appears
they could deadlock waiting on each other's locks.
The actual call sequence (at least in v6.12.y, where this patch was
backported to) is as follows.
remove_store()
-> pci_stop_and_remove_bus_device_locked()
-> pci_lock_rescan_remove()
-> pci_stop_and_remove_bus_device()
-> pci_stop_bus_device()
-> pci_remove_bus_device()
-> pci_remove_bus()
-> device_unregister()
-> device_del()
-> device_lock()
I don't claim to fully understand the code, but the AI does seem to have a
point. Please let me know if the AI analysis is correct or if it misses
something.
Thanks,
Guenter