Re: [PATCH v14 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
From: Benjamin Block
Date: Fri May 22 2026 - 12:12:37 EST
On Thu, May 21, 2026 at 03:13:12PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 22, 2026 at 09:32:40AM +0300, Ionut Nechita (Wind River) wrote:
> > This is v14 of the fix for the SR-IOV race between driver .remove()
> > and concurrent hotplug events.
>
> Can you take a look at the sashiko feedback and see whether there's
> any merit to it?
>
> https://sashiko.dev/#/patchset/cover.1776839248.git.ionut.nechita%40windriver.com
IIRC Ionut already responded to several of the findings a few patch versions
back:
https://lore.kernel.org/linux-pci/20260319202755.16081-1-ionut.nechita%40windriver.com/
Patch 01:
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> [ ... ]
> > @@ -633,15 +629,18 @@ static int sriov_add_vfs(struct pci_dev *dev, u16 num_vfs)
> > if (dev->no_vf_scan)
> > return 0;
> >
> > + pci_lock_rescan_remove();
> Does moving this lock acquisition here create an ABBA deadlock?
> Looking at sriov_numvfs_store(), the kernel acquires device_lock(&pdev->dev)
> before invoking the driver's sriov_configure() callback, which eventually
> reaches this point and calls pci_lock_rescan_remove(). This establishes a
> dependency of device_lock followed by pci_rescan_remove_lock.
> During a PCI hotplug event or sysfs remove_store, the subsystem first acquires
> pci_rescan_remove_lock, then calls pci_stop_and_remove_bus_device() which
> invokes device_release_driver() and takes device_lock(&pdev->dev). This
> creates the reverse dependency.
> Will an SR-IOV sysfs configuration racing with a hot-unplug event result in a
> deadlock? I understand this regression is fixed later in the patch series by
> commit 'PCI: Fix AB-BA deadlock between device_lock and pci_rescan_remove_lock
> in remove_store', but wanted to point out the interaction here.
This is already addressed in Patch 02.
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index b63cd0c310bc0..91f1dae6943b6 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> [ ... ]
> > @@ -3513,16 +3513,30 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus);
> > * routines should always be executed under this mutex.
> > */
> > DEFINE_MUTEX(pci_rescan_remove_lock);
> > +static const struct task_struct *pci_rescan_remove_owner;
> > +static unsigned int pci_rescan_remove_depth;
> >
> > void pci_lock_rescan_remove(void)
> > {
> > - mutex_lock(&pci_rescan_remove_lock);
> > + if (pci_rescan_remove_owner == current) {
>
> Can this read cause a data race?
> This reads pci_rescan_remove_owner outside of the mutex critical section.
> Concurrently, another thread executing this same function can acquire the
> mutex and write to this shared variable.
>
> > + pci_rescan_remove_depth++;
> > + } else {
> > + mutex_lock(&pci_rescan_remove_lock);
> > + pci_rescan_remove_owner = current;
>
> Could the compiler optimize or tear these unprotected accesses? Should these
> reads and writes use READ_ONCE() and WRITE_ONCE() to safely perform the
> lockless accesses and avoid triggering warnings in tools like KCSAN?
>
> > + pci_rescan_remove_depth = 1;
> > + }
> > }
Those two are wrong. `pci_rescan_remove_depth` is only modified when we
already own the mutex `pci_rescan_remove_owner == current`, so the writes are
safe. The reads are safe because, again, the relevant case is when the current
thread already owns the mutex, and then we already have the lock; when we
don't already have the lock we gonna try to take it, but this is also safe
because nobody gonna change the owner to us other than we ourself, and only
once we have taken the lock.
Patch 02:
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -521,8 +521,36 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
> [ ... ]
> > + * By unbinding first, the driver's .remove() callback
> > + * (including any SR-IOV VF cleanup) completes before
> > + * pci_rescan_remove_lock is acquired, ensuring both
> > + * paths take locks in the same order.
> > + */
> > + device_release_driver(dev);
>
> Does calling device_release_driver() here violate the bottom-up teardown
> guarantee if this device is a bridge?
>
> If the device is a PCIe bridge, unbinding the pcieport driver will call
> pci_disable_device(). Because child devices do not increment the parent
> bridge's enable_cnt during probe, the bridge's enable_cnt will drop to zero
> and clear the PCI_COMMAND_MASTER bit.
>
> This disables bus-mastering for the bridge while its child devices are still
> fully active. When they are later unbound during removal, their drivers might
> fail to communicate with the hardware, leading to master aborts, PCIe AER
> errors, and timeouts on DMA flushes.
Ionut previously addressed this in the post I linked above; identifying it as
false-positive.
> > pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
>
> Does this leave child devices vulnerable to the exact same AB-BA deadlock?
>
> When pci_stop_and_remove_bus_device_locked() executes, it acquires the
> pci_rescan_remove_lock and unbinds child devices recursively:
>
> pci_stop_and_remove_bus_device_locked()
> pci_stop_bus_device()
> pci_stop_dev(child)
> device_release_driver(&child->dev)
>
> This attempts to acquire device_lock(&child->dev) while already holding
> pci_rescan_remove_lock.
>
> If a concurrent unbind_store() happens on the child device, it will acquire
> device_lock(&child->dev) first and then wait for pci_rescan_remove_lock,
> reproducing the original circular dependency.ecause then we can't concurrently change the state to `current`.
This is a false positive; we marked the device as "killed" and unbound the
device driver. So, when the concurrent unbind_store() is called `dev->driver`
is NULL and so it never gonna call device_driver_detach() and take the lock.
Or rather, it wouldn't even be called, because unbind_store() is supposedly
only called when some driver binds the device, which would not be the case
here.
At least thats my take on those latest findings.
--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt / Geschäftsführung: David Faller
Sitz der Ges.: Ehningen / Registergericht: AmtsG Stuttgart, HRB 243294