Re: [PATCH v4 2/3] PCI: Add soft reset method as last resort

From: Alex Williamson

Date: Mon May 18 2026 - 13:16:46 EST


On Mon, 18 May 2026 14:48:34 +0200
Jose Ignacio Tornos Martinez <jtornosm@xxxxxxxxxx> wrote:

> Add a software-initiated "soft" reset method that attempts D3hot->D0
> transition as an absolute last resort when all other reset methods
> have failed.
>
> Some devices incorrectly advertise NoSoftRst+ (blocking PM reset) but
> the D3hot transition does provide sufficient reset for certain use cases,
> particularly VFIO passthrough scenarios. This method provides a "better
> than nothing" option when the device would otherwise have no reset
> capability.
>
> The method only becomes available when:
> - pci_pm_reset() is unavailable (typically blocked by NoSoftRst+)
> - pci_d3cold_reset() is unavailable (no platform _PR3 support)
> - Device has PM capability (required for D3hot transition)
>
> Extract the D3hot transition logic into a shared helper function
> (pci_do_d3hot_transition) used by both pci_pm_reset and pci_soft_reset.
>
> Reset hierarchy with this change:
> 1. device_specific
> 2. acpi
> 3. flr
> 4. af_flr
> 5. pm (proper method, checks NoSoftRst)
> 6. bus
> 7. cxl_bus
> 8. d3cold (requires _PR3)
> 9. soft (NEW - D3hot without NoSoftRst check, absolute last resort)
>
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@xxxxxxxxxx>
> ---
> v4: Implements D3hot transition as last resort when pm/d3cold unavailable
> v3: https://lore.kernel.org/all/20260513122349.268753-1-jtornosm@xxxxxxxxxx/
>
> drivers/pci/pci.c | 98 ++++++++++++++++++++++++++++++++++-----------
> include/linux/pci.h | 2 +-
> 2 files changed, 76 insertions(+), 24 deletions(-)

NAK. This cannot happen as a general case; it will cause vfio-pci to
report reset capabilities for essentially all devices, whether
validated or not.

The suggestion was that for devices where this has proven "better than
nothing", we could think about a device-specific version of this,
matching device IDs, not a fall-through for any device.
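
To make the distinction concrete, here is a minimal standalone sketch of
what gating on device IDs could look like. It is not kernel code: the
struct, the function names, and the vendor/device IDs are hypothetical
placeholders, and a real version would presumably live alongside the
existing device-specific reset quirks rather than in a new table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allowlist entry: one validated vendor/device pair. */
struct soft_reset_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder IDs for illustration only; not real validated devices. */
static const struct soft_reset_id soft_reset_allowlist[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
};

/* Return true only if the device was explicitly listed as validated. */
static bool soft_reset_allowed(uint16_t vendor, uint16_t device)
{
	size_t i;
	size_t n = sizeof(soft_reset_allowlist) / sizeof(soft_reset_allowlist[0]);

	for (i = 0; i < n; i++) {
		if (soft_reset_allowlist[i].vendor == vendor &&
		    soft_reset_allowlist[i].device == device)
			return true;
	}
	return false;
}

/*
 * Probe-style check: report the method as available (0) only for listed
 * devices that have a PM capability, and -ENOTTY for everything else.
 */
static int soft_reset_probe(uint16_t vendor, uint16_t device, bool has_pm_cap)
{
	if (!has_pm_cap || !soft_reset_allowed(vendor, device))
		return -ENOTTY;
	return 0;
}
```

With this shape, an unlisted device keeps reporting no reset capability,
which is the behavior the fall-through version in the patch loses.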

Given the "partial reset" nature of this, even on the target device, I
still wonder though whether userspace cannot already handle this by
forcing the power state through sysfs prior to assigning. Thanks,

Alex
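
For reference, the register arithmetic a userspace tool would need for a
D3hot->D0 cycle is small. The sketch below only computes the PMCSR values,
mirroring the csr handling in the quoted patch; how they get written (for
example through the device's config file in sysfs) is an assumption about
the mechanism and is left out. The constant and function names are local
to this example, and the values follow the PCI PM capability layout
(state in bits 1:0, No_Soft_Reset in bit 3).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local names for PMCSR fields; values per the PCI PM capability layout. */
#define PM_STATE_MASK     0x0003u /* PowerState field, bits 1:0 */
#define PM_NO_SOFT_RESET  0x0008u /* No_Soft_Reset, bit 3 */
#define PM_STATE_D0       0u
#define PM_STATE_D3HOT    3u

/* Return csr with only the PowerState field replaced by 'state'. */
static uint16_t pmcsr_set_state(uint16_t csr, unsigned int state)
{
	return (uint16_t)((csr & ~PM_STATE_MASK) | (state & PM_STATE_MASK));
}

/* True if the device advertises NoSoftRst+, i.e. D3hot->D0 keeps state. */
static bool pmcsr_no_soft_reset(uint16_t csr)
{
	return (csr & PM_NO_SOFT_RESET) != 0;
}
```

A tool would read PMCSR, write pmcsr_set_state(csr, PM_STATE_D3HOT), wait
for the D3 delay, then write pmcsr_set_state(csr, PM_STATE_D0) and wait
again before handing the device to vfio-pci.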

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 839903b59698..8dad386bd65d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4437,6 +4437,43 @@ static int pci_af_flr(struct pci_dev *dev, bool probe)
> return ret;
> }
>
> +/**
> + * pci_do_d3hot_transition - Perform D3hot->D0 power state transition
> + * @dev: Device to transition
> + *
> + * Common helper to perform D3hot->D0 transition for PM-based reset methods.
> + * Handles IOMMU preparation, state transition, and waiting for device ready.
> + */
> +static int pci_do_d3hot_transition(struct pci_dev *dev)
> +{
> + u16 csr;
> + int ret;
> +
> + if (dev->current_state != PCI_D0)
> + return -EINVAL;
> +
> + ret = pci_dev_reset_iommu_prepare(dev);
> + if (ret) {
> + pci_err(dev, "failed to stop IOMMU for a PCI reset: %d\n", ret);
> + return ret;
> + }
> +
> + pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &csr);
> + csr &= ~PCI_PM_CTRL_STATE_MASK;
> + csr |= PCI_D3hot;
> + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
> + pci_dev_d3_sleep(dev);
> +
> + csr &= ~PCI_PM_CTRL_STATE_MASK;
> + csr |= PCI_D0;
> + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
> + pci_dev_d3_sleep(dev);
> +
> + ret = pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS);
> + pci_dev_reset_iommu_done(dev);
> + return ret;
> +}
> +
> /**
> * pci_pm_reset - Put device into PCI_D3 and back into PCI_D0.
> * @dev: Device to reset.
> @@ -4455,7 +4492,6 @@ static int pci_af_flr(struct pci_dev *dev, bool probe)
> static int pci_pm_reset(struct pci_dev *dev, bool probe)
> {
> u16 csr;
> - int ret;
>
> if (!dev->pm_cap || dev->dev_flags & PCI_DEV_FLAGS_NO_PM_RESET)
> return -ENOTTY;
> @@ -4467,28 +4503,7 @@ static int pci_pm_reset(struct pci_dev *dev, bool probe)
> if (probe)
> return 0;
>
> - if (dev->current_state != PCI_D0)
> - return -EINVAL;
> -
> - ret = pci_dev_reset_iommu_prepare(dev);
> - if (ret) {
> - pci_err(dev, "failed to stop IOMMU for a PCI reset: %d\n", ret);
> - return ret;
> - }
> -
> - csr &= ~PCI_PM_CTRL_STATE_MASK;
> - csr |= PCI_D3hot;
> - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
> - pci_dev_d3_sleep(dev);
> -
> - csr &= ~PCI_PM_CTRL_STATE_MASK;
> - csr |= PCI_D0;
> - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, csr);
> - pci_dev_d3_sleep(dev);
> -
> - ret = pci_dev_wait(dev, "PM D3hot->D0", PCIE_RESET_READY_POLL_MS);
> - pci_dev_reset_iommu_done(dev);
> - return ret;
> + return pci_do_d3hot_transition(dev);
> }
>
> /**
> @@ -4530,6 +4545,42 @@ static int pci_d3cold_reset(struct pci_dev *dev, bool probe)
> return pci_set_power_state(dev, PCI_D0);
> }
>
> +/**
> + * pci_soft_reset - Software-initiated reset via D3hot as last resort
> + * @dev: PCI device to reset
> + * @probe: if true, check if soft reset is supported; if false, perform reset
> + *
> + * Attempt a software-initiated reset via D3hot->D0 transition as an absolute
> + * last resort when all other reset methods have failed. This method only
> + * becomes available if the device has PM capability, pci_pm_reset() is blocked
> + * (typically by NoSoftRst+), and pci_d3cold_reset() is not available.
> + *
> + * Some devices incorrectly advertise NoSoftRst+ but D3hot transition does
> + * provide sufficient reset for certain use cases (e.g., VFIO passthrough).
> + * This method provides a "better than nothing" option when the device would
> + * otherwise have no reset capability.
> + *
> + * Returns 0 if device can be/was reset this way, -ENOTTY if a better reset
> + * method is available (pm or d3cold) or device lacks PM capability, or other
> + * negative error code on failure.
> + */
> +static int pci_soft_reset(struct pci_dev *dev, bool probe)
> +{
> + if (pci_pm_reset(dev, true) == 0)
> + return -ENOTTY;
> +
> + if (pci_d3cold_reset(dev, true) == 0)
> + return -ENOTTY;
> +
> + if (!dev->pm_cap)
> + return -ENOTTY;
> +
> + if (probe)
> + return 0;
> +
> + return pci_do_d3hot_transition(dev);
> +}
> +
> /**
> * pcie_wait_for_link_status - Wait for link status change
> * @pdev: Device whose link to wait for.
> @@ -5105,6 +5156,7 @@ const struct pci_reset_fn_method pci_reset_fn_methods[] = {
> { pci_reset_bus_function, .name = "bus" },
> { cxl_reset_bus_function, .name = "cxl_bus" },
> { pci_d3cold_reset, .name = "d3cold" },
> + { pci_soft_reset, .name = "soft" },
> };
>
> /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 1ca7b880ead7..bcd2987b868b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -51,7 +51,7 @@
> PCI_STATUS_PARITY)
>
> /* Number of reset methods used in pci_reset_fn_methods array in pci.c */
> -#define PCI_NUM_RESET_METHODS 9
> +#define PCI_NUM_RESET_METHODS 10
>
> #define PCI_RESET_PROBE true
> #define PCI_RESET_DO_RESET false