Re: [PATCH v18 1/4] s390/pci: Store PCI error information for passthrough devices

From: Alex Williamson

Date: Thu Jun 04 2026 - 14:28:18 EST


On Wed, 3 Jun 2026 16:35:11 -0700
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
> On 6/3/2026 3:20 PM, Alex Williamson wrote:
> > On Wed, 3 Jun 2026 11:24:12 -0700
> > Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:
> >> @@ -266,25 +286,19 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> >>   * @pdev: PCI function for which to report
> >>   * @es: PCI channel failure state to report
> >>   */
> >> -static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es)
> >> +static void zpci_event_io_failure(struct pci_dev *pdev, pci_channel_state_t es,
> >> +				  struct zpci_ccdf_err *ccdf)
> >>  {
> >>  	struct pci_driver *driver;
> >>
> >>  	pci_dev_lock(pdev);
> >>  	pdev->error_state = es;
> >> -	/**
> >> -	 * While vfio-pci's error_detected callback notifies user-space QEMU
> >> -	 * reacts to this by freezing the guest. In an s390 environment PCI
> >> -	 * errors are rarely fatal so this is overkill. Instead in the future
> >> -	 * we will inject the error event and let the guest recover the device
> >> -	 * itself.
> >> -	 */
> >> -	if (is_passed_through(pdev))
> >> -		goto out;
> >> +
> >> +	zpci_store_pci_error(pdev, ccdf);
> >>  	driver = to_pci_driver(pdev->dev.driver);
> >>  	if (driver && driver->err_handler && driver->err_handler->error_detected)
> >>  		driver->err_handler->error_detected(pdev, pdev->error_state);
> > How do you intend to stage this versus QEMU changes? This seems like a
> > big regression if we're suddenly triggering the eventfd that causes
> > QEMU to halt. Do you need userspace to opt-in to mediated recovery
> > rather than automatically enabling it on open? Thanks,
> >
> > Alex
>
> AFAIU userspace registering an eventfd to receive notification for error
> events is an opt-in? And yes, for QEMU the current behavior halts the
> guest, but even today, on an error the device becomes unusable and
> requires manual intervention. I am not sure if we need to add another
> opt-in mechanism for QEMU.

Yes, QEMU is performing an opt-in, but we're also now calling through
to that opt-in in more cases. Arguably this brings s390 more in line
with AER handling, where I believe only uncorrected errors trigger this
path and we signal the error eventfd for all of them. So long as you've
considered the implications for existing userspace, I won't object.
Thanks,

Alex
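
For reference, the opt-in discussed above is the userspace registration of
an eventfd on the vfio-pci error IRQ index. The following is a minimal
sketch of that registration, not code from this series or from QEMU. It
assumes the standard VFIO uAPI from <linux/vfio.h>; the helper names
build_err_irq_set() and register_err_eventfd() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: build the VFIO_DEVICE_SET_IRQS argument that
 * attaches @efd to the device's error IRQ index. The caller frees the
 * returned buffer.
 */
struct vfio_irq_set *build_err_irq_set(int efd)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
	struct vfio_irq_set *set = calloc(1, argsz);
	int32_t fd = efd;

	if (!set)
		return NULL;

	set->argsz = (uint32_t)argsz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_ERR_IRQ_INDEX;
	set->start = 0;
	set->count = 1;
	/* The payload is a single 32-bit eventfd descriptor. */
	memcpy(set->data, &fd, sizeof(fd));
	return set;
}

/*
 * Hypothetical helper: perform the opt-in on an already opened VFIO
 * device fd. Returns the ioctl() result, or -1 on allocation failure.
 */
int register_err_eventfd(int device_fd, int efd)
{
	struct vfio_irq_set *set = build_err_irq_set(efd);
	int ret;

	if (!set)
		return -1;

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	free(set);
	return ret;
}
```

A caller would create the descriptor with eventfd(0, EFD_CLOEXEC), pass it
to register_err_eventfd() together with the device fd, and then poll the
eventfd. What userspace does once the eventfd fires (halting the guest, in
QEMU's case per the discussion above) is its own policy.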