Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout

From: Michal Pecio

Date: Fri May 22 2026 - 20:45:20 EST


On Fri, 22 May 2026 17:45:22 -0300, Desnes Nunes wrote:
> Hello Michal,
>
> On Fri, May 22, 2026 at 6:03 AM Michal Pecio <michal.pecio@xxxxxxxxx> wrote:
> > If the bug is deterministic it should be fairly easy to nail it down.
> > Attached xhci debugfs patch adds a list of almost all memory the xHC
> > is allowed to access, including (if I haven't missed something) all
> > mappings it is allowed to access before any slots are enabled.
> >
> > Please apply, reboot and then:
> >
> > zip -r before.zip /sys/kernel/debug/usb/xhci/0000:80:14.0
> > # trigger crash kexec and the bug
> > zip -r after.zip /sys/kernel/debug/usb/xhci/0000:80:14.0
> ...
> > And since the bug may be an out of bounds access by the HW, if you
> > don't mind running a slightly experimental patch to a critical subsystem,
> > please also apply the DMA guard pages patch. I've been using it for a
> > few months without issues, but YMMV. It helps determine which mapping
> > is accessed OOB.
> >
> > Note: DMA guard pages may cause USB to stop working before kexec if
> > it's a HW bug masked by memory layout, or begin to work after kexec in
> > case of some IOMMU subsystem issues.
>
> Please find the requested information in the attachments below.
> Kernel was patched with both patches.

Sorry, I forgot about the most important thing: crash kernel log, or at
least the IOMMU fault message showing the bad address.

However, in this case it's moot because it seems that the HC didn't
fault after kexec and it worked normally - debugfs shows that USBSTS=0
and some device has been successfully enabled.

(It's a little odd that CRCR.CRR appears to be clear. I'm not sure what
the deal is with that, but it's the same in before.zip as well.)
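
For anyone reading the register dumps along, here is a minimal sketch of
how the USBSTS and CRCR values above can be interpreted. The bit
positions follow the xHCI specification as far as I recall them; the
macro and function names are made up for this example and are not the
ones used in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* USBSTS bits (names here are illustrative, not the driver's). */
#define EX_USBSTS_HCH (1u << 0)   /* HC Halted */
#define EX_USBSTS_HSE (1u << 2)   /* Host System Error */
#define EX_USBSTS_CNR (1u << 11)  /* Controller Not Ready */
#define EX_USBSTS_HCE (1u << 12)  /* Host Controller Error */

/* CRCR bit 3: Command Ring Running. */
#define EX_CRCR_CRR   (1u << 3)

/* True if the HC reports a fatal condition (HSE or HCE). */
static bool ex_usbsts_fatal(uint32_t usbsts)
{
	return (usbsts & (EX_USBSTS_HSE | EX_USBSTS_HCE)) != 0;
}

/* True if the HC is running, ready, and reports no fatal error. */
static bool ex_usbsts_healthy(uint32_t usbsts)
{
	uint32_t bad = EX_USBSTS_HCH | EX_USBSTS_HSE |
		       EX_USBSTS_CNR | EX_USBSTS_HCE;

	return (usbsts & bad) == 0;
}

/* True if the command ring is reported as running. */
static bool ex_crcr_running(uint64_t crcr)
{
	return (crcr & EX_CRCR_CRR) != 0;
}

/* Write a short human-readable summary of USBSTS into buf. */
static int ex_usbsts_describe(uint32_t usbsts, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "HCH=%d HSE=%d CNR=%d HCE=%d",
			!!(usbsts & EX_USBSTS_HCH),
			!!(usbsts & EX_USBSTS_HSE),
			!!(usbsts & EX_USBSTS_CNR),
			!!(usbsts & EX_USBSTS_HCE));
}
```

With USBSTS=0 as in after.zip, ex_usbsts_healthy() is true and
ex_usbsts_fatal() is false, which is why I say the HC didn't fault.
A CRCR value with bit 3 clear makes ex_crcr_running() false, which is
the oddity noted above.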

So either the bug isn't as deterministic as we thought, or one of the
patches "fixed" it, and that could only be DMA Guard Pages. If you
can't reproduce the problem with guard pages, please remove that patch
and post before/after debugfs again, plus crash kernel dmesg.

Regards,
Michal