Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout

From: Desnes Nunes

Date: Wed May 20 2026 - 01:00:16 EST


Hello Michal,

On Mon, May 18, 2026 at 3:33 AM Michal Pecio <michal.pecio@xxxxxxxxx> wrote:
> > The chip IOMMU faults shortly after setting USBCMD.RUN = 1.
> > Such fault is expected to cause HSE assertion and usually it does.
> > You will probably find that HSE is already set while Enable Slot
> > is being queued, even if it was clear in xhci_gen_setup().

I've just read HSE at these places and confirmed that it was already
set before queuing the Enable Slot TRB, even though it was still
clear in xhci_gen_setup().
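
For anyone following along, here is a small sketch (not part of the
patch) that decodes the two USBSTS values printed by my debug messages
in the log below. The bit positions are the ones defined for USBSTS in
the xHCI specification; the helper name decode_usbsts() is made up for
this example.

```python
# USBSTS bit positions as defined by the xHCI specification.
USBSTS_BITS = {
    0: "HCHalted",
    2: "HSE",
    3: "EINT",
    4: "PCD",
    8: "SSS",
    9: "RSS",
    10: "SRE",
    11: "CNR",
    12: "HCE",
}


def decode_usbsts(value):
    """Return the names of the USBSTS bits set in value (hypothetical helper)."""
    return [name for bit, name in sorted(USBSTS_BITS.items())
            if value & (1 << bit)]


# Values taken from the PATCHED debug lines in the log.
print(hex(0x00000001), decode_usbsts(0x00000001))  # read in xhci_gen_setup()
print(hex(0x00000015), decode_usbsts(0x00000015))  # read before Enable Slot
```

The second value has bit 2 set, which is the HSE condition discussed
above.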

Also, digging further into the IOMMU debug messages, I found that the
IOMMU also faults a Write from the network device, prior to the
xHCI Read fault:

=========
# lspci | grep "80:1f.6\|80:14.0"
80:14.0 USB controller: Intel Corporation 800 Series PCH USB 3.1 xHCI
HC (rev 10)
80:1f.6 Ethernet controller: Intel Corporation Ethernet Connection
(19) I219-LM (rev 10)
=========
...
[Tue May 19 10:06:31 2026] PCI host bridge to bus 0000:80
[Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [io
0x2000-0x8bff window]
[Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [mem
0xb8000000-0xbdffffff window]
[Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [mem
0x8000000000-0x9fdfffffff window]
[Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [bus 80-df]
[Tue May 19 10:06:31 2026] pci 0000:80:14.0: [8086:7f6e] type 00
class 0x0c0330 conventional PCI endpoint
[Tue May 19 10:06:31 2026] pci 0000:80:14.0: BAR 0 [mem
0x8000200000-0x800020ffff 64bit]
[Tue May 19 10:06:31 2026] pci 0000:80:14.0: PME# supported from D3hot D3cold
...
[Tue May 19 10:06:31 2026] pci 0000:80:1f.6: [8086:550c] type 00
class 0x020000 conventional PCI endpoint
[Tue May 19 10:06:31 2026] pci 0000:80:1f.6: BAR 0 [mem
0xb8100000-0xb811ffff]
[Tue May 19 10:06:31 2026] pci 0000:80:1f.6: PME# supported from D0
D3hot D3cold
...
[Tue May 19 10:06:32 2026] pci 0000:80:14.0: Adding to iommu group 20
...
[Tue May 19 10:06:32 2026] pci 0000:80:1f.6: Adding to iommu group 29
[Tue May 19 10:06:32 2026] pci 0000:81:00.0: Adding to iommu group 30
[Tue May 19 10:06:32 2026] DMAR: Intel(R) Virtualization Technology
for Directed I/O
[Tue May 19 10:06:32 2026] PCI-DMA: Using software bounce buffering
for IO (SWIOTLB)
[Tue May 19 10:06:32 2026] software IO TLB: mapped [mem
0x000000003b000000-0x000000003f000000] (64MB)
[Tue May 19 10:06:32 2026] ACPI: bus type thunderbolt registered
[Tue May 19 10:06:32 2026] DMAR: DRHD: handling fault status reg 3
=> [Tue May 19 10:06:32 2026] DMAR: [DMA Write NO_PASID] Request
device [80:1f.6] fault addr 0x115106000 [fault reason 0x39] SM:
Present bit in Root Entry is clear
...
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: new USB bus
registered, assigned bus number 3
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Halt the HC
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Resetting HCD
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Reset the HC
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Wait for
controller to be ready for doorbell rings
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Reset complete
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Enabling 64-bit
DMA addresses.
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: HCD page size set to 4K
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Starting xhci_mem_init
=> [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Device context
base array address = 0x000000107513c000 (DMA), 000000002c3aab07 (virt)
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Allocated command
ring at 00000000ee0da32e
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Allocating
primary event ring
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Allocating 34
scratchpad buffers
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Ext Cap
00000000a35d82fb, port offset = 1, count = 14, revision = 0x2
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:1 PSIE:2
PLT:0 PFD:0 LP:0 PSIM:12
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:2 PSIE:1
PLT:0 PFD:0 LP:0 PSIM:1500
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:3 PSIE:2
PLT:0 PFD:0 LP:0 PSIM:480
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHCI 1.0: support
USB2 hardware lpm
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Ext Cap
000000006d495f89, port offset = 17, count = 8, revision = 0x3
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:4 PSIE:3
PLT:0 PFD:1 LP:0 PSIM:5
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:5 PSIE:3
PLT:0 PFD:1 LP:1 PSIM:10
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:6 PSIE:3
PLT:0 PFD:1 LP:1 PSIM:10
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:7 PSIE:3
PLT:0 PFD:1 LP:1 PSIM:20
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Found 14 USB 2.0
ports and 8 USB 3.0 ports.
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_mem_init
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Starting xhci_init
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHC can handle at
most 64 device slots
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Setting Max
device slots reg = 0x40
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Setting command
ring address to 0x107513d001
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Doorbell array is
located at offset 0x3000 from cap regs base addr
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Write event
ring dequeue pointer, preserving EHB bit
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_init
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: hcc params
0x20007fc1 hci version 0x120 quirks 0x0000000200009810
=> [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PATCHED
xhci_gen_setup: USBSTS: 0x00000001 HCHalted
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Got SBRN 50
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: MWI active
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_pci_reinit
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xhci_run
=> [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_run
for main hcd
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: new USB bus
registered, assigned bus number 4
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Host supports USB
3.2 Enhanced SuperSpeed
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Enable interrupts
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Enable primary interrupter
[Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Turn on HC, cmd = 0x5.
[Tue May 19 10:06:37 2026] DMAR: DRHD: handling fault status reg 2
=> [Tue May 19 10:06:37 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
[Tue May 19 10:06:38 2026] usb usb3: default language 0x0409
[Tue May 19 10:06:38 2026] usb usb3: udev 1, busnum 3, minor = 256
[Tue May 19 10:06:38 2026] usb usb3: New USB device found,
idVendor=1d6b, idProduct=0002, bcdDevice= 7.01
[Tue May 19 10:06:38 2026] usb usb3: New USB device strings: Mfr=3,
Product=2, SerialNumber=1
[Tue May 19 10:06:38 2026] usb usb3: Product: xHCI Host Controller
[Tue May 19 10:06:38 2026] usb usb3: Manufacturer: Linux
7.1.0-rc3-30e0ff6d6a83.debug xhci-hcd
[Tue May 19 10:06:38 2026] usb usb3: SerialNumber: 0000:80:14.0
[Tue May 19 10:06:38 2026] usb usb3: usb_probe_device
[Tue May 19 10:06:38 2026] usb usb3: configuration #1 chosen from 1 choice
[Tue May 19 10:06:38 2026] xHCI xhci_add_endpoint called for root hub
[Tue May 19 10:06:38 2026] xHCI xhci_check_bandwidth called for root hub
[Tue May 19 10:06:38 2026] usb usb3: adding 3-0:1.0 (config #1, interface 0)
[Tue May 19 10:06:38 2026] hub 3-0:1.0: usb_probe_interface
[Tue May 19 10:06:38 2026] hub 3-0:1.0: usb_probe_interface - got id
[Tue May 19 10:06:38 2026] hub 3-0:1.0: USB hub found
[Tue May 19 10:06:38 2026] hub 3-0:1.0: 14 ports detected
[Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status
2-1 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status
2-2 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
[Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: set port remote
wake mask, actual port 2-1 status = 0xe0002a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: set port remote
wake mask, actual port 2-2 status = 0xe0002a0
[Tue May 19 10:06:38 2026] hub 2-0:1.0: hub_suspend
[Tue May 19 10:06:38 2026] usb usb2: bus auto-suspend, wakeup 1
[Tue May 19 10:06:38 2026] usb usb2: suspend raced with wakeup event
[Tue May 19 10:06:38 2026] usb usb2: usb auto-resume
[Tue May 19 10:06:38 2026] hub 3-0:1.0: standalone hub
[Tue May 19 10:06:38 2026] hub 3-0:1.0: no power switching (usb 1.0)
[Tue May 19 10:06:38 2026] hub 3-0:1.0: individual port
over-current protection
[Tue May 19 10:06:38 2026] hub 3-0:1.0: Single TT
[Tue May 19 10:06:38 2026] hub 3-0:1.0: TT requires at most 8 FS
bit times (666 ns)
[Tue May 19 10:06:38 2026] hub 3-0:1.0: power on to power good time: 20ms
[Tue May 19 10:06:38 2026] hub 3-0:1.0: local power source is good
[Tue May 19 10:06:38 2026] usb usb3-port14: DeviceRemovable is
changed to 1 according to platform information.
[Tue May 19 10:06:38 2026] hub 3-0:1.0: trying to enable port power
on non-switchable hub
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-1 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-2 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-3 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-4 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-5 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-6 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-7 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-8 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-9 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-10 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-11 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-12 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-13 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
3-14 ON, portsc: 0x206e1
[Tue May 19 10:06:38 2026] usb usb4: skipped 1 descriptor after endpoint
[Tue May 19 10:06:38 2026] usb usb4: default language 0x0409
[Tue May 19 10:06:38 2026] usb usb4: udev 1, busnum 4, minor = 384
[Tue May 19 10:06:38 2026] usb usb4: New USB device found,
idVendor=1d6b, idProduct=0003, bcdDevice= 7.01
[Tue May 19 10:06:38 2026] hub 2-0:1.0: hub_resume
[Tue May 19 10:06:38 2026] usb usb4: New USB device strings: Mfr=3,
Product=2, SerialNumber=1
[Tue May 19 10:06:38 2026] usb usb4: Product: xHCI Host Controller
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-1 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-2 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-3 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-4 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-5 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-6 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-7 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-8 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-9 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-10 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-11 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-12 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-13 read: 0x2a0, return 0x100
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-14 read: 0x206e1, return 0x10101
[Tue May 19 10:06:38 2026] usb usb3-port14: status 0101 change 0001
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: clear port14
connect change, portsc: 0x6e1
[Tue May 19 10:06:38 2026] usb usb4: Manufacturer: Linux
7.1.0-rc3-30e0ff6d6a83.debug xhci-hcd
[Tue May 19 10:06:38 2026] usb usb4: SerialNumber: 0000:80:14.0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status
2-1 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status
2-2 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000
[Tue May 19 10:06:38 2026] usb usb4: usb_probe_device
[Tue May 19 10:06:38 2026] usb usb4: configuration #1 chosen from 1 choice
[Tue May 19 10:06:38 2026] xHCI xhci_add_endpoint called for root hub
[Tue May 19 10:06:38 2026] xHCI xhci_check_bandwidth called for root hub
[Tue May 19 10:06:38 2026] usb usb4: adding 4-0:1.0 (config #1, interface 0)
[Tue May 19 10:06:38 2026] hub 4-0:1.0: usb_probe_interface
[Tue May 19 10:06:38 2026] hub 4-0:1.0: usb_probe_interface - got id
[Tue May 19 10:06:38 2026] hub 4-0:1.0: USB hub found
[Tue May 19 10:06:38 2026] hub 4-0:1.0: 8 ports detected
[Tue May 19 10:06:38 2026] hub 4-0:1.0: standalone hub
[Tue May 19 10:06:38 2026] hub 4-0:1.0: no power switching (usb 1.0)
[Tue May 19 10:06:38 2026] hub 4-0:1.0: individual port
over-current protection
[Tue May 19 10:06:38 2026] hub 4-0:1.0: TT requires at most 8 FS
bit times (666 ns)
[Tue May 19 10:06:38 2026] hub 4-0:1.0: power on to power good time: 100ms
[Tue May 19 10:06:38 2026] hub 4-0:1.0: local power source is good
[Tue May 19 10:06:38 2026] usb usb4-port1: peered to usb3-port9
[Tue May 19 10:06:38 2026] usb usb4-port2: peered to usb3-port12
[Tue May 19 10:06:38 2026] usb usb4-port3: peered to usb3-port8
[Tue May 19 10:06:38 2026] usb usb4-port4: peered to usb3-port7
[Tue May 19 10:06:38 2026] usb usb4-port5: peered to usb3-port10
[Tue May 19 10:06:38 2026] usb usb4-port6: peered to usb3-port3
[Tue May 19 10:06:38 2026] usb usb4-port7: peered to usb3-port4
[Tue May 19 10:06:38 2026] usb usb4-port8: peered to usb3-port5
[Tue May 19 10:06:38 2026] usb usb4: port-1 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-1 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-2 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-2 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-3 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-3 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-4 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-4 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-5 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-5 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-6 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-6 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-7 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-7 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] usb usb4: port-8 no _DSM function 5
[Tue May 19 10:06:38 2026] usb usb4: port-8 disable U1/U2 _DSM: -19
[Tue May 19 10:06:38 2026] hub 4-0:1.0: trying to enable port power
on non-switchable hub
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-1 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-2 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-3 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-4 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-5 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-6 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-7 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power
4-8 ON, portsc: 0x2a0
[Tue May 19 10:06:38 2026] usbcore: registered new interface driver
usbserial_generic
[Tue May 19 10:06:38 2026] usbserial: USB Serial support registered
for generic
[Tue May 19 10:06:38 2026] hub 3-0:1.0: state 7 ports 14 chg 4000 evt 0000
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
3-14 read: 0x6e1, return 0x101
[Tue May 19 10:06:38 2026] usb usb3-port14: status 0101, change 0000, 12 Mb/s
=> [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: PATCHED:
xhci_alloc_dev: B4 TRB_ENABLE_SLOT USBSTS: 0x00000015 HCHalted HSE PCD
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: // Ding dong!
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-1 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-2 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-3 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-4 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-5 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-6 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-7 read: 0x2a0, return 0x2a0
[Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status
4-8 read: 0x2a0, return 0x2a0
...
=========
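
In case it helps to compare logs, here is a rough sketch of how the DMAR
fault lines above could be pulled out of a mail-wrapped dmesg dump. The
regular expression only covers the message format seen in this log, and
parse_dmar_faults() is a name invented for the example.

```python
import re

# Matches only the "DMA Read/Write NO_PASID" fault format seen in this log.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<direction>Read|Write) NO_PASID\] "
    r"Request device \[(?P<device>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>0x[0-9a-f]+) "
    r"\[fault reason (?P<reason>0x[0-9a-f]+)\]"
)


def parse_dmar_faults(log_text):
    """Return (direction, device, fault_addr, reason) tuples (hypothetical helper)."""
    # Mail clients wrap long dmesg lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (m["direction"], m["device"], int(m["addr"], 16), int(m["reason"], 16))
        for m in DMAR_FAULT.finditer(flat)
    ]


SAMPLE = """\
[Tue May 19 10:06:32 2026] DMAR: [DMA Write NO_PASID] Request
device [80:1f.6] fault addr 0x115106000 [fault reason 0x39] SM:
Present bit in Root Entry is clear
[Tue May 19 10:06:37 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
"""

for direction, device, addr, reason in parse_dmar_faults(SAMPLE):
    print(direction, device, hex(addr), hex(reason))
```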

> > 1001680000 is close to valid addresses like 100167e000 or 100167c000.
> >
> > Possible causes:
> > - xHCI or IOMMU driver bug

Since it happens with different drivers, it is starting to feel like
an IOMMU bug that only happens in this kdump scenario.
However, init shouldn't be stuck waiting for the lock that the hub
kworker task is holding.
The system should be able to reboot automatically after capturing the vmcore.

> > - HW corrupted a pointer
> > - HW accessed something out of bounds
> > - HW dereferenced a stale pointer from the original kernel
> >
> > Do you happen to have more of those logs saved, are they all like that?

Since the last time we interacted, I lost access to the system and it
got formatted - no more old logs. However, I've got the system back
today and had some interesting developments.

> > Any chance that 1001680000 appears somewhere in the main kernel's log?

The fault addresses do not appear in the main log, nor anywhere other
than the DMAR fault addr messages in the crashkernel's log.

However, comparing the log messages from the previous kernel with the
ones from the new kernel I built today, I noticed the same 8K (0x2000)
displacement between the ERST deq value and the fault addr. Maybe a
clue pointing to an IOMMU driver bug?

= 7.0.0-clean =

=> [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Device context
base array address = 0x0x000000100167c000 (DMA), 00000000d042f7e3
(virt)
=> [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h100167e000
=> [Fri May 1 09:46:41 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0x1001680000 [fault reason 0x39] SM: Present bit
in Root Entry is clear

0x100167c000
0x100167e000
^
0x2000
v
0x1001680000

= 7.1.0-rc3-30e0ff6d6a83.debug =

=> [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Device context
base array address = 0x000000107513c000 (DMA), 000000002c3aab07 (virt)
=> [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000
=> [Tue May 19 10:06:37 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit
in Root Entry is clear

0x107513c000
0x107513e000
^
0x2000
v
0x1075140000
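
The arithmetic behind the two diagrams above can be double-checked with
a short sketch. The addresses are copied from the log lines quoted for
each kernel; the dictionary layout and the displacements() helper are
only illustrative.

```python
# Addresses copied from the crashkernel logs of both kernels.
LOGGED = {
    "7.0.0-clean": {
        "dcbaa": 0x100167C000,
        "erst_deq": 0x100167E000,
        "fault": 0x1001680000,
    },
    "7.1.0-rc3-30e0ff6d6a83.debug": {
        "dcbaa": 0x107513C000,
        "erst_deq": 0x107513E000,
        "fault": 0x1075140000,
    },
}


def displacements(entry):
    """Distance from each logged address up to the fault address (hypothetical helper)."""
    return {
        "fault - erst_deq": entry["fault"] - entry["erst_deq"],
        "fault - dcbaa": entry["fault"] - entry["dcbaa"],
    }


for kernel, entry in LOGGED.items():
    print(kernel, {name: hex(delta) for name, delta in displacements(entry).items()})
```

On both kernels the fault address sits 0x2000 above ERST deq and 0x4000
above the device context base array address.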

> I see a certain lack of interest in finding the root cause of this.

Actually, I've just come back from bereavement leave. During that
time I also lost access to the system, which I only got back today -
apologies for the radio silence.

> I have done a simple test on my own HW: writing bogus CRCR to cause
> IOMMU fault when the first command is submitted. I found that not all
> HCs reliably set HSE in this case, but obviously none of them ever
> complete the command properly.

Wow - good to know! I would have expected HSE to always be set in
this kind of situation involving the command ring register.
Just out of curiosity, how did you figure out that only some HCs set
HSE? Did you test on a few HCs or infer that somehow?

> It seems that unconditional hc_died() on Enable Slot timeout may not be a bad idea.

That was the idea of the original patch, after I saw HSE set at that
point of the reboot sequence once the vmcore had been captured.
However, unlike my original patch, a single
XHCI_CMD_DEFAULT_TIMEOUT is enough (I even tested this a few weeks
ago, after submission).

Now if we can't trust that all HCs will reliably set HSE in scenarios
like this one (IOMMU issues in the crashkernel?), the unconditional
hc_died() starts to feel like a safer approach.
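
To make the two options concrete, here is a toy model of the decision
on a command timeout. It is only a sketch of the discussion, not the
xhci driver code: should_kill_host() and its unconditional_enable_slot
flag are hypothetical names, and only the HSE/HCE bit positions come
from the xHCI specification.

```python
USBSTS_HSE = 1 << 2   # Host System Error
USBSTS_HCE = 1 << 12  # Host Controller Error


def should_kill_host(command, usbsts, unconditional_enable_slot=False):
    """Toy decision for a timed-out command; not the actual driver logic."""
    if usbsts & (USBSTS_HSE | USBSTS_HCE):
        # Condition from the RFC subject: the host reports HSE or HCE.
        return True
    if unconditional_enable_slot and command == "Enable Slot":
        # Alternative under discussion: do not rely on HSE being set.
        return True
    return False


# USBSTS 0x15 is the value logged before Enable Slot on this system.
print(should_kill_host("Enable Slot", 0x15))
# A controller that faults without setting HSE only dies with the flag on.
print(should_kill_host("Enable Slot", 0x01))
print(should_kill_host("Enable Slot", 0x01, unconditional_enable_slot=True))
```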

> Makes me wonder if the same shouldn't apply to all commands
> besides Address Device, they typically only timeout due to HW issues.

In the past (commit c311e391a7efd101250c0e123286709b7e736249 "xhci:
rework command timeout and cancellation") all commands used to wait
for a timeout of XHCI_CMD_DEFAULT_TIMEOUT - even Address Device.

PS: there was a big IOMMU PR a few days ago - all the results in
this email were obtained with a recent 7.1.0-rc3 kernel checked out
at 30e0ff6d6a83.

Best Regards,

Desnes