Re: [PATCH v2] usb: hcd: Add a usb_device argument to hc_driver.endpoint_reset()
From: Michał Pecio
Date: Thu Apr 17 2025 - 05:35:20 EST
On Thu, 17 Apr 2025 11:54:19 +0300, Mathias Nyman wrote:
> On 15.4.2025 12.10, Michal Pecio wrote:
> > xHCI needs usb_device here, so it stored it in host_endpoint.hcpriv,
> > which proved problematic due to some unexpected call sequences from
> > USB core, and generally made the code more complex than it has to
> > be.
> >
> > Make USB core supply it directly and simplify xhci_endpoint_reset().
> > Use the xhci_check_args() helper for preventing resets of emulated
> > root hub endpoints and for argument validation.
> >
> > Update other drivers which also define such callback to accept the
> > new argument and ignore it, as it seems to be of no use for them.
> >
> > This fixes a 6.15-rc1 regression reported by Paul, which I was able
> > to reproduce, where xhci_hcd doesn't handle endpoint_reset() after
> > endpoint_disable() not followed by add_endpoint(). If a configured
> > device is reset, stalling endpoints start to get stuck permanently.
> >
> > Reported-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> > Closes: https://lore.kernel.org/linux-usb/c279bd85-3069-4841-b1be-20507ac9f2d7@xxxxxxxxxxxxx/
> > Signed-off-by: Michal Pecio <michal.pecio@xxxxxxxxx>
> > ---
>
> All xhci changes look good to me
>
> Acked-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
Thank you for the review.
I guess I should update the commit message, though?
Technically, the regression will be closed by the next usb-linus merge
due to the EP_STALLED reverts, while this patch really fixes an old hidden
bug, which I could probably do a better job of explaining.
Greg would like a "Fixes". I think the problem started somewhere here:
f5249461b504 xhci: Clear the host side toggle manually when endpoint is soft reset
18b74067ac78 xhci: Fix use-after-free regression in xhci clear hub TT implementation
The former introduced an endpoint_reset() which depends on ep->hcpriv,
the latter introduced an endpoint_disable() which clears ep->hcpriv.
Which of them was wrong depends on whether it is legal to expect hcpriv
to be preserved after endpoint_disable(). I honestly don't know.
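
To make the sequence concrete, here is a rough userspace model of it. It
is only a sketch: the model_* names and both structures are made up for
illustration and are not the real xHCI or USB core definitions. It just
replays add_endpoint() -> endpoint_disable() -> endpoint_reset() with the
old hcpriv-based lookup and with the device passed in by the caller.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's real definitions. */
struct model_udev { int slot_id; };
struct model_ep {
	void *hcpriv;     /* HCD-private pointer, like host_endpoint.hcpriv */
	int resets_done;  /* counts resets the model HCD actually performed */
};

/* Old scheme: the HCD caches the device in hcpriv when the EP is added. */
static void model_add_endpoint(struct model_udev *udev, struct model_ep *ep)
{
	ep->hcpriv = udev;
}

/* endpoint_disable() clears hcpriv, as described for 18b74067ac78. */
static void model_endpoint_disable(struct model_ep *ep)
{
	ep->hcpriv = NULL;
}

/* Old-style reset: needs the cached device, returns -1 when it bails out. */
static int model_endpoint_reset_old(struct model_ep *ep)
{
	struct model_udev *udev = ep->hcpriv;

	if (!udev)
		return -1;
	ep->resets_done++;
	return 0;
}

/* New-style reset: the caller supplies the device, hcpriv is not used. */
static int model_endpoint_reset_new(struct model_udev *udev,
				    struct model_ep *ep)
{
	if (!udev || !ep)
		return -1;
	ep->resets_done++;
	return 0;
}

/* Replays add -> disable -> reset with no add_endpoint() in between. */
static int model_replay(int use_new_signature)
{
	struct model_udev udev = { .slot_id = 1 };
	struct model_ep ep = { .hcpriv = NULL, .resets_done = 0 };

	model_add_endpoint(&udev, &ep);
	model_endpoint_disable(&ep);
	if (use_new_signature)
		return model_endpoint_reset_new(&udev, &ep);
	return model_endpoint_reset_old(&ep);
}
```

In this model, model_replay(0) takes the old path and bails out because
hcpriv is already NULL, while model_replay(1) performs the reset since it
never looks at hcpriv.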
I also don't know if it makes sense to fix this in stable, since
apparently nobody noticed before EP_STALLED. But a class driver which
tries to clear an EP that is not halted, on a device that had been reset
in the past, could create a toggle mismatch. I have not yet found such a
driver.
Michal