RE: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
From: Tian, Kevin
Date: Wed Mar 18 2026 - 23:08:33 EST
> From: Samiullah Khawaja <skhawaja@xxxxxxxxxx>
> Sent: Thursday, March 19, 2026 6:07 AM
>
> Hi Nicolin,
>
> On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
> >On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
> >> > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> >> > Sent: Wednesday, March 18, 2026 3:16 AM
> >> >
> >> > An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
> >> > aware of the timeout via a GERROR interrupt, the driver thread issuing the
> >> > commands lacks a direct mechanism to verify whether its specific batch was
> >> > the cause or not, as polling the CMD_SYNC status doesn't natively return a
> >> > failure code, making it very difficult to coordinate per-device recovery.
> >> >
> >> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
> >> > gap. When the ISR detects an ATC timeout, set the bit corresponding to the
> >> > physical CMDQ index of the faulting CMD_SYNC command.
> >> >
> >>
> >> It's nice to see the ability of allowing sw to identify the faulting sync command
> >> upon an ATC timeout! On VT-d it's not feasible when multiple wait descriptors
> >> (similar to CMD_SYNC) are in flight... :/
> >
> >Actually SMMU doesn't know which device is faulting when CMD_SYNC
>
> VT-d is able to find out the SID of the device for which the device TLB
> invalidation timeout occurred by using the SID reported in the
> "Invalidation Queue Error Record Register" (VT-d Specs 11.4.9.9).
Yes, but when multiple submissions (each with a wait descriptor) have been
fetched/handled by the hw and an invalidation timeout then occurs, all
pending wait descriptors are aborted (not just the one corresponding
to the timeout). In this case all affected submitters need to retry.
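
To make the contrast concrete, here is a rough userspace sketch in C11. It
is not the patch code: every sketch_* name and the queue depth are made up
for illustration, and the submitter-side test-and-clear is only my
assumption of how a per-index bitmap would be consumed. The first two
helpers model the per-CMD_SYNC marking described in the commit message;
the last one models the case above, where a single timeout aborts every
pending wait descriptor and so every pending submitter sees a failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue depth, chosen only for this sketch. */
#define SKETCH_CMDQ_ENTRIES	256
#define SKETCH_BITS_PER_WORD	(8 * sizeof(unsigned long))
#define SKETCH_WORDS		(SKETCH_CMDQ_ENTRIES / SKETCH_BITS_PER_WORD)

/* Stand-in for the cmdq structure; only the timeout bitmap is modelled. */
struct sketch_cmdq {
	atomic_ulong atc_sync_timeouts[SKETCH_WORDS];
};

/* ISR side: flag the queue slot of the CMD_SYNC that hit the timeout. */
static void sketch_mark_timeout(struct sketch_cmdq *q, size_t idx)
{
	assert(idx < SKETCH_CMDQ_ENTRIES);
	atomic_fetch_or(&q->atc_sync_timeouts[idx / SKETCH_BITS_PER_WORD],
			1UL << (idx % SKETCH_BITS_PER_WORD));
}

/*
 * Submitter side (assumed usage): after its CMD_SYNC completes, the
 * issuing thread atomically tests and clears its own bit. Returns true
 * if this particular sync was flagged as timed out.
 */
static bool sketch_test_and_clear_timeout(struct sketch_cmdq *q, size_t idx)
{
	unsigned long mask = 1UL << (idx % SKETCH_BITS_PER_WORD);
	unsigned long old;

	assert(idx < SKETCH_CMDQ_ENTRIES);
	old = atomic_fetch_and(&q->atc_sync_timeouts[idx / SKETCH_BITS_PER_WORD],
			       ~mask);
	return (old & mask) != 0;
}

/*
 * Abort-all model: one timeout fails every pending wait descriptor, so
 * every pending slot is flagged and every submitter has to retry.
 */
static void sketch_mark_all_pending(struct sketch_cmdq *q,
				    const size_t *pending, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sketch_mark_timeout(q, pending[i]);
}
```

With per-index marking only the submitter whose slot was flagged takes the
recovery path; in the abort-all model each pending submitter observes its
bit set and retries.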