Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

From: Samiullah Khawaja

Date: Wed Mar 18 2026 - 18:06:59 EST


Hi Nicolin,

On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
> On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
>> > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
>> > Sent: Wednesday, March 18, 2026 3:16 AM
>> >
>> > An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
>> > aware of the timeout via a GERROR interrupt, the driver thread issuing the
>> > commands lacks a direct mechanism to verify whether its specific batch was
>> > the cause or not, as polling the CMD_SYNC status doesn't natively return a
>> > failure code, making it very difficult to coordinate per-device recovery.
>> >
>> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
>> > gap. When the ISR detects an ATC timeout, set the bit corresponding to the
>> > physical CMDQ index of the faulting CMD_SYNC command.
>> >
>>
>> It's nice to see the ability of allowing sw to identify the faulting sync command
>> upon an ATC timeout! On VT-d it's not feasible when multiple wait descriptors
>> (similar to CMD_SYNC) are in flight... :/
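
A minimal userspace sketch of the mechanism the quoted patch describes, assuming one bit per physical CMDQ slot: the ISR sets the bit for the faulting CMD_SYNC, and the thread that issued that CMD_SYNC atomically tests and clears it once its poll returns. All names here (struct cmdq_sketch, mark_atc_timeout(), test_and_clear_atc_timeout(), CMDQ_ENTRIES) are hypothetical, and C11 atomics stand in for the kernel's set_bit()/test_and_clear_bit(); this is not the patch's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue depth; a real driver derives it from the hardware. */
#define CMDQ_ENTRIES	256u
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS	((CMDQ_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the atc_sync_timeouts bitmap in the cmdq. */
struct cmdq_sketch {
	atomic_ulong atc_sync_timeouts[BITMAP_WORDS];
};

static void cmdq_sketch_init(struct cmdq_sketch *q)
{
	for (size_t i = 0; i < BITMAP_WORDS; i++)
		atomic_init(&q->atc_sync_timeouts[i], 0);
}

/* ISR side: record that the CMD_SYNC at physical index @idx hit an ATC timeout. */
static void mark_atc_timeout(struct cmdq_sketch *q, unsigned int idx)
{
	assert(idx < CMDQ_ENTRIES);
	atomic_fetch_or(&q->atc_sync_timeouts[idx / BITS_PER_WORD],
			1UL << (idx % BITS_PER_WORD));
}

/*
 * Issuer side: after polling its CMD_SYNC, test and clear the bit in one
 * atomic read-modify-write; returns true if the ISR flagged this slot.
 */
static bool test_and_clear_atc_timeout(struct cmdq_sketch *q, unsigned int idx)
{
	unsigned long mask = 1UL << (idx % BITS_PER_WORD);
	unsigned long old;

	assert(idx < CMDQ_ENTRIES);
	old = atomic_fetch_and(&q->atc_sync_timeouts[idx / BITS_PER_WORD], ~mask);
	return old & mask;
}
```

Because the check is a single atomic read-modify-write, threads whose CMD_SYNCs sit in different slots need no lock, and a recorded timeout is reported at most once.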

> Actually SMMU doesn't know which device is faulting when CMD_SYNC
> follows ATC_INV commands for multiple devices. The commit message
> in PATCH-7 describes this at the end. So Jason suggested retrying
> those ATC_INV commands by bisecting them per-device, which allows
> us to pinpoint the faulting device.
>
> Could VT-d do the same?
>
> Nicolin

VT-d can find the SID of the device whose device-TLB invalidation timed
out, using the SID reported in the "Invalidation Queue Error Record
Register" (VT-d spec, section 11.4.9.9).

But for a software timeout, something like this would be needed.
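
The per-device bisection retry described above could look roughly like the following sketch. It assumes the timed-out batch is halved recursively until each failing sub-batch holds a single device. struct dev_sketch, atc_inv_batch_ok() and bisect_atc_timeout() are hypothetical, and atc_inv_batch_ok() only simulates "issue ATC_INV for each device, then one CMD_SYNC, and report whether that sync timed out".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; "unresponsive" models a device whose ATC never answers. */
struct dev_sketch {
	unsigned int sid;
	bool unresponsive;
};

/*
 * Hypothetical stand-in for issuing ATC_INV for every device in the batch
 * followed by one CMD_SYNC: returns false if that CMD_SYNC would time out,
 * which here means any device in the batch is unresponsive.
 */
static bool atc_inv_batch_ok(const struct dev_sketch *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].unresponsive)
			return false;
	return true;
}

/*
 * Retry a timed-out batch by halving it until each failing batch holds a
 * single device. Faulting SIDs are written to @out; returns how many.
 */
static size_t bisect_atc_timeout(const struct dev_sketch *devs, size_t n,
				 unsigned int *out)
{
	size_t half, found;

	assert(devs != NULL || n == 0);
	if (n == 0 || atc_inv_batch_ok(devs, n))
		return 0;
	if (n == 1) {
		out[0] = devs[0].sid;
		return 1;
	}
	half = n / 2;
	found = bisect_atc_timeout(devs, half, out);
	return found + bisect_atc_timeout(devs + half, n - half, out + found);
}
```

Each retry costs extra CMD_SYNCs, so this only makes sense on the rare timeout path; with k faulting devices among n, the number of retried batches grows on the order of k log n.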


Thanks,
Sami