Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
From: Nicolin Chen
Date: Wed Mar 18 2026 - 19:25:05 EST
Hi Sami,
On Wed, Mar 18, 2026 at 10:02:32PM +0000, Samiullah Khawaja wrote:
> On Tue, Mar 17, 2026 at 12:15:37PM -0700, Nicolin Chen wrote:
> > @@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> >
> > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > if (sync) {
> > + u32 sync_prod;
> > +
> > llq.prod = queue_inc_prod_n(&llq, n);
> > + sync_prod = llq.prod;
> > +
> > ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > - if (ret) {
> > + if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
> > + cmdq->atc_sync_timeouts)) {
>
> This will not be set if a software timeout (1 second) occurs. Do you
> know if the ATC timeout of Arm sMMUv3 is less than the software timeout
> in the driver?
You brought up a good point!
I think the ATC timeout follows the PCI Completion Timeout Value in
the "Device Control 2 Register", which is typically set to [50us, 50ms]
but can be set up to [17s, 64s] according to the PCI Base spec.
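To make the comparison against the 1s software timeout concrete, here is a rough userspace sketch of the Completion Timeout Value encodings as I recall them from the PCIe Base spec (please double-check the table against the spec before relying on it). The struct, table and helper names are hypothetical and exist only for this illustration; nothing here is driver code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: Device Control 2, Completion Timeout Value (bits 3:0) */
struct cto_range {
	uint8_t enc;		/* 4-bit encoding */
	uint64_t min_us;	/* lower bound of the range, in microseconds */
	uint64_t max_us;	/* upper bound of the range, in microseconds */
};

static const struct cto_range cto_ranges[] = {
	{ 0x0, 50, 50000 },		/* default: 50us to 50ms */
	{ 0x1, 50, 100 },		/* 50us to 100us */
	{ 0x2, 1000, 10000 },		/* 1ms to 10ms */
	{ 0x5, 16000, 55000 },		/* 16ms to 55ms */
	{ 0x6, 65000, 210000 },		/* 65ms to 210ms */
	{ 0x9, 260000, 900000 },	/* 260ms to 900ms */
	{ 0xa, 1000000, 3500000 },	/* 1s to 3.5s */
	{ 0xd, 4000000, 13000000 },	/* 4s to 13s */
	{ 0xe, 17000000, 64000000 },	/* 17s to 64s */
};

/* True if the device's CTO may fire later than limit_us (e.g. the 1s poll) */
static bool cto_may_exceed_us(uint8_t enc, uint64_t limit_us)
{
	for (size_t i = 0; i < sizeof(cto_ranges) / sizeof(cto_ranges[0]); i++) {
		if (cto_ranges[i].enc == enc)
			return cto_ranges[i].max_us > limit_us;
	}
	return false;	/* reserved encoding: treated as "unknown" here */
}
```

With this table, only the 0xa, 0xd and 0xe encodings could let the software timeout win the race against the PCI CTO.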
> If not maybe we can handle the software timeout here also as the cmdlist
> is already known?
I think it's trickier.
If the software times out first at 1s, it means the CMDQ is still
waiting for the ATC invalidation to complete. Then, the caller
sees -ETIMEDOUT and tries to bisect the ATC batch or update the
STE directly, either of which involves the CMDQ. But the CMDQ has
not recovered yet.
Then, in case of a batch, all the retries could time out again. So,
it will fail to identify which device is truly broken. This would
end badly by blindly disabling all the devices in the batch. Also,
the disabling calls require the CMDQ too, so they might fail as well.
Thus, to partially answer the question: in case of a software
timeout, I am afraid that we can hardly do anything.. :-/
This means I need to set a different return code for ATC timeouts
vs. software timeouts.
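Roughly what I have in mind, as a standalone sketch rather than the actual patch: the helper name is hypothetical, and -EIO is only a placeholder for whatever code ends up representing an ATC timeout.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: fold the poll result and the per-sync timeout bit
 * into one return code, so the caller can tell the two cases apart.
 */
static int sync_result(int poll_ret, bool atc_timeout_marked)
{
	/* GERROR ISR marked this CMD_SYNC: ATC invalidation timed out */
	if (atc_timeout_marked)
		return -EIO;	/* placeholder errno, an assumption */

	/* Software poll gave up first: the CMDQ may still be stuck */
	if (poll_ret == -ETIMEDOUT)
		return -ETIMEDOUT;

	return poll_ret;
}
```

The caller would then attempt the bisect or STE update only on the ATC timeout code, and treat -ETIMEDOUT as "CMDQ not usable right now".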
Also, there is another problem: when the PCI CTO finally expires, the
GERROR ISR will set atc_sync_timeouts but nobody will clear it..
So, before calling arm_smmu_cmdq_issue_cmdlist(), we need to make
sure there is no dirty bit on the bitmap too.
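A userspace model of the stale-bit issue, using C11 atomics in place of the kernel's set_bit()/test_and_clear_bit(). The bitmap size and the model_* helper names are made up for the illustration, and where exactly the clear would go in the driver is still an open question.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NBITS	256	/* hypothetical queue size, one bit per slot */
#define MODEL_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for cmdq->atc_sync_timeouts */
static _Atomic unsigned long model_timeouts[MODEL_NBITS / MODEL_WORD_BITS];

/* What the GERROR ISR would do when a CMD_SYNC hits an ATC timeout */
static void model_set_bit(unsigned int idx)
{
	unsigned long mask = 1UL << (idx % MODEL_WORD_BITS);

	atomic_fetch_or(&model_timeouts[idx / MODEL_WORD_BITS], mask);
}

/* Atomically clear the bit and report whether it was set */
static bool model_test_and_clear_bit(unsigned int idx)
{
	unsigned long mask = 1UL << (idx % MODEL_WORD_BITS);
	unsigned long old;

	old = atomic_fetch_and(&model_timeouts[idx / MODEL_WORD_BITS], ~mask);
	return (old & mask) != 0;
}

/*
 * Drop a bit left behind by a late CTO, so that the next CMD_SYNC landing
 * on the same index does not inherit somebody else's timeout.
 */
static void model_scrub_before_issue(unsigned int idx)
{
	(void)model_test_and_clear_bit(idx);
}
```

Without the scrub, a CMD_SYNC that completes normally on a reused index would still see the bit set and be reported as an ATC timeout.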
Thanks!
Nicolin