[PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
From: Nicolin Chen
Date: Tue Mar 17 2026 - 15:21:13 EST
An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
aware of the timeout via a GERROR interrupt, the driver thread issuing the
commands lacks a direct mechanism to verify whether its specific batch was
the cause or not, as polling the CMD_SYNC status doesn't natively return a
failure code, making it very difficult to coordinate per-device recovery.
Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
gap. When the ISR detects an ATC timeout, set the bit corresponding to the
physical CMDQ index of the faulting CMD_SYNC command.
On the issuer side, after polling completes (or times out), test and clear
its dedicated bit. If set, override any generic timeout, return -ETIMEDOUT
to trigger device quarantine.
Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 +++++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 36de2b0b2ebe6..3eb12a34b086a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -633,6 +633,7 @@ struct arm_smmu_cmdq {
atomic_long_t *valid_map;
atomic_t owner_prod;
atomic_t lock;
+ unsigned long *atc_sync_timeouts;
bool (*supports_cmd)(struct arm_smmu_cmdq_ent *ent);
};
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 01030ffd2fe23..9c8972ebc94f9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -445,7 +445,10 @@ void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
* at the CMD_SYNC. Attempt to complete other pending commands
* by repeating the CMD_SYNC, though we might well end up back
* here since the ATC invalidation may still be pending.
+ *
+ * Mark the faulty batch in the bitmap for the issuer to match.
*/
+ set_bit(Q_IDX(&q->llq, cons), cmdq->atc_sync_timeouts);
return;
case CMDQ_ERR_CERROR_ILL_IDX:
default:
@@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
if (sync) {
+ u32 sync_prod;
+
llq.prod = queue_inc_prod_n(&llq, n);
+ sync_prod = llq.prod;
+
ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
- if (ret) {
+ if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
+ cmdq->atc_sync_timeouts)) {
+ dev_err_ratelimited(smmu->dev,
+ "CMD_SYNC for ATC_INV timeout at prod=0x%08x\n",
+ sync_prod);
+ ret = -ETIMEDOUT;
+ } else if (ret) {
dev_err_ratelimited(smmu->dev,
"CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
llq.prod,
@@ -4458,6 +4471,11 @@ int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
if (!cmdq->valid_map)
return -ENOMEM;
+ cmdq->atc_sync_timeouts =
+ devm_bitmap_zalloc(smmu->dev, nents, GFP_KERNEL);
+ if (!cmdq->atc_sync_timeouts)
+ return -ENOMEM;
+
return 0;
}
--
2.43.0