Re: [PATCH] iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry

From: Baolu Lu

Date: Mon Jun 01 2026 - 01:48:02 EST


On 5/28/26 10:55, Michael Bommarito wrote:
device_pasid_table_teardown() zeroes the 128-bit scalable-mode context
entry with context_clear_entry() while the Present bit is still set. This
creates a window where the hardware can fetch a torn entry, with some
fields already zeroed while Present is still set, leading to unpredictable
behavior or spurious faults. The context-cache invalidation is issued only
after the entry has been zeroed, and intel_pasid_free_table() then frees
the PASID directory pages, so the IOMMU can keep walking a stale Present=1
entry that points at freed memory.

While x86 provides strong write ordering, the compiler may reorder the two
64-bit writes to the entry, and the hardware fetch is not guaranteed to be
atomic with respect to multiple CPU writes.

Commit c1e4f1dccbe9d ("iommu/vt-d: Clear Present bit before tearing down
context entry") fixed this exact pattern in domain_context_clear_one() and
the copied-context path, but device_pasid_table_teardown() was not
converted.

Align it with the "Guidance to Software for Invalidations" in the VT-d
spec, Section 6.5.3.3, using the same ownership handshake as the sibling
fix: clear only the Present bit, flush it to the IOMMU, perform the
context-cache invalidation, and only then zero the rest of the entry.

Fixes: 81e921fd32161 ("iommu/vt-d: Fix NULL domain on device release")
Signed-off-by: Michael Bommarito<michael.bommarito@xxxxxxxxx>
Assisted-by:Claude:claude-opus-4-7
---
Found by static analysis while auditing the callers of context_clear_entry()
for the same teardown ordering that c1e4f1dccbe9d addressed. This site is
reachable only in scalable mode, so it does not manifest on the legacy-mode
hardware available to me; I could not trigger a runtime fault and the change
is verified by code inspection only, on the same basis as the sibling fix.
Compile-tested on x86_64 with CONFIG_INTEL_IOMMU; no new warnings.

drivers/iommu/intel/pasid.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Sashiko reported a pre-existing issue:

https://sashiko.dev/#/patchset/20260528025557.3209367-1-michael.bommarito%40gmail.com

I have added it to my task list and will follow up with a fix in a
separate patch.

[Severity: Critical]
This is a pre-existing issue, but does this context cache flush actually
invalidate the alias BDF?

It looks like intel_context_flush_no_pasid() issues a device-selective
invalidation that is hardcoded to use the primary device's BDF from info
(PCI_DEVID(info->bus, info->devfn)), completely ignoring the alias's bus
and devfn parameters passed into device_pasid_table_teardown().

If device_pasid_table_teardown() is called for a DMA alias, the IOMMU
context cache for the alias BDF might never be invalidated.

When the domain's page tables or scalable-mode PASID directories are
subsequently freed, could any DMA arriving under the alias BDF cause the
IOMMU hardware to walk the stale cache entry into freed system memory?

Thanks,
baolu