Re: [REGRESSION] GPU passes into VM improperly after c376a3456d8b or a98db518dde2

From: Baolu Lu

Date: Sat May 23 2026 - 23:24:49 EST


On 4/27/2026 3:15 PM, Baolu Lu wrote:
On 4/14/26 17:22, 70sp wrote:
I can confirm that the "domain is not compatible with device" message does not appear anywhere.

I double-checked by adding an else branch with a different message, and that one showed up several times (for pci (iGPU) 0000:00:02.0, pcieport 0000:00:01.0, and vfio-pci (GTX 970) 0000:01:00.0 and 0000:01:00.1), each with ret = 0.


Hmm, it seems the domain is compatible with the device hardware and was
attached successfully. Perhaps you can try to check the differences
between these two domain attachments by dumping the root, context, and
PASID table entries and comparing the configurations of the success and
failure cases.

To do this, simply apply the change below with CONFIG_DMAR_DEBUG
enabled:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4d0e65bc131d..bf303cfcf2ee 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1345,6 +1345,9 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
        if (ret)
                goto out_block_translation;

+       dmar_fault_dump_ptes(iommu, PCI_DEVID(info->bus, info->devfn),
+                            0, IOMMU_NO_PASID);
+
        return 0;

 out_block_translation:

Have you tried this patch? It dumps the context and PASID table entries
after a domain is attached to the device. Hopefully, you can find some
clues by comparing the dumps from the good and bad kernels.
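
In case it helps with the comparison, here is a minimal sketch of a
userspace helper (not part of the patch) that filters two saved dmesg
captures down to their DMAR lines and prints the ones that differ. It
assumes the dump lines carry the "DMAR: " prefix and that each line
starts with the usual "[  seconds.micros]" dmesg timestamp; the script
name and file names are hypothetical, so adjust as needed.

```python
#!/usr/bin/env python3
"""Compare the DMAR lines of two saved dmesg captures (good vs. bad kernel)."""
import re
import sys

# Leading dmesg timestamp such as "[   12.345678] " (assumed log format).
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Assumption: the Intel IOMMU driver prefixes its messages with "DMAR:".
PREFIX = "DMAR:"


def dmar_lines(log_text):
    """Return the DMAR lines of a capture with the timestamps removed."""
    lines = []
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line).strip()
        if line.startswith(PREFIX):
            lines.append(line)
    return lines


def compare(good_text, bad_text):
    """Return (lines only in the good log, lines only in the bad log)."""
    good = dmar_lines(good_text)
    bad = dmar_lines(bad_text)
    only_good = [line for line in good if line not in bad]
    only_bad = [line for line in bad if line not in good]
    return only_good, only_bad


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} GOOD_DMESG BAD_DMESG", file=sys.stderr)
        return 2
    with open(argv[1]) as f:
        good_text = f.read()
    with open(argv[2]) as f:
        bad_text = f.read()
    only_good, only_bad = compare(good_text, bad_text)
    for line in only_good:
        print(f"good only: {line}")
    for line in only_bad:
        print(f"bad only:  {line}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Save "dmesg" output from each kernel after the VM has started, then run
something like "python3 dmar_diff.py good-dmesg.txt bad-dmesg.txt". Any
context or PASID table entry whose value differs between the two boots
shows up as a "good only" / "bad only" pair.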

Thanks,
baolu