Re: [PATCH v5 7/9] vfio: Enable cdev noiommu mode under iommufd
From: Yi Liu
Date: Wed May 20 2026 - 23:17:44 EST
On 5/21/26 02:15, Jacob Pan wrote:
Hi Yi,
On Wed, 20 May 2026 15:20:26 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:
On 5/20/26 11:46, Alex Williamson wrote:agreed, let me give it a try in v6.
On Mon, 11 May 2026 11:41:12 -0700
Jacob Pan <jacob.pan@xxxxxxxxxxxxxxxxxxx> wrote:
Now that devices under noiommu mode can bind with IOMMUFD and
perform IOAS operations, lift restrictions on cdev from VFIO side.
Remove the vfio_device_is_group_noiommu() early returns in
vfio_df_iommufd_bind() and vfio_df_iommufd_unbind() so that both
group and cdev noiommu devices go through the standard iommufd bind
path. This is safe because iommufd_device_bind() now handles
noiommu devices via its own iommufd_device_is_noiommu() check.
Add CAP_SYS_RAWIO checks for cdev open and bind under noiommu to
maintain security parity with the group noiommu path.
No IOMMU cdevs are explicitly named with noiommu prefix. e.g.
/dev/vfio/
|-- devices
| `-- noiommu-vfio0
`-- vfio
Signed-off-by: Jacob Pan <jacob.pan@xxxxxxxxxxxxxxxxxxx>
---
v5:
- Add Kconfig VFIO_CDEV_NOIOMMU to select IOMMUFD_NOIOMMU
and its dependencies
- Add comment to explain vfio_noiommu conditional definition
(Alex)
- Removed early return for group noiommu in bind/unbind
- Use consistent wording referring to VFIO noiommu mode (Kevin)
- Update unsafe_noiommu Kconfig help text (Kevin)
- Change dev_warn to dev_info for noiommu enabling msg (Kevin)
v4:
- Remove early return in iommufd_bind for noiommu (Alex)
v3:
- Consolidate into fewer patches
v2:
- removed unnecessary device->noiommu set in
iommufd_vfio_compat_ioas_get_id()
---
drivers/vfio/Kconfig | 3 +--
drivers/vfio/device_cdev.c | 10 ++++++++++
drivers/vfio/iommufd.c | 7 -------
drivers/vfio/vfio.h | 22 ++++++++++++++--------
drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++-----
include/linux/vfio.h | 1 +
6 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index b1b1633412a9..b1a260b6054c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -22,8 +22,7 @@ config VFIO_DEVICE_CDEV
The VFIO device cdev is another way for userspace to
get device access. Userspace gets device fd by opening device cdev
under /dev/vfio/devices/vfioX, and then bind the device fd with an
iommufd
- to set up secure DMA context for device access. This
interface does
- not support noiommu.
+ to set up secure DMA context for device access.
If you don't know what to do here, say N.
diff --git a/drivers/vfio/device_cdev.c
b/drivers/vfio/device_cdev.c index 54abf312cf04..46a808244398
100644 --- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -27,6 +27,9 @@ int vfio_device_fops_cdev_open(struct inode
*inode, struct file *filep) struct vfio_device_file *df;
int ret;
+ if (device->noiommu && !capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
/* Paired with the put in vfio_device_fops_release() */
if (!vfio_device_try_get_registration(device))
return -ENODEV;
@@ -110,6 +113,13 @@ long vfio_df_ioctl_bind_iommufd(struct
vfio_device_file *df, if (df->group)
return -EINVAL;
+ /*
+ * CAP_SYS_RAWIO is already checked at cdev open, recheck
here
+ * in case the fd was passed to a less privileged process.
+ */
+ if (device->noiommu && !capable(CAP_SYS_RAWIO))
+ return -EPERM;
+
ret = vfio_device_block_group(device);
if (ret)
return ret;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 39079ab27f92..bc80056c74d3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -25,10 +25,6 @@ int vfio_df_iommufd_bind(struct
vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- /* Returns 0 to permit device opening under noiommu mode
*/
- if (vfio_device_is_group_noiommu(vdev))
- return 0;
-
return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
}
@@ -58,9 +54,6 @@ void vfio_df_iommufd_unbind(struct
vfio_device_file *df)
lockdep_assert_held(&vdev->dev_set->lock);
- if (vfio_device_is_group_noiommu(vdev))
- return;
-
if (vdev->ops->unbind_iommufd)
vdev->ops->unbind_iommufd(vdev);
}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 602623cacfc0..ac79b1a2fce9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,7 +36,7 @@ vfio_allocate_device_file(struct vfio_device
*device);
extern const struct file_operations vfio_device_fops;
-#ifdef CONFIG_VFIO_GROUP_NOIOMMU
+#if IS_ENABLED(CONFIG_VFIO_GROUP_NOIOMMU) ||
IS_ENABLED(CONFIG_VFIO_CDEV_NOIOMMU)
Have you considered what happens when these are y/n or n/y?
I think in the former case we can create cdev devices for
group-noiommu devices that are not labeled noiommu, skip the
CAP_SYS_RAWIO test, but will fail to bind. In the latter case, I
think we fail to setup an iommufd_device and unbind will segfault.
We really don't need to support independently setting GROUP vs CDEV
NOIOMMU, the suggestion was to try to get NOIOMMU from depending on
VFIO_GROUP. We can do that other ways though and I think we can do
it without the rename in patch 1 that will inevitably result in
some lost config options for NOIOMMU on upgrade.
The Kconfig may get messy, perhaps something like:
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
depends on VFIO_GROUP || VFIO_DEVICE_CDEV
depends on !VFIO_GROUP || VFIO_CONTAINER ||
IOMMUFD_VFIO_CONTAINER depends on !VFIO_DEVICE_CDEV ||
!GENERIC_ATOMIC64 select IOMMUFD_NOIOMMU if VFIO_DEVICE_CDEV
Sorry if the previous suggestion sent us astray, but the subtleties
of independent support look tricky. Thanks,
this also looks better to me. Less kconfigs. :)
just one nit: current VFIO_NOIOMMU only depends on VFIO_GROUP, maybe a
separate patch to extend it depends on 'VFIO_GROUP && (VFIO_CONTAINER
|| IOMMUFD_VFIO_CONTAINER)' first, then add the cdev path noiommu
dependency.
I had this below in patch 1/9. will do in a separate patch.
Thanks,
Jacob
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -60,9 +60,9 @@ config VFIO_IOMMU_SPAPR_TCE
default VFIO
endif
-config VFIO_NOIOMMU
- bool "VFIO No-IOMMU support"
- depends on VFIO_GROUP
+config VFIO_GROUP_NOIOMMU
+ bool "VFIO group No-IOMMU support"
+ depends on VFIO_GROUP && (VFIO_CONTAINER ||
IOMMUFD_VFIO_CONTAINER) help
got it. not sure if it needs to have a fix tag or not. @Alex?