Re: [PATCH v7 2/3] vfio/ism: Implement vfio_pci driver for ISM devices
From: Niklas Schnelle
Date: Mon Mar 23 2026 - 17:32:10 EST
On Mon, 2026-03-23 at 10:32 +0100, Julian Ruess wrote:
> Add a vfio_pci variant driver for the s390-specific Internal Shared
> Memory (ISM) devices used for inter-VM communication.
>
> This enables the development of vfio-pci-based user space drivers for
> ISM devices.
>
> On s390, kernel primitives such as ioread() and iowrite() are switched
> over from function handle based PCI load/stores instructions to PCI
> memory-I/O (MIO) loads/stores when these are available and not
> explicitly disabled. Since these instructions cannot be used with ISM
> devices, ensure that classic function handle-based PCI instructions are
> used instead.
>
> The driver is still required even when MIO instructions are disabled, as
> the ISM device relies on the PCI store block (PCISTB) instruction to
> perform write operations.
>
> Stores are not fragmented, therefore one ioctl corresponds to exactly
> one PCISTB instruction. User space must ensure to not write more than
> 4096 bytes at once to an ISM BAR which is the maximum payload of the
> PCISTB instruction.
>
> Signed-off-by: Julian Ruess <julianr@xxxxxxxxxxxxx>
> ---
> drivers/vfio/pci/Kconfig | 2 +
> drivers/vfio/pci/Makefile | 2 +
> drivers/vfio/pci/ism/Kconfig | 10 ++
> drivers/vfio/pci/ism/Makefile | 3 +
> drivers/vfio/pci/ism/main.c | 401 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 418 insertions(+)
>
--- snip ---
> --- /dev/null
> +++ b/drivers/vfio/pci/ism/main.c
> @@ -0,0 +1,401 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * vfio-ISM driver for s390
> + *
> + * Copyright IBM Corp.
> + */
> +
> +#include "../vfio_pci_priv.h"
> +#include "linux/slab.h"
> +
> +#define ISM_VFIO_PCI_OFFSET_SHIFT 48
> +#define ISM_VFIO_PCI_OFFSET_TO_INDEX(off) (off >> ISM_VFIO_PCI_OFFSET_SHIFT)
Nit: checkpatch.pl --strict recommends to add () around the off macro
argument to prevent precedence issues.
> +#define ISM_VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << ISM_VFIO_PCI_OFFSET_SHIFT)
> +#define ISM_VFIO_PCI_OFFSET_MASK (((u64)(1) << ISM_VFIO_PCI_OFFSET_SHIFT) - 1)
> +
> +/*
> + * Use __zpci_load() to bypass automatic use of
> + * PCI MIO instructions which are not supported on ISM devices
> + */
> +#define ISM_READ(size) \
> + static int ism_read##size(struct zpci_dev *zdev, int bar, \
> + ssize_t *filled, char __user *buf, \
> + loff_t off) \
> + { \
> + u64 req, tmp; \
> + u##size val; \
> + int ret; \
> + \
> + req = ZPCI_CREATE_REQ(READ_ONCE(zdev->fh), bar, sizeof(val)); \
Not your fault since it is in the original ZPCI_CREATE_REQ() also but I
think the READ_ONCE() is actually useless. The zdev->fh can indeed
change under us in some error/recovery scenarios but then the PCI
instructions would also just return a stale handle error and really the
access isn't really at risk of getting torn either. Still, I'd keep
this for consistency and then we can think of cleaning this up
everywhere.
> + ret = __zpci_load(&tmp, req, off); \
> + if (ret) \
> + return ret; \
> + val = (u##size)tmp; \
> + if (copy_to_user(buf, &val, sizeof(val))) \
> + return -EFAULT; \
> + *filled = sizeof(val); \
> + return 0; \
> + }
> +
> +ISM_READ(64);
> +ISM_READ(32);
> +ISM_READ(16);
> +ISM_READ(8);
> +
--- snip ---
> +
> +static int ism_vfio_pci_init_dev(struct vfio_device *core_vdev)
> +{
> + struct zpci_dev *zdev = to_zpci(to_pci_dev(core_vdev->dev));
> + struct ism_vfio_pci_core_device *ivpcd;
> + char cache_name[20];
> + int ret;
> +
> + ivpcd = container_of(core_vdev, struct ism_vfio_pci_core_device,
> + core_device.vdev);
> +
> + snprintf(cache_name, sizeof(cache_name), "ism_sb_fid_%08x", zdev->fid);
> + ivpcd->store_block_cache =
> + kmem_cache_create(cache_name, zdev->maxstbl, PAGE_SIZE,
> + (SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT), NULL);
> + if (!ivpcd->store_block_cache)
> + return -ENOMEM;
> +
> + ret = vfio_pci_core_init_dev(core_vdev);
> + if (ret)
> + kmem_cache_destroy(ivpcd->store_block_cache);
> +
> + return ret;
> +}
> +
> +static void ism_vfio_pci_release_dev(struct vfio_device *core_vdev)
> +{
> + struct ism_vfio_pci_core_device *ivpcd = container_of(
> + core_vdev, struct ism_vfio_pci_core_device, core_device.vdev);
Here checkpatch.pl --strict complains about '(' at the end of a line
but clang-format also does it this way and even shortening the name
doesn't make it fit so I'd tend to keep it this way.
> +
> + kmem_cache_destroy(ivpcd->store_block_cache);
> + vfio_pci_core_release_dev(core_vdev);
As discussed offline. I think it may be a bug in the Xe variant driver
that their xe_vfio_pci_release_dev() doesn't call
vfio_pci_core_release_dev() even though xe_vfio_pci_init_dev() calls
vfio_pci_core_init_dev(). @Alex, @Michal?
> +}
> +
--- snip ---
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("vfio-pci variant driver for the IBM Internal Shared Memory (ISM) device");
> +MODULE_AUTHOR("IBM Corporation");
Apart from the code style nits this looks good to me now. Sorry for
sending us on a few detours.
Feel free to add my:
Reviewed-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
Thanks,
Niklas