Re: [PATCH v6 2/3] vfio/ism: Implement vfio_pci driver for ISM devices
From: Niklas Schnelle
Date: Fri Mar 20 2026 - 10:04:26 EST
On Fri, 2026-03-20 at 14:37 +0100, Julian Ruess wrote:
> On Fri Mar 20, 2026 at 1:54 PM CET, Niklas Schnelle wrote:
> > On Thu, 2026-03-19 at 16:42 +0100, Julian Ruess wrote:
> > > Add a vfio_pci variant driver for the s390-specific Internal Shared
> > > Memory (ISM) devices used for inter-VM communication.
> > >
> > > This enables the development of vfio-pci-based user space drivers for
> > > ISM devices.
> > >
> > > On s390, kernel primitives such as ioread() and iowrite() are switched
> > > over from function handle based PCI load/stores instructions to PCI
> > > memory-I/O (MIO) loads/stores when these are available and not
> > > explicitly disabled. Since these instructions cannot be used with ISM
> > > devices, ensure that classic function handle-based PCI instructions are
> > > used instead.
> > >
> > > The driver is still required even when MIO instructions are disabled, as
> > > the ISM device relies on the PCI store block (PCISTB) instruction to
> > > perform write operations.
> > >
> > > Stores are not fragmented, therefore one ioctl corresponds to exactly
> > > one PCISTB instruction. User space must ensure to not write more than
> > > 4096 bytes at once to an ISM BAR which is the maximum payload of the
> > > PCISTB instruction.
> > >
> > > Signed-off-by: Julian Ruess <julianr@xxxxxxxxxxxxx>
> > > ---
> > > drivers/vfio/pci/Kconfig | 2 +
> > > drivers/vfio/pci/Makefile | 2 +
> > > drivers/vfio/pci/ism/Kconfig | 10 ++
> > > drivers/vfio/pci/ism/Makefile | 3 +
> > > drivers/vfio/pci/ism/main.c | 400 ++++++++++++++++++++++++++++++++++++++++++
> > > 5 files changed, 417 insertions(+)
> > >
> >
> > As a general note, ISM devices do not support any low power/sleep modes
> > which is why there is no PM handling done.
> >
> > > +
> > > +/*
> > > + * ism_vfio_pci_do_io_w()
> > > + *
> > > + * Ensure that the PCI store block (PCISTB) instruction is used as required by the
> > > + * ISM device. The ISM device also uses a 256 TiB BAR 0 for write operations,
> > > + * which requires a 48bit region address space (ISM_VFIO_PCI_OFFSET_SHIFT).
> > > + */
> > > +static ssize_t ism_vfio_pci_do_io_w(struct vfio_pci_core_device *vdev,
> > > + char __user *buf, loff_t off, size_t count,
> > > + int bar)
> > > +{
> > > + struct zpci_dev *zdev = to_zpci(vdev->pdev);
> > > + struct ism_vfio_pci_core_device *ivpcd;
> > > + ssize_t ret;
> > > + void *data;
> > > + u64 req;
> > > +
> > > + if (count > zdev->maxstbl)
> > > + return -EINVAL;
> > > + if (((off % PAGE_SIZE) + count) > PAGE_SIZE)
> > > + return -EINVAL;
> > > +
> > > + ivpcd = container_of(vdev, struct ism_vfio_pci_core_device,
> > > + core_device);
> > > + data = kmem_cache_alloc(ivpcd->store_block_cache, GFP_KERNEL);
> > > + if (!data)
> > > + return -ENOMEM;
> > > +
> > > + if (copy_from_user(data, buf, count)) {
> > > + ret = -EFAULT;
> > > + goto out_free;
> > > + }
> > > +
> > > + req = ZPCI_CREATE_REQ(READ_ONCE(zdev->fh), bar, count);
> > > + ret = __zpci_store_block(data, req, off);
> >
> > Note: There is a Sashiko finding that PCI Store Block needs 8 byte
> > alignment and we don't ensure that off is 8 byte aligned here. While
> > generally true the ISM device has relaxed alignment rules and this
> > requirement does not apply. That said we should set PAGE_SIZE alignment
> > on the kmem_cache such that data is guaranteed PAGE_SIZE aligned and
> > the page crossing check works correctly.
> >
> > > + if (ret)
> > > + goto out_free;
> > > +
> > > + ret = count;
> > > +
> > > +out_free:
> > > + kmem_cache_free(ivpcd->store_block_cache, data);
> > > + return ret;
> > > +}
> > > +
> > --- snip ---
> >
> > > +
> > > +static int ism_vfio_pci_init_dev(struct vfio_device *core_vdev)
> > > +{
> > > + struct zpci_dev *zdev = to_zpci(to_pci_dev(core_vdev->dev));
> > > + struct ism_vfio_pci_core_device *ivpcd;
> > > + char cache_name[20];
> > > + int ret;
> > > +
> > > + ivpcd = container_of(core_vdev, struct ism_vfio_pci_core_device,
> > > + core_device.vdev);
> > > +
> > > + snprintf(cache_name, sizeof(cache_name), "ism_sb_fid_%08x", zdev->fid);
> > > + ivpcd->store_block_cache =
> > > + kmem_cache_create(cache_name, zdev->maxstbl, 0, 0, NULL);
> >
> > Sashiko notes here that the cache_name is stack allocated and says that
> > kmem_cache_create() doesn't copy it. Looking at
> > __kmem_cache_create_args() however this seems to be not true as that
> > does krstrdup_const() on the passed-in name.
>
> I think we should change this to:
>
> ivpcd->store_block_cache = kmem_cache_create(
> cache_name, zdev->maxstbl, PAGE_SIZE,
> (SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT | SLAB_PANIC), NULL);
>
> The alignment ensures that we do not cross the integral boundary, and as
> Sashiko said, memcg accounting is also useful here.
We definitely do *not* want to panic when kmem_cache_create() fails,
that would take down the whole kernel just for a PCI device! The others
make sense though I'm not sure if SLAB_RECLAIM_ACCOUNT needs other
setup.
Thanks,
Niklas