Re: [PATCH] dma-debug: skip cacheline overlap tracking on cache-coherent architectures

From: David Laight

Date: Tue May 19 2026 - 05:36:15 EST


On Mon, 18 May 2026 17:23:15 +0500
Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx> wrote:

> On Mon, May 18, 2026 at 5:10 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > I would say this reproducer is incorrect. From what I recall, the only two
> > legitimate use cases for cacheline overlap are virtio and RDMA.
>
> The wild trace in the commit message is NVMe block I/O -- neither virtio
> nor RDMA:
>
> add_dma_entry -> debug_dma_map_phys -> dma_map_phys ->
> blk_dma_map_iter_start -> nvme_map_data
>
> The block layer submits many concurrent in-flight requests; small
> kmalloc'd buffers naturally land in the same cacheline under high IOPS,

Isn't there a flag to kmalloc() that indicates the buffers will be used
for DMA and mustn't share a cache line with anything else writable?
(Which means the size must be rounded up to a multiple of the cache
line size.)
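For DMA_FROM_DEVICE it is important that the CPU doesn't dirty the cache
lines while the transfer is in flight, since a later write-back of a dirty
line would overwrite what the device wrote to memory.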

This is probably worse on systems with 256-byte cache lines.
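
To make the arithmetic concrete, here is a minimal user-space sketch of the
rounding and of the "do two buffers share a line" check. It is only an
illustration, not kernel code: round_up_to_line() and share_line() are
hypothetical helper names, the line size is passed in as a parameter and
assumed to be a power of two, and buffer lengths are assumed non-zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round 'size' up to a multiple of 'line'.
 * Assumes 'line' is a non-zero power of two (e.g. 64 or 256).
 */
static size_t round_up_to_line(size_t size, size_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return (size + line - 1) & ~(line - 1);
}

/*
 * Hypothetical helper: return 1 if the byte ranges [a, a + alen) and
 * [b, b + blen) touch at least one common cache line, else 0.
 * Assumes alen and blen are non-zero.
 */
static int share_line(size_t a, size_t alen, size_t b, size_t blen,
		      size_t line)
{
	size_t a_first = a / line, a_last = (a + alen - 1) / line;
	size_t b_first = b / line, b_last = (b + blen - 1) / line;

	/* Two closed intervals of line indices overlap iff each starts
	 * no later than the other ends. */
	return a_first <= b_last && b_first <= a_last;
}
```

With 64-byte lines, round_up_to_line(24, 64) is 64, and two 24-byte buffers
at offsets 0 and 32 share a line. Two adjacent 64-byte buffers at offsets 0
and 64 do not share a 64-byte line, but they do share a 256-byte one, which
is why larger lines make incidental overlap more likely.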

-- David

> which is incidental rather than intentional overlap. Ming Lei's report
> linked in the commit message [1] enumerates additional non-virtio /
> non-RDMA cases hitting the same WARN: liburing iopoll tests, raid1,
> dm-thin and other storage utilities.
>
> > The first intentionally relies on it for small allocations, and the second exports the
> > cachelines to the user space and cannot operate on non‑coherent architectures.
>
> The reproducer isn't claiming to be either of those. It deterministically
> reaches the same state-based gate the wild NVMe trace hits
> (!is_cache_clean && overlap > 7, with direction != DMA_TO_DEVICE, after
> the v2 coherent-arch / SWIOTLB-bounce suppressions are evaluated). Since
> that gate has no subsystem-specific term, any caller -- synthetic or real
> -- reaching it with those state values triggers the same WARN.
>
> If the broader concern is that the block layer should opt into your
> coherency-attribute work rather than relying on debug-side suppression,
> that's a reasonable longer-term direction. But it's additive: even with
> opt-in adoption, the WARN remains a false positive on coherent arches
> for callers that don't annotate -- which is exactly what v2 (3d48c9fd78dd)
> already established for the sibling "cacheline tracking EEXIST" err_printk.
>
> [1] https://lore.kernel.org/all/ZwxzdWmYcBK27mUs@fedora/
>