Re: [PATCH v3 0/5] dma-mapping: arm64: support batched cache sync

From: Marek Szyprowski

Date: Mon Mar 16 2026 - 03:24:32 EST

On 13.03.2026 20:36, Catalin Marinas wrote:
> On Tue, Mar 03, 2026 at 05:33:37PM +0100, Marek Szyprowski wrote:
>> On 28.02.2026 23:11, Barry Song wrote:
>>> From: Barry Song <baohua@xxxxxxxxxx>
>>>
>>> Many embedded ARM64 SoCs still lack hardware cache coherency support, which
>>> causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
>>>
>>> For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
>>> sync APIs perform cache maintenance one entry at a time. After each entry,
>>> the implementation synchronously waits for the corresponding region’s
>>> D-cache operations to complete. On architectures like arm64, efficiency can
>>> be improved by issuing all entries’ operations first and then performing a
>>> single batched wait for completion.
>>>
>>> Tangquan's results show that batched synchronization can reduce
>>> dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
>>> phone platform (MediaTek Dimensity 9500). The tests were performed by
>>> pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
>>> running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
>>> sg entries per buffer) for 200 iterations and then averaging the
>>> results.
>>>
>>> Thanks to Xueyuan for volunteering to take on the testing tasks. He
>>> put significant effort into validating paths such as IOVA link/unlink
>>> and SWIOTLB on RK3588 boards with NVMe.
>> Catalin, Will, I would like to merge this into the dma-mapping tree. Please
>> give your ack or comment if you are okay with the ARM64-related parts.
> Sorry for the delay. Yes, feel free to pick them up. I doubt there would
> be any conflicts in this area with what I'm merging through the arm64
> tree.

Thanks, applied to dma-mapping-for-next.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
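
For readers following the thread, the batching idea described in the quoted
cover letter can be illustrated with a minimal userspace sketch. It is not the
code from the patch series. The struct and function names below
(sg_entry_sketch, sync_stats, issue_cache_maint, wait_for_completion_sketch,
sync_sg_per_entry, sync_sg_batched) are hypothetical stand-ins, and the
"cache maintenance" and "wait" steps are modelled as simple counters rather
than real arm64 instructions. The sketch only shows the difference in how many
completion waits each approach performs for an SG list with nents entries.

```c
#include <stddef.h>

/* Hypothetical stand-in for one scatter-gather entry (not struct scatterlist). */
struct sg_entry_sketch {
	size_t addr;
	size_t len;
};

/* Counters that model the work done; no real cache operations are performed. */
struct sync_stats {
	unsigned long maint_ops; /* per-entry cache maintenance requests issued */
	unsigned long barriers;  /* waits for outstanding maintenance to complete */
};

/* Assumed model: issuing maintenance for one entry does not wait for it. */
static void issue_cache_maint(struct sync_stats *st,
			      const struct sg_entry_sketch *e)
{
	(void)e;
	st->maint_ops++;
}

/* Assumed model: one wait covers all maintenance issued before it. */
static void wait_for_completion_sketch(struct sync_stats *st)
{
	st->barriers++;
}

/* Per-entry scheme: issue, then wait, for every entry in turn. */
static void sync_sg_per_entry(struct sync_stats *st,
			      const struct sg_entry_sketch *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++) {
		issue_cache_maint(st, &sg[i]);
		wait_for_completion_sketch(st);
	}
}

/* Batched scheme: issue for all entries first, then wait once. */
static void sync_sg_batched(struct sync_stats *st,
			    const struct sg_entry_sketch *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		issue_cache_maint(st, &sg[i]);
	if (nents > 0)
		wait_for_completion_sketch(st);
}
```

In this model both schemes issue nents maintenance requests, but the per-entry
scheme performs nents waits while the batched scheme performs one. If the
10 MB and 4 KB figures in the cover letter are read as binary units, each test
buffer corresponds to 10 * 1024 * 1024 / 4096 = 2560 entries, so the model
predicts 2560 waits against a single wait per buffer. The measured speedups
quoted above come from the original posting, not from this sketch.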