Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
From: Herve Codina
Date: Mon Mar 23 2026 - 11:13:55 EST
Hi Daniel,
On Fri, 20 Mar 2026 16:00:56 +0100
Daniel Machon <daniel.machon@xxxxxxxxxxxxx> wrote:
> When lan966x operates as a PCIe endpoint, the driver currently uses
> register-based I/O for frame injection and extraction. This approach is
> functional but slow, topping out at around 33 Mbps on an Intel x86 host
> with a lan966x PCIe card.
>
> This series adds FDMA (Frame DMA) support for the PCIe path. When
> operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> directly access host memory, so DMA buffers are allocated as contiguous
> coherent memory and mapped through the PCIe Address Translation Unit
> (ATU). The ATU provides outbound windows that translate internal FDMA
> addresses to PCIe bus addresses, allowing the FDMA engine to read and
> write host memory. Because the ATU requires contiguous address regions,
> page_pool and normal per-page DMA mappings cannot be used. Instead,
> frames are transferred using memcpy between the ATU-mapped buffers and
> the network stack. With this, throughput increases from ~33 Mbps to ~620
> Mbps for default MTU.
>
> Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> management and coherent DMA allocation with ATU mapping.
>
> Patches 3-5 refactor the lan966x FDMA code to support both platform and
> PCIe paths: extracting the LLP register write into a helper, exporting
> shared functions, and introducing an ops dispatch table selected at
> probe time.
>
> Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> change and XDP support respectively.
>
> Patches 9-10 update the lan966x PCI device tree overlay to extend the
> cpu register mapping to cover the ATU register space and add the FDMA
> interrupt.
>
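For anyone reading along who is less familiar with outbound ATU windows, the
translation described above can be sketched roughly as below. This is only an
illustration under my own assumptions: the struct, its fields and
atu_translate() are hypothetical names, not code from the series or from the
FDMA library. It also shows why a buffer has to sit in one contiguous region:
the whole [addr, addr + len) range must fall inside a single window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one outbound ATU window (not the driver's struct). */
struct atu_window {
	uint64_t base;   /* start of the window in the device's internal address space */
	uint64_t size;   /* window length in bytes, must be non-zero */
	uint64_t target; /* PCIe bus address that `base` translates to */
};

/*
 * Translate an internal address to a PCIe bus address.
 * Returns true and stores the result only if the whole range
 * [addr, addr + len) lies inside the window.
 */
bool atu_translate(const struct atu_window *w, uint64_t addr,
		   uint64_t len, uint64_t *bus_addr)
{
	uint64_t off;

	assert(w->size != 0);

	if (addr < w->base)
		return false;

	off = addr - w->base;
	if (off >= w->size || len > w->size - off)
		return false;

	*bus_addr = w->target + off;
	return true;
}
```

A buffer that straddles the end of the window is rejected rather than partially
translated, which is the constraint that rules out scattered per-page mappings.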
Thanks a lot for the series taking care of DMA and ATU in PCIe variants.
I have tested the whole series on both my ARM and x86 systems.
With a simple wget on my x86 system, throughput went from 3.8 MB/s to
11.2 MB/s, so the improvement is obvious.
Tested-by: Herve Codina <herve.codina@xxxxxxxxxxx>
Best regards,
Hervé