Re: [PATCH net-next 00/10] net: lan966x: add support for PCIe FDMA
From: Herve Codina
Date: Mon Mar 23 2026 - 13:47:53 EST
Hi Daniel,
On Mon, 23 Mar 2026 15:52:04 +0100
Herve Codina <herve.codina@xxxxxxxxxxx> wrote:
> Hi Daniel,
>
> On Fri, 20 Mar 2026 16:00:56 +0100
> Daniel Machon <daniel.machon@xxxxxxxxxxxxx> wrote:
>
> > When lan966x operates as a PCIe endpoint, the driver currently uses
> > register-based I/O for frame injection and extraction. This approach is
> > functional but slow, topping out at around 33 Mbps on an Intel x86 host
> > with a lan966x PCIe card.
> >
> > This series adds FDMA (Frame DMA) support for the PCIe path. When
> > operating as a PCIe endpoint, the internal FDMA engine on lan966x cannot
> > directly access host memory, so DMA buffers are allocated as contiguous
> > coherent memory and mapped through the PCIe Address Translation Unit
> > (ATU). The ATU provides outbound windows that translate internal FDMA
> > addresses to PCIe bus addresses, allowing the FDMA engine to read and
> > write host memory. Because the ATU requires contiguous address regions,
> > page_pool and normal per-page DMA mappings cannot be used. Instead,
> > frames are transferred using memcpy between the ATU-mapped buffers and
> > the network stack. With this, throughput increases from ~33 Mbps to ~620
> > Mbps for default MTU.
> >
> > Patches 1-2 prepare the shared FDMA library: patch 1 renames the
> > contiguous dataptr helpers for clarity, and patch 2 adds PCIe ATU region
> > management and coherent DMA allocation with ATU mapping.
> >
> > Patches 3-5 refactor the lan966x FDMA code to support both platform and
> > PCIe paths: extracting the LLP register write into a helper, exporting
> > shared functions, and introducing an ops dispatch table selected at
> > probe time.
> >
> > Patch 6 adds the core PCIe FDMA implementation with RX/TX using
> > contiguous ATU-mapped buffers. Patches 7 and 8 extend it with MTU
> > change and XDP support respectively.
> >
> > Patches 9-10 update the lan966x PCI device tree overlay to extend the
> > cpu register mapping to cover the ATU register space and add the FDMA
> > interrupt.
> >
>
> Thanks a lot for the series taking care of DMA and ATU in PCIe variants.
>
> I have tested the whole series on both my ARM and x86 systems.
>
> Doing a simple wget on my x86 system, I moved from 3.8MB/s to 11.2MB/s and
> so the improvement is obvious.
>
> Tested-by: Herve Codina <herve.codina@xxxxxxxxxxx>
>
Hum, I think I found an issue.
If I remove the lan966x_pci module (modprobe -r lan966x_pci) and reload
it (modprobe lan966x_pci), the board no longer works.
The system sends DHCP requests. Those requests are answered by my PC (observed
with Wireshark), but the system doesn't see the answers: it keeps sending
DHCP requests.
Looks like the lan966x_pci module removal leaves the board in a bad state.
Without the series applied, the system does see the DHCP answers from my PC
after any module unload / reload.
Do you have any idea what could be wrong?
Best regards,
Hervé