[net-next v6 00/12] Add TSO map-once DMA helpers and bnxt SW USO support

From: Joe Damato
Date: Thu Mar 26 2026 - 19:54:18 EST

Greetings:

This series extends net/tso with a data structure and some helpers that allow
drivers to DMA map headers and packet payloads a single time. The helpers can
then be used to reference slices of the shared mapping for each segment. This
avoids the cost of repeated DMA mappings, especially on systems that use an
IOMMU: N per-packet DMA maps are replaced with a single map for the entire GSO
skb. As of v3, the series uses the DMA IOVA API (as suggested by Leon [1]) and
provides a fallback path when an IOMMU is not in use. The DMA IOVA API is even
more efficient than the v2 approach; see below.
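To illustrate the map-once idea, here is a minimal userspace C sketch, assuming
one contiguous mapping (base address plus length) that covers the whole GSO
payload and a fixed MSS. The names sketch_dma_map, sketch_nr_segs and
sketch_seg are hypothetical and do not reflect the real struct tso_dma_map or
its helpers; the point is only that each segment becomes an (address, length)
slice computed from one mapping instead of a fresh DMA map call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one mapping covering an entire GSO payload.
 * This is NOT the layout of the real struct tso_dma_map. */
struct sketch_dma_map {
	uint64_t base; /* DMA address returned by the single mapping call */
	size_t len;    /* total payload bytes covered by that mapping */
	size_t mss;    /* payload bytes carried by each segment */
};

/* One segment's view of the shared mapping. */
struct sketch_slice {
	uint64_t addr;
	size_t len;
};

/* Number of segments needed to carry the payload (ceiling division). */
static size_t sketch_nr_segs(const struct sketch_dma_map *m)
{
	assert(m->mss > 0);
	return (m->len + m->mss - 1) / m->mss;
}

/* Slice for segment idx: pure offset arithmetic, no extra mapping call.
 * The last segment may be shorter than the MSS. */
static struct sketch_slice sketch_seg(const struct sketch_dma_map *m,
				      size_t idx)
{
	assert(idx < sketch_nr_segs(m));
	size_t off = idx * m->mss;
	size_t rem = m->len - off;
	struct sketch_slice s = {
		.addr = m->base + off,
		.len = rem < m->mss ? rem : m->mss,
	};
	return s;
}
```

For example, a 4000-byte payload with an MSS of 1400 yields three slices of
1400, 1400 and 1200 bytes, all carved out of one mapping.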

The new helpers are then used in bnxt to add software UDP Segmentation Offload
(SW USO) for older bnxt devices that lack hardware USO support. Since the
helpers are generic, other drivers can be extended in the same way.

v2 showed a ~4x reduction in DMA mapping calls at the same wire packet rate on
production traffic with a bnxt device. v3 shows a larger reduction of ~6x at
the same wire packet rate, thanks to Leon's suggestion of using the DMA IOVA
API [1].

Special care is taken to make bnxt ethtool operations work correctly: the ring
size cannot be reduced below a minimum threshold while USO is enabled, and
growing the ring automatically re-enables USO if it was previously blocked.
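The ethtool interaction above can be modeled as a small state machine. The C
sketch below is hypothetical: the threshold SKETCH_USO_MIN_RING, the struct,
and the assumption that USO becomes "blocked" (requested but inactive) when it
is enabled on a ring that is too small are all illustrative, not taken from the
bnxt patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimum TX ring size required while SW USO is active. */
#define SKETCH_USO_MIN_RING 256u

struct sketch_dev {
	unsigned int ring_size;
	bool uso_on;      /* USO currently active */
	bool uso_blocked; /* USO requested, but the ring was too small */
};

/* User toggles USO. Assumption: on a too-small ring the request is
 * remembered as "blocked" rather than rejected outright. */
static void sketch_set_uso(struct sketch_dev *d, bool want)
{
	assert(d != NULL);
	if (!want) {
		d->uso_on = false;
		d->uso_blocked = false;
	} else if (d->ring_size < SKETCH_USO_MIN_RING) {
		d->uso_on = false;
		d->uso_blocked = true;
	} else {
		d->uso_on = true;
		d->uso_blocked = false;
	}
}

/* ethtool-style ring resize request. */
static int sketch_set_ring(struct sketch_dev *d, unsigned int new_size)
{
	assert(d != NULL);
	/* Refuse to shrink below the threshold while USO is active. */
	if (d->uso_on && new_size < SKETCH_USO_MIN_RING)
		return -EINVAL;

	d->ring_size = new_size;

	/* Growing past the threshold re-enables a previously blocked USO. */
	if (d->uso_blocked && new_size >= SKETCH_USO_MIN_RING) {
		d->uso_on = true;
		d->uso_blocked = false;
	}
	return 0;
}
```

Keeping "blocked" separate from "off" is what lets a later ring resize restore
the user's original request without a second toggle.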

I've extended netdevsim to support SW USO, but netdevsim uses
tso_build_hdr/tso_build_data rather than the new DMA helpers because I
couldn't find a way to exercise the DMA helpers there. If anyone has
suggestions, let me know. I suspect that testing the DMA helpers requires real
hardware.

v6 includes a minor change to the USO implementation, as requested by Paolo
[2], so I've re-run the test with both netdevsim and real bnxt hardware, and it
passed. I also ran the kernel on a production machine with real traffic.

Thanks,
Joe

[1]: https://lore.kernel.org/netdev/20260316194419.GH61385@unreal/
[2]: https://lore.kernel.org/netdev/ab1f764b-de03-48f5-a781-356495257d25@xxxxxxxxxx/

v6:
- Addressed Paolo's request [2] to avoid possible stale iova_state if the
IOVA API starts to fail transiently. See patch 8.
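One general pattern for the stale-state problem addressed in v6 is to reset the
per-skb IOVA bookkeeping before every mapping attempt, so that a transient
failure falls back cleanly and the unmap path never acts on state left over
from an earlier skb. The C sketch below is a hypothetical illustration of that
pattern only; sketch_iova_state, sketch_map and the simulated allocators are
invented for this example and are not the code in patch 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-skb IOVA bookkeeping. */
struct sketch_iova_state {
	bool valid; /* true only if the IOVA path succeeded for THIS skb */
	uint64_t addr;
	size_t len;
};

/* Simulated IOVA allocator: returns false on (transient) failure. */
typedef bool (*sketch_iova_alloc_fn)(size_t len, uint64_t *addr);

static bool sketch_alloc_ok(size_t len, uint64_t *addr)
{
	(void)len;
	*addr = 0x80000000ull;
	return true;
}

static bool sketch_alloc_fail(size_t len, uint64_t *addr)
{
	(void)len;
	(void)addr;
	return false;
}

enum sketch_map_path { SKETCH_PATH_IOVA, SKETCH_PATH_FALLBACK };

static enum sketch_map_path sketch_map(struct sketch_iova_state *st,
				       size_t len, sketch_iova_alloc_fn alloc)
{
	uint64_t addr;

	assert(st != NULL && alloc != NULL);
	/* Clear state from any previous skb first, so a failure below
	 * cannot leave stale, still-valid-looking state behind. */
	memset(st, 0, sizeof(*st));

	if (!alloc(len, &addr))
		return SKETCH_PATH_FALLBACK; /* e.g. per-segment mapping */

	st->valid = true;
	st->addr = addr;
	st->len = len;
	return SKETCH_PATH_IOVA;
}

/* Completion side: only tear down IOVA state set for this skb. */
static bool sketch_needs_iova_unmap(const struct sketch_iova_state *st)
{
	return st->valid;
}
```

Without the reset, a success followed by a transient failure would leave
valid set from the earlier skb, and the completion path would try to unmap an
IOVA range that the fallback path never used.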

v5: https://lore.kernel.org/netdev/20260323183844.3146982-1-joe@xxxxxxx/
- Adjusted patch 8 to address the kernel test robot report. See patch
changelog; no functional change.
- Added Pavan's Reviewed-by to patches 6-12.

v4: https://lore.kernel.org/all/20260320144141.260246-1-joe@xxxxxxx/
- Fixed kdoc issues in patch 2. No functional change.
- Added Pavan's Reviewed-by to patches 3, 4, and 5.
- Fixed the issue Pavan (and the AI review) pointed out in patch 8. See
patch changelog.
- Added parentheses around the gso_type check in patch 11 for clarity. No
functional change.
- Fixed python linter issues in patch 12. No functional change.

v3: https://lore.kernel.org/netdev/20260318191325.1819881-1-joe@xxxxxxx/
- Converted from RFC to an actual submission.
- Updated based on Leon's feedback to use the DMA IOVA API. See individual
patches for update information.

RFCv2: https://lore.kernel.org/netdev/20260312223457.1999489-1-joe@xxxxxxx/
- Some bugs were discovered shortly after sending: incorrect handling of the
shared header space and a bug in the unmap path in the TX completion.
Sorry about that; I was more careful this time.
- On that note: this RFC includes a test.

RFCv1: https://lore.kernel.org/netdev/20260310212209.2263939-1-joe@xxxxxxx/


Joe Damato (12):
net: tso: Introduce tso_dma_map
net: tso: Add tso_dma_map helpers
net: bnxt: Export bnxt_xmit_get_cfa_action
net: bnxt: Add a helper for tx_bd_ext
net: bnxt: Use dma_unmap_len for TX completion unmapping
net: bnxt: Add TX inline buffer infrastructure
net: bnxt: Add boilerplate GSO code
net: bnxt: Implement software USO
net: bnxt: Add SW GSO completion and teardown support
net: bnxt: Dispatch to SW USO
net: netdevsim: Add support for SW USO
selftests: drv-net: Add USO test

drivers/net/ethernet/broadcom/bnxt/Makefile | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 190 +++++++++---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 33 +++
.../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c | 240 +++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h | 31 ++
drivers/net/netdevsim/netdev.c | 100 ++++++-
include/linux/skbuff.h | 11 +
include/net/tso.h | 61 ++++
net/core/tso.c | 273 ++++++++++++++++++
tools/testing/selftests/drivers/net/Makefile | 1 +
tools/testing/selftests/drivers/net/uso.py | 96 ++++++
12 files changed, 1017 insertions(+), 40 deletions(-)
create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
create mode 100755 tools/testing/selftests/drivers/net/uso.py


base-commit: 45b2b84ac6fde39c427018d6cdf7d44258938faa
--
2.52.0