Re: [PATCH v14 net-next 02/11] net/nebula-matrix: add our driver architecture

From: Jakub Kicinski

Date: Fri May 15 2026 - 21:17:37 EST


This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/nebula-matrix: add our driver architecture

This commit introduces the baseline driver architecture for the
nebula-matrix networking device. It establishes the Hardware, Channel,
Resource, Dispatch, and Device layers for device management.

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_hw_leonis.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_hw_leonis.c
> index 0000000000000..1d673b7757655
> --- /dev/null
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_hw_leonis.c

[ ... ]

> @@ -46,104 +46,149 @@
> +int nbl_hw_init_leonis(struct nbl_adapter *adapter)
> +{

[ ... ]

> + if (common->has_ctrl) {
> + bar_len = pci_resource_len(pdev, NBL_MEMORY_BAR);
> + bar_start = pci_resource_start(pdev, NBL_MEMORY_BAR);
> + if (!(pci_resource_flags(pdev, NBL_MEMORY_BAR) &
> + IORESOURCE_MEM) ||
> + bar_len < NBL_RDMA_NOTIFY_OFF) {

If bar_len is exactly equal to NBL_RDMA_NOTIFY_OFF (8192 bytes), this check
evaluates to false and passes.

> + dev_err(&pdev->dev, "Invalid BAR: unassigned or length too small\n");
> + ret = -EINVAL;
> + goto ioremap_err;
> + }
> + hw_mgt->hw_addr =
> + ioremap(bar_start,
> + bar_len - NBL_RDMA_NOTIFY_OFF);

Which then leads to ioremap(bar_start, 0) here. Is mapping 0 bytes valid, or
could it lead to out-of-bounds accesses? Should the condition above be
bar_len <= NBL_RDMA_NOTIFY_OFF?

> + if (!hw_mgt->hw_addr) {
> + dev_err(&pdev->dev, "Memory bar ioremap failed\n");
> + ret = -EIO;
> + goto ioremap_err;
> + }
> + hw_mgt->hw_size = bar_len - NBL_RDMA_NOTIFY_OFF;
> + } else {
> + hw_mgt->hw_addr =
> + ioremap(pci_resource_start(pdev, NBL_MEMORY_BAR),
> + NBL_RDMA_NOTIFY_OFF);

Are we missing a validation check here? Unlike the has_ctrl branch, this
unconditionally maps NBL_MEMORY_BAR without checking if the BAR has the
IORESOURCE_MEM flag or if its length is at least NBL_RDMA_NOTIFY_OFF. Could
this cause issues if the device presents an unassigned or I/O port BAR?

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_reg.h b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_reg.h
> index 0000000000000..2921f70890722
> --- /dev/null
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_reg.h

[ ... ]

> +
> +#define NBL_MEMORY_BAR 0
> +#define NBL_MAILBOX_BAR 2
> +#define NBL_RDMA_NOTIFY_OFF 8192
> +#define NBL_HW_DUMMY_REG 0x1300904

[ ... ]

> +static inline void nbl_flush_writes(struct nbl_hw_mgt *hw_mgt)
> +{
> + nbl_hw_rd32(hw_mgt, NBL_HW_DUMMY_REG);
> +}

Can this perform an out-of-bounds memory read? NBL_HW_DUMMY_REG is defined as
0x1300904 (~19MB), but when has_ctrl is false, the driver explicitly maps only
NBL_RDMA_NOTIFY_OFF (8192 bytes) for hw_mgt->hw_addr. Should the driver ensure
the offset is bounded by hw_mgt->hw_size?

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c
> index 10c3536b327b2..56131a50d59bd 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_main.c

[ ... ]

> +struct nbl_adapter *nbl_core_init(struct pci_dev *pdev,
> + struct nbl_init_param *param)
> +{

[ ... ]

> + ret = product_base_ops->hw_init(adapter);
> + if (ret)
> + goto hw_init_fail;
> +
> + ret = product_base_ops->chan_init(adapter);
> + if (ret)
> + goto chan_init_fail;
> +
> + ret = product_base_ops->res_init(adapter);
> + if (ret)
> + goto res_init_fail;
> +
> + ret = nbl_disp_init(adapter);
> + if (ret)
> + goto disp_init_fail;
> +
> + ret = nbl_dev_init(adapter);
> + if (ret)
> + goto dev_init_fail;
> + return adapter;
> +
> +dev_init_fail:
> + nbl_disp_remove(adapter);
> +disp_init_fail:
> + product_base_ops->res_remove(adapter);
> +res_init_fail:
> + product_base_ops->chan_remove(adapter);
> +chan_init_fail:
> + product_base_ops->hw_remove(adapter);
> +hw_init_fail:
> + return NULL;
> +}

Does this silently discard specific initialization error codes? When
sub-initialization routines fail, they return informative error codes like
-ENOMEM or -EINVAL, but this function catches them and simply returns NULL.

[ ... ]

> +static void nbl_get_func_param(struct pci_dev *pdev, kernel_ulong_t driver_data,
> + struct nbl_init_param *param)
> +{

[ ... ]

> + /*
> + * Leonis only PF0 has ctrl capability, but PF0's pcie device_id
> + * is same with other PF.So handle it special.
> + */
> + if (param->product_type == NBL_LEONIS_TYPE &&
> + (PCI_FUNC(pdev->devfn) == 0))
> + param->caps.has_ctrl = 1;
> +}

Could this incorrectly grant VFs management privileges? In SR-IOV environments
where VFs are assigned to new virtual buses (such as with Alternative Routing-ID
Interpretation), VFs can also have a function number of 0. Should the driver
verify that the device is actually a Physical Function by checking
!pdev->is_virtfn?

[ ... ]

> +static int nbl_probe(struct pci_dev *pdev,
> + const struct pci_device_id *id)
> +{

[ ... ]

> + adapter = nbl_core_init(pdev, &param);
> + if (!adapter) {
> + dev_err(dev, "Nbl adapter init fail\n");
> + err = -ENODEV;
> + goto adapter_init_err;
> + }

If nbl_core_init() returned an ERR_PTR(ret) instead of NULL, we wouldn't have
to unconditionally override the true root cause of probe failures with -ENODEV,
which could make driver debugging easier.
--
pw-bot: cr