Re: [PATCH v12 17/22] gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging

From: sashiko-bot

Date: Mon Jun 01 2026 - 23:37:27 EST

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Unbounded kernel memory allocation based on untrusted hardware input.
- [High] FSP message synchronization ignores `task_id` and mishandles timeouts, leading to permanent message desynchronization.
- [Medium] FSP protocol structures use native integer types and are directly transmuted from byte streams, breaking big-endian compatibility.
--

commit 4be2156ade2cecdc1c8a1f092241ae18c134074e
Author: John Hubbard <jhubbard@xxxxxxxxxx>

gpu: nova-core: Hopper/Blackwell: add FSP send/receive messaging

This commit introduces the synchronous send-and-wait path for FSP messaging
in the nova-core driver. It defines FSP protocol structures and a
send_sync_fsp method to exchange messages with the FSP hardware.

> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index f3524137d9f78..67cf83aba83c6 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs

[ ... ]

> @@ -22,18 +26,56 @@

[ ... ]

> +/// FSP command response payload (NVDM_PAYLOAD_COMMAND_RESPONSE).
> +#[repr(C, packed)]
> +#[derive(Clone, Copy)]
> +struct NvdmPayloadCommandResponse {
> + task_id: u32,
> + command_nvdm_type: u32,
> + error_code: u32,
> +}

[Severity: Medium]
Will this structure parse values incorrectly on big-endian architectures?

The FSP hardware writes responses in a fixed endianness (little-endian), but
this uses native u32 types. When FspResponse::from_bytes_prefix transmutes
the byte stream, it could result in corrupted values and spurious validation
failures like command_nvdm_type != M::NVDM_TYPE on big-endian systems.

Should these use explicit little-endian types instead?

[ ... ]

> @@ -70,4 +112,79 @@ pub(crate) fn wait_secure_boot(

[ ... ]

> + fn send_sync_fsp<M>(&mut self, dev: &device::Device, bar: &Bar0, msg: &M) -> Result
> + where
> + M: MessageToFsp,
> + {

[ ... ]

> + let packet_size = num::u32_as_usize(packet_size);
> + let mut response_buf = KVec::<u8>::new();
> + response_buf.resize(packet_size, 0, GFP_KERNEL)?;

[Severity: High]
Could this trigger an unbounded kernel memory allocation?

The packet_size is derived from poll_msgq(), which calculates the size
based on MMIO reads of the FSP queue head and tail pointers. If an errant
or compromised FSP firmware returns a large value, passing this unvalidated
size directly to resize() with GFP_KERNEL could trigger the OOM killer or
cause severe system stutter.

Should this size be clamped to a hardware-defined maximum, such as the
FSP EMEM capacity, before allocation?

[ ... ]

> + if command_nvdm_type != M::NVDM_TYPE {
> + dev_err!(
> + dev,
> + "Expected NVDM type {:#x} in reply, got {:#x}\n",
> + M::NVDM_TYPE,
> + command_nvdm_type
> + );
> + return Err(EIO);
> + }

[Severity: High]
Does this validation logic leave the driver vulnerable to message
desynchronization after a timeout?

If send_sync_fsp times out waiting for a response, it returns ETIMEDOUT
without draining the message queue. If the FSP firmware completes the
request later, its response remains at the queue's tail.

A subsequent call to send_sync_fsp will immediately read this delayed
response. Because this validation only checks command_nvdm_type and
ignores the task_id (sequence number), the driver might incorrectly accept
the stale response as the reply to the new request if they share the same
NVDM type.

Should the validation also check task_id against the sent request to
reject stale responses?

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@xxxxxxxxxx?part=17