Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure

From: Alexandre Courbot

Date: Tue Jun 02 2026 - 21:38:32 EST


On Tue Jun 2, 2026 at 9:21 PM JST, Eliot Courtney wrote:
> On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
>> FSP communication uses a pair of non-circular queues in the FSP
>> falcon's EMEM, one for messages from the driver to FSP and one for
>> replies, with the driver polling for response data. Add the queue
>> registers and the low-level helpers used by the higher-level FSP
>> message layer.
>>
>> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>
>> ---
>> drivers/gpu/nova-core/falcon/fsp.rs | 61 ++++++++++++++++++++++++++++-
>> drivers/gpu/nova-core/regs.rs | 21 ++++++++++
>> 2 files changed, 80 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> index 6b057d958115..0ec1c55213bc 100644
>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
>> @@ -112,7 +112,6 @@ impl Falcon<Fsp> {
>> ///
>> /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
>> /// if `offset` or the `data` length is not 4-byte aligned.
>> - #[expect(dead_code)]
>> fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>> if offset % 4 != 0 || data.len() % 4 != 0 {
>> return Err(EINVAL);
>> @@ -131,7 +130,6 @@ fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
>> ///
>> /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
>> /// `offset` or the `data` length is not 4-byte aligned.
>> - #[expect(dead_code)]
>> fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>> if offset % 4 != 0 || data.len() % 4 != 0 {
>> return Err(EINVAL);
>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>
>> Ok(())
>> }
>> +
>> + /// Poll FSP for incoming data.
>> + ///
>> + /// Returns the size of available data in bytes, or 0 if no data is available.
>> + ///
>> + /// The FSP message queue is not circular. Pointers are reset to 0 after each
>> + /// message exchange, so `tail >= head` is always true when data is present.
>> + #[expect(dead_code)]
>> + pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>> + let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>> + let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>> +
>> + if head == tail {
>> + return 0;
>> + }
>> +
>> + // TAIL points at last DWORD written, so add 4 to get total size
>> + tail.saturating_sub(head) + 4
>> + }
>
> In a later patch, `send_sync_fsp` polls this then calls `recv_msg`. But,
> structurally it's possible to pass in any size to `recv_msg` and read
> more than we are supposed to. What about having `recv_msg` do the
> polling to get the size and return a KVec with the read out data,
> instead of `send_sync_fsp`? `poll_msgq` could stay private and we can
> make it public later if we need to.

The issue I see with returning a `KVec` is that it imposes a dynamic
allocation for every message. Granted, this is what the current code
does, but now that we have this `&mut self` logic in place that
guarantees exclusive access, we can also turn the receiving `KVec` into
a member of `Fsp` and keep passing it as a mut reference to avoid that.
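
To illustrate the idea, here is a rough userspace sketch, not the driver code. `MockQueue`, the `rx_buf` member and this `recv_msg` signature are all hypothetical, and a std `Vec<u8>` stands in for `KVec`. The buffer lives in `Fsp`, `recv_msg` does the polling itself, and the caller only gets a borrowed slice, so no per-message allocation is needed once the buffer has grown to the largest message seen:

```rust
/// Hypothetical model of the FSP message queue: EMEM contents plus the two
/// queue pointers. Stands in for the real registers accessed through `Bar0`.
struct MockQueue {
    emem: Vec<u8>,
    head: u32,
    tail: u32,
}

impl MockQueue {
    fn new() -> Self {
        Self { emem: Vec::new(), head: 0, tail: 0 }
    }

    /// Pretends FSP wrote `msg` at offset 0. TAIL points at the last DWORD.
    /// This model needs at least two DWORDs, since HEAD == TAIL means "empty".
    fn load(&mut self, msg: &[u8]) {
        assert!(msg.len() >= 8 && msg.len() % 4 == 0);
        self.emem = msg.to_vec();
        self.head = 0;
        self.tail = (msg.len() - 4) as u32;
    }

    /// Same arithmetic as `poll_msgq` in the patch.
    fn pending(&self) -> usize {
        if self.head == self.tail {
            return 0;
        }
        (self.tail.saturating_sub(self.head) + 4) as usize
    }
}

struct Fsp {
    /// Receive buffer reused across messages (assumption: a `KVec<u8>` in
    /// the real driver).
    rx_buf: Vec<u8>,
}

impl Fsp {
    fn new() -> Self {
        Self { rx_buf: Vec::new() }
    }

    /// Polls for the size, copies the message into the reused buffer and
    /// resets the queue pointers. `&mut self` guarantees nobody else is
    /// looking at `rx_buf` while we overwrite it.
    fn recv_msg(&mut self, queue: &mut MockQueue) -> Option<&[u8]> {
        let size = queue.pending();
        if size == 0 {
            return None;
        }

        // `clear` keeps the capacity, so this only allocates when a message
        // is larger than any previous one.
        self.rx_buf.clear();
        self.rx_buf.extend_from_slice(&queue.emem[..size]);

        queue.head = 0;
        queue.tail = 0;

        Some(self.rx_buf.as_slice())
    }
}
```

Since the size never leaves `recv_msg`, the caller also cannot ask for more data than the queue reports, which I think addresses the original concern.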

>
>> +
>> + /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>> + ///
>> + /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
>> + #[expect(dead_code)]
>> + pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
>> + if packet.is_empty() {
>> + return Err(EINVAL);
>> + }
>> +
>> + // Write message to EMEM at offset 0 (validates 4-byte alignment)
>> + self.write_emem(bar, 0, packet)?;
>> +
>> + // Update queue pointers. TAIL points at the last DWORD written.
>> + let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
>> + bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
>> + bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
>> +
>> + Ok(())
>> + }
>> +
>> + /// Reads `size` bytes from FSP EMEM into `buffer` and resets the queue pointers.
>> + ///
>> + /// `size` comes from `poll_msgq`. Returns `EINVAL` if `size` is 0, exceeds
>> + /// `buffer`, or is not 4-byte aligned.
>> + #[expect(dead_code)]
>> + pub(crate) fn recv_msg(&mut self, bar: &Bar0, buffer: &mut [u8], size: usize) -> Result {
>> + if size == 0 || size > buffer.len() {
>> + return Err(EINVAL);
>> + }
>> +
>> + // Read response from EMEM at offset 0 (validates 4-byte alignment)
>> + self.read_emem(bar, 0, &mut buffer[..size])?;
>> +
>> + // Reset message queue pointers after reading
>> + bar.write_reg(regs::NV_PFSP_MSGQ_TAIL::zeroed().with_address(0));
>> + bar.write_reg(regs::NV_PFSP_MSGQ_HEAD::zeroed().with_address(0));
>> +
>> + Ok(())
>> + }
>
> I think we can remove the `size` argument and have the caller pass in
> an appropriately sized slice (although obviated by my other comment).

Agreed, having both a slice and a length parameter is redundant and
requires extra checks that shouldn't be necessary. `recv_msg` is also
called right after we resized the receiving vector to the right size, so
we currently do have a call-time guarantee that `size == buffer.len()`.
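
For reference, a minimal userspace sketch of the slice-only variant. `recv_msg_slice` and `RecvError` are made-up names, the EMEM is modelled as a plain byte slice, and `RecvError::Invalid` stands in for `EINVAL`. The slice length is the only size, so the `size > buffer.len()` check disappears:

```rust
/// Hypothetical error type standing in for the kernel's `EINVAL`.
#[derive(Debug, PartialEq)]
enum RecvError {
    Invalid,
}

/// Fills `buffer` from the start of `emem`. The number of bytes to read is
/// `buffer.len()`, so there is no separate `size` to validate against it.
fn recv_msg_slice(emem: &[u8], buffer: &mut [u8]) -> Result<(), RecvError> {
    // Same constraints as the patch: non-empty and 4-byte aligned. The
    // bounds check against `emem` only exists because this is a mock.
    if buffer.is_empty() || buffer.len() % 4 != 0 || buffer.len() > emem.len() {
        return Err(RecvError::Invalid);
    }

    buffer.copy_from_slice(&emem[..buffer.len()]);

    Ok(())
}
```

The caller would resize its vector to the polled size as it does today, then pass the whole thing as `&mut buf`.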