Re: [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code

From: Eliot Courtney

Date: Mon Apr 13 2026 - 01:34:06 EST

On Fri Apr 3, 2026 at 12:24 AM JST, Gary Guo wrote:
> From: Gary Guo <gary@xxxxxxxxxxx>
>
> Currently, in the GSP->CPU messaging path, the current code misses a read
> barrier before data read. The barrier after read is updated to a DMA
> barrier (with release ordering desired), instead of the existing (Rust)
> SeqCst SMP barrier; the location of barrier is also moved to the beginning
> of function, because the barrier is needed to synchronizing between data
> and ring-buffer pointer, the RMW operation does not internally need a
> barrier (nor it has to be atomic, as CPU pointers are updated by CPU only).
>
> In the CPU->GSP messaging path, the current code misses a write barrier
> after data write and before updating the CPU write pointer. Barrier is not
> needed before data write due to control dependency, this fact is documented
> explicitly. This could be replaced with an acquire barrier if needed.
>
> Signed-off-by: Gary Guo <gary@xxxxxxxxxxx>

nit: should this have
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")

?

> ---
> drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
> drivers/gpu/nova-core/gsp/fw.rs | 12 ------------
> 2 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 2224896ccc89..7e4315b13984 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -19,6 +19,12 @@
> prelude::*,
> sync::{
> aref::ARef,
> + barrier::{
> + dma_mb,
> + Read,
> + Release,
> + Write, //
> + },
> Mutex, //
> },
> time::Delta,
> @@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
> let tx = self.cpu_write_ptr() as usize;
> let rx = self.gsp_read_ptr() as usize;
>
> + // ORDERING: control dependency provides necessary LOAD->STORE ordering.
> + // `dma_mb(Acquire)` may be used here if we don't want to rely on control dependency.
> +
> // SAFETY:
> // - We will only access the driver-owned part of the shared memory.
> // - Per the safety statement of the function, no concurrent access will be performed.
> @@ -311,6 +320,9 @@ fn driver_write_area_size(&self) -> usize {
> let tx = self.gsp_write_ptr() as usize;
> let rx = self.cpu_read_ptr() as usize;
>
> + // ORDERING: Ensure data load is ordered after load of GSP write pointer.
> + dma_mb(Read);
> +
> // SAFETY:
> // - We will only access the driver-owned part of the shared memory.
> // - Per the safety statement of the function, no concurrent access will be performed.
> @@ -408,6 +420,10 @@ fn cpu_read_ptr(&self) -> u32 {
>
> // Informs the GSP that it can send `elem_count` new pages into the message queue.
> fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
> + // ORDERING: Ensure read pointer is properly ordered.

What about a more specific comment that describes exactly what is
ordered, e.g. something like:
Ensure all reads of message data by the CPU have completed before writing
the updated read pointer to the GSP, since the GSP may then overwrite
that data.

Maybe this is just me, but I find it a lot easier to think of each
ordering as a pair of (load? store? -> load? store?), which works for
everything hw actually supports except for ll+ls+ss, rather than mapping
'Release' to (load+store -> store) in my head. E.g. here IIUC we need to
make sure all loads by the CPU are done before we do the store for the
pointer: loads issued before this barrier must not be reordered after
it, and the pointer store after it must not be reordered before it. So
(load -> store) should be sufficient? Depending on what you want to do
with the memory model, this one could be tightened IMO, unlike the one
below, which only needs to order stores with each other (ss).
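
To make the pairs concrete, here is a rough userspace sketch. It is not
nova-core code and does not use the kernel barrier API: `Ring`,
`produce`, `consume` and `run_demo` are hypothetical names, and the std
SMP fences only stand in for the DMA barriers in the patch. Each fence is
annotated with the (before -> after) pair it is actually needed for.

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical ring size; one slot is always kept free.
const SLOTS: u32 = 4;

struct Ring {
    data: [AtomicU32; SLOTS as usize],
    write_ptr: AtomicU32, // advanced by the producer only
    read_ptr: AtomicU32,  // advanced by the consumer only
}

fn produce(ring: &Ring, value: u32) {
    let w = ring.write_ptr.load(Ordering::Relaxed);
    // Wait until the consumer has released the slot we are about to fill.
    while (w + 1) % SLOTS == ring.read_ptr.load(Ordering::Relaxed) {
        thread::yield_now();
    }
    // Needed pair: load(read_ptr) -> store(data).
    // The patch relies on a control dependency here; the Rust memory model
    // has no such notion, so this sketch uses an acquire fence instead.
    fence(Ordering::Acquire);
    ring.data[w as usize].store(value, Ordering::Relaxed);
    // Needed pair: store(data) -> store(write_ptr), i.e. only (ss).
    // A release fence is stronger than that, but it is what std offers.
    fence(Ordering::Release);
    ring.write_ptr.store((w + 1) % SLOTS, Ordering::Relaxed);
}

fn consume(ring: &Ring) -> u32 {
    let r = ring.read_ptr.load(Ordering::Relaxed);
    // Wait until the producer has published at least one element.
    while ring.write_ptr.load(Ordering::Relaxed) == r {
        thread::yield_now();
    }
    // Needed pair: load(write_ptr) -> load(data).
    fence(Ordering::Acquire);
    let value = ring.data[r as usize].load(Ordering::Relaxed);
    // Needed pair: load(data) -> store(read_ptr).
    // Release gives (load+store -> store); only the load half is required.
    fence(Ordering::Release);
    ring.read_ptr.store((r + 1) % SLOTS, Ordering::Relaxed);
    value
}

/// Pushes `0..count` through the ring and returns what the consumer saw.
pub fn run_demo(count: u32) -> Vec<u32> {
    let ring = Arc::new(Ring {
        data: Default::default(),
        write_ptr: AtomicU32::new(0),
        read_ptr: AtomicU32::new(0),
    });
    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            for i in 0..count {
                produce(&ring, i);
            }
        })
    };
    let received = (0..count).map(|_| consume(&ring)).collect();
    producer.join().unwrap();
    received
}
```

Calling `run_demo(n)` should hand back the values in the order they were
produced. The release fence in `consume` corresponds to the barrier
discussed here, and the one in `produce` to the store-only case below.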

> + //

nit: stray //

> + dma_mb(Release);
> +
> super::fw::gsp_mem::advance_cpu_read_ptr(&self.0, elem_count)
> }
>
> @@ -422,6 +438,9 @@ fn cpu_write_ptr(&self) -> u32 {
>
> // Informs the GSP that it can process `elem_count` new pages from the command queue.
> fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
> + // ORDERING: Ensure all command data is visible before updateing ring buffer pointer.
> + dma_mb(Write);
> +
> super::fw::gsp_mem::advance_cpu_write_ptr(&self.0, elem_count)
> }
> }
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c8a74f0e8ac..62c2cf1b030c 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -42,11 +42,6 @@
>
> // TODO: Replace with `IoView` projections once available.
> pub(super) mod gsp_mem {
> - use core::sync::atomic::{
> - fence,
> - Ordering, //
> - };
> -
> use kernel::{
> dma::Coherent,
> dma_read,
> @@ -72,10 +67,6 @@ pub(in crate::gsp) fn cpu_read_ptr(qs: &Coherent<GspMem>) -> u32 {
>
> pub(in crate::gsp) fn advance_cpu_read_ptr(qs: &Coherent<GspMem>, count: u32) {
> let rptr = cpu_read_ptr(qs).wrapping_add(count) % MSGQ_NUM_PAGES;
> -
> - // Ensure read pointer is properly ordered.
> - fence(Ordering::SeqCst);
> -
> dma_write!(qs, .cpuq.rx.0.readPtr, rptr);
> }
>
> @@ -87,9 +78,6 @@ pub(in crate::gsp) fn advance_cpu_write_ptr(qs: &Coherent<GspMem>, count: u32) {
> let wptr = cpu_write_ptr(qs).wrapping_add(count) % MSGQ_NUM_PAGES;
>
> dma_write!(qs, .cpuq.tx.0.writePtr, wptr);
> -
> - // Ensure all command data is visible before triggering the GSP read.
> - fence(Ordering::SeqCst);
> }
> }
>