Re: [PATCH v9 06/31] gpu: nova-core: Hopper/Blackwell: skip GFW boot waiting

From: Gary Guo

Date: Mon Mar 30 2026 - 11:34:18 EST

On Mon Mar 30, 2026 at 3:52 PM BST, Alexandre Courbot wrote:
> On Thu Mar 26, 2026 at 10:38 AM JST, John Hubbard wrote:
>> Hopper and Blackwell GPUs use FSP-based secure boot and do not
>> require waiting for GFW_BOOT completion. Move the GFW_BOOT wait
>> into a GPU HAL so the decision and the wait both live in the HAL.
>>
>> Pre-Hopper families (Tu102 HAL) wait for GFW_BOOT completion.
>> Hopper and later (Gh100 HAL) skip it and boot via FSP instead.
>>
>> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>
>> ---
>> drivers/gpu/nova-core/gpu.rs | 6 +++--
>> drivers/gpu/nova-core/gpu/hal.rs | 41 ++++++++++++++++++++++++++++++++
>> 2 files changed, 45 insertions(+), 2 deletions(-)
>> create mode 100644 drivers/gpu/nova-core/gpu/hal.rs
>>
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index f70bfbda1614..8332ad67c0af 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -21,11 +21,12 @@
>> Falcon, //
>> },
>> fb::SysmemFlush,
>> - gfw,
>> gsp::Gsp,
>> regs,
>> };
>>
>> +mod hal;
>> +
>> macro_rules! define_chipset {
>> ({ $($variant:ident = $value:expr),* $(,)* }) =>
>> {
>> @@ -311,6 +312,7 @@ pub(crate) fn new<'a>(
>> spec: Spec,
>> ) -> impl PinInit<Self, Error> + 'a {
>> let dma_mask = spec.chipset().arch().dma_mask();
>> + let hal = hal::gpu_hal(spec.chipset());
>>
>> try_pin_init!(Self {
>> // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
>> @@ -319,7 +321,7 @@ pub(crate) fn new<'a>(
>> // still constructing it, so no concurrent DMA allocations can exist.
>> unsafe { pdev.dma_set_mask_and_coherent(dma_mask)? };
>>
>> - gfw::wait_gfw_boot_completion(bar)
>> + hal.wait_gfw_boot_completion(bar)
>> .inspect_err(|_| dev_err!(pdev, "GFW boot did not complete\n"))?;
>> },
>>
>> diff --git a/drivers/gpu/nova-core/gpu/hal.rs b/drivers/gpu/nova-core/gpu/hal.rs
>> new file mode 100644
>> index 000000000000..164410992659
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/gpu/hal.rs
>> @@ -0,0 +1,41 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +use kernel::prelude::*;
>> +
>> +use crate::{
>> + driver::Bar0,
>> + gfw,
>> + gpu::{
>> + Architecture,
>> + Chipset, //
>> + },
>> +};
>> +
>> +pub(crate) trait GpuHal {
>> + /// Waits for GFW_BOOT completion if required by this hardware family.
>> + fn wait_gfw_boot_completion(&self, bar: &Bar0) -> Result;
>> +}
>> +
>> +struct Tu102;
>> +struct Gh100;
>> +
>> +impl GpuHal for Tu102 {
>> + fn wait_gfw_boot_completion(&self, bar: &Bar0) -> Result {
>> + gfw::wait_gfw_boot_completion(bar)
>> + }
>> +}
>> +
>> +impl GpuHal for Gh100 {
>> + fn wait_gfw_boot_completion(&self, _bar: &Bar0) -> Result {
>> + Ok(())
>> + }
>> +}
>
> Please take a look at how other HALs are implemented: each HAL instance
> is in its own module. That's not just a cosmetic choice; it allows us to
> keep the chipset's specific HAL struct and its helpers completely
> private and forces us to make code-sharing explicit. Furthermore, this
> particular HAL is bound to grow, so let's split it properly from the
> start.
>
> If you do that it also makes more sense to use constants (contrary to
> Gary's feedback on v8), if only to align with the rest of the driver.

The exsiting cases in fb/hal declare the constants without even exposing it (the
only use case is to pass it to XXX_HAL.

These looks redundant and should be changed to

pub(super) const GA100_HAL: &dyn FbHal = &Ga100;

etc. instead.

Best,
Gary

>
> Once this is done, making `gpu::hal::tu102` absorb the `gfw` module is
> trivial, so let's do that while we are at it - having `gfw` as being
> driver-wide makes little sense since it has a very limited role for a
> specific subset of the chips we support.