Re: [PATCH v3 1/3] driver/base: Optimize memory block registration to reduce boot time

From: Donet Tom
Date: Mon May 05 2025 - 08:51:52 EST



On 5/5/25 4:06 PM, David Hildenbrand wrote:
On 05.05.25 11:36, Oscar Salvador wrote:
On Mon, May 05, 2025 at 10:12:48AM +0200, David Hildenbrand wrote:
Assume you hotplug the second CPU. The node is already registered/online, so
who does the register_cpu_under_node() call?

It's register_cpu() I guess? But no idea in which order that is called with
node onlining.

The code has to be cleaned up such that onlining a node does not traverse
any cpus / memory.

Whoever adds a CPU / memory *after onlining the node* must register the
device manually under the *now online* node.

So, I think this is the sequence of events:

- hotplug cpu:
   acpi_processor_hotadd_init
    register_cpu
     register_cpu_under_node

   online_store
    device_online()->dev_bus_online()
     cpu_subsys->online()
      cpu_subsys_online
       cpu_device_up
        cpu_up
         try_online_node  <- brings node online
          ...
          register_one_node <- registers cpu under node
         _cpu_up

My thinking was, whether we can simply move the register_cpu_under_node() after the try_online_node(). See below regarding early.

And then, remove the !node_online check from register_cpu_under_node().

But it's all complicated, because for memory, we link a memory block to the node (+set the node online) when it gets added, not when it gets onlined.

For CPUs, we seem to be creating the link + set the node online when the CPU gets onlined.


The first time we hotplug a cpu to the node, note that
register_cpu()->register_cpu_under_node() will bail out as node is still
offline, so only cpu's sysfs will be created but they will not be linked
to the node.
> Later, online_store()->...->cpu_subsys_online()->..->cpu_up() will take> care of 1) onlining the node and 2) register the cpu to the node (so,
link the sysfs).


And only if it actually gets onlined I assume.


The second time we hotplug a cpu,
register_cpu()->register_cpu_under_node() will do its job as the node is
already onlined.
And we will not be calling register_one_node() from __try_online_node()
because of the same reason.

The thing that bothers me is having register_cpu_under_node() spread
around.

Right.

I think that ideally, we should only be calling register_cpu_under_node()
from register_cpu(), but we have this kinda of (sort of weird?) relation
that even if we hotplug the cpu, but we do not online it, the numa node
will remain online, and so we cannot do the linking part (cpu <-> node),
so we could not really only have register_cpu_under_node() in
register_cpu(), which is the hot-add part, but we also need it in the
cpu_up()->try_online_node() which is the online part.

Maybe one could handle CPUs similar to how we handle it with memory: node gets onlined + link created as soon as we add the CPU, not when we online it.

But likely there is a reason why we do it like that today ...


And we cannot also remove the register_cpu_under_node() from
register_cpu() because it is used in other paths (e.g: at boot time ).

Ah, so in that case we don't call cpu_up ... hm.

Of course, we can always detect the context (early vs. hotplug). Maybe, we should split the early vs. hotplug case up much earlier.

register_cpu_early() / register_cpu_hotplug() ... maybe

Hi David and Oscar,

I was thinking that __try_online_node(nid, true) being called from
try_online_node() might cause issues with this patch. From the
discussion above, what I understand is:

When try_online_node() is called, there are no memory resources
available for the node, so register_memory_blocks_under_node()
has no effect. Therefore, our patch should work in all cases.

Do you think we need to make any changes to this patch?

Thanks
Donet