Re: [PATCH] sched/isolation: Don't free memblock allocated cpumasks

From: Frederic Weisbecker

Date: Tue May 12 2026 - 09:51:38 EST


On Mon, May 11, 2026 at 05:36:08PM -0400, Waiman Long wrote:
> On 5/11/26 4:34 AM, Mike Rapoport wrote:
> > On Mon, May 11, 2026 at 12:55:39AM -0400, Waiman Long wrote:
> > > On 5/10/26 11:02 AM, Mike Rapoport wrote:
> > > > Hi Waiman,
> > > >
> > > > On Tue, May 05, 2026 at 01:18:21AM -0400, Waiman Long wrote:
> > > > > When testing a v7.1 kernel with commit 59bd1d914bb5 ("memblock: warn when
> > > > > freeing reserved memory before memory map is initialized"), the following
> > > > > warning was hit when there was a "nohz_full" kernel boot parameter.
> > > > >
> > > > > [ 0.080911] Cannot free reserved memory because of deferred initialization of the memory map
> > > > > [ 0.080911] WARNING: mm/memblock.c:904 at __free_reserved_area+0xde/0xf0, CPU#0: swapper/0/0
> > > > > :
> > > > > [ 0.080945] Call Trace:
> > > > > [ 0.080947] <TASK>
> > > > > [ 0.080949] memblock_phys_free+0xcb/0x100
> > > > > [ 0.080953] housekeeping_init+0x14c/0x170
> > > > > [ 0.080957] start_kernel+0x207/0x450
> > > > > [ 0.080961] x86_64_start_reservations+0x24/0x30
> > > > > [ 0.080965] x86_64_start_kernel+0xda/0xe0
> > > > > [ 0.080967] common_startup_64+0x13e/0x141
> > > > > [ 0.080972] </TASK>
> > > > >
> > > > > The commit states that freeing of reserved memory before the memory
> > > > > map is fully initialized in deferred_init_memmap() would cause access
> > > > > to uninitialized struct pages and may crash when accessing spurious
> > > > > list pointers. However, if the memblock_free() call is deferred to
> > > > > the start of initcall processing in the bootup process, for instance,
> > > > > the following KASAN warning can appear.
> > > > >
> > > > > [ 8.514775] BUG: KASAN: use-after-free in memblock_isolate_range+0x4ac/0x650
> > > > > [ 8.514775] Read of size 8 at addr ffff88a07fe6a000 by task swapper/0/1
> > > > > :
> > > > > [ 8.514775] Call Trace:
> > > > > [ 8.514775] <TASK>
> > > > > [ 8.514775] kasan_report+0xb2/0x1b0
> > > > > [ 8.514775] memblock_isolate_range+0x4ac/0x650
> > > > > [ 8.514775] memblock_phys_free+0xc4/0x190
> > > > > [ 8.514775] housekeeping_late_init+0x257/0x280
> > > > > [ 8.514775] do_one_initcall+0xaa/0x470
> > > > > [ 8.514775] do_initcalls+0x1b4/0x1f0
> > > > > [ 8.514775] kernel_init_freeable+0x4b5/0x550
> > > > > [ 8.514775] kernel_init+0x1c/0x150
> > > > > [ 8.514775] ret_from_fork+0x5dc/0x8e0
> > > > > [ 8.514775] ret_from_fork_asm+0x1a/0x30
> > > > > [ 8.514775] </TASK>
> > > > >
> > > > > It is likely that memblock_discard() may discard memblock data needed
> > > > > for memblock_free(). One workaround for now to avoid these warning/bug
> > > > > messages is to keep the memblock allocated cpumasks even if they are
> > > > > no longer needed until the memblock subsystem is properly updated to
> > > > > handle memblock_free().
> > > > >
> > > > > > On most systems, memory occupied by a cpumask is pretty small. So not
> > > > > much memory will be wasted if the memblock cpumasks are not freed.
> > > > >
> > > > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> > > > > ---
> > > > > kernel/sched/isolation.c | 8 +++++++-
> > > > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > > > index ef152d401fe2..ad9b1a1104e3 100644
> > > > > --- a/kernel/sched/isolation.c
> > > > > +++ b/kernel/sched/isolation.c
> > > > > @@ -189,7 +189,13 @@ void __init housekeeping_init(void)
> > > > > WARN_ON_ONCE(cpumask_empty(omask));
> > > > > cpumask_copy(nmask, omask);
> > > > > RCU_INIT_POINTER(housekeeping.cpumasks[type], nmask);
> > > > > - memblock_free(omask, cpumask_size());
> > > > > +
> > > > > + /*
> > > > > + * TODO: Don't free memblock allocated cpumasks until the
> > > > > > + * memblock subsystem is able to handle the memblock_free()
> > > > > + * properly.
> > > > > + */
> > > > > + // memblock_free(omask, cpumask_size());
> > > > Before 59bd1d914bb5 it was a silent leak. housekeeping_init() is called
> > > > after memblock moves all the memory to buddy, so this would only update
> > > > memblock.reserved.
> > > >
> > > > The comment a few lines above says that we reallocate to be able to kfree()
> > > > later. Is it possible to delay reallocation until an initcall?
> > > My original thought was to defer the freeing to an initcall. That change led
> > > to the KASAN bug splat listed in the commit log. I think the right window to
> > > free memblock memory is currently just too narrow. Do you mean that with the
> > > fix patch you sent to Breno, memblock freeing in initcall will work without
> > > bug report?
> > Yes, with the fix I sent to Breno memblock_free() should work in an
> > initcall and "do the right thing".
>
> Thanks for the confirmation. I have tested your patch with my patch to defer
> the memblock_free() to initcall. There is no longer any KASAN splat when
> booting up a debug test kernel. You can add the following tag when you send
> out your patch.
>
> Tested-by: Waiman Long <longman@xxxxxxxxxx>

Thanks a lot guys!

--
Frederic Weisbecker
SUSE Labs