Re: [PATCH] mm/alloc_tag: clear codetag for pages allocated before page_ext initialization

From: Hao Ge

Date: Wed Mar 25 2026 - 07:31:30 EST



On 2026/3/25 15:35, Suren Baghdasaryan wrote:
On Tue, Mar 24, 2026 at 11:25 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
On Tue, Mar 24, 2026 at 7:08 PM Hao Ge <hao.ge@xxxxxxxxx> wrote:

On 2026/3/25 08:21, Suren Baghdasaryan wrote:
On Tue, Mar 24, 2026 at 2:43 AM Hao Ge <hao.ge@xxxxxxxxx> wrote:
On 2026/3/24 06:47, Suren Baghdasaryan wrote:
On Mon, Mar 23, 2026 at 2:16 AM Hao Ge <hao.ge@xxxxxxxxx> wrote:
On 2026/3/20 10:14, Suren Baghdasaryan wrote:
On Thu, Mar 19, 2026 at 6:58 PM Hao Ge <hao.ge@xxxxxxxxx> wrote:
On 2026/3/20 07:48, Suren Baghdasaryan wrote:
On Thu, Mar 19, 2026 at 4:44 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
On Thu, Mar 19, 2026 at 3:28 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Thu, 19 Mar 2026 16:31:53 +0800 Hao Ge <hao.ge@xxxxxxxxx> wrote:

Due to initialization ordering, page_ext is allocated and initialized
relatively late during boot. Some pages have already been allocated
and freed before page_ext becomes available, leaving their codetag
uninitialized.
Hi Hao,
Thanks for the report.
Hmm. So, we are allocating pages before page_ext is initialized...

A clear example is in init_section_page_ext(): alloc_page_ext() calls
kmemleak_alloc().
Forgot to ask. The example you are using here is for page_ext
allocation itself. Do you have any other examples where page
allocation happens before page_ext initialization? If that's the only
place, then we might be able to fix this in a simpler way by doing
something special for alloc_page_ext().
Hi Suren

To help illustrate the point, here's the debug log I added:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..ebfe636f5b07 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,9 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
update_page_tag_ref(handle, &ref);
put_page_tag_ref(handle);
+ } else {
+ pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr);
+ dump_stack();
}
}


And I caught the following logs:

[ 0.296399] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400c700 pfn=1049372 nr=1
[ 0.296400] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #12 PREEMPT(lazy)
[ 0.296402] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.296402] Call Trace:
[ 0.296403] <TASK>
[ 0.296403] dump_stack_lvl+0x53/0x70
[ 0.296405] __pgalloc_tag_add+0x3a3/0x6e0
[ 0.296406] ? __pfx___pgalloc_tag_add+0x10/0x10
[ 0.296407] ? kasan_unpoison+0x27/0x60
[ 0.296409] ? __kasan_unpoison_pages+0x2c/0x40
[ 0.296411] get_page_from_freelist+0xa54/0x1310
[ 0.296413] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.296415] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 0.296417] ? stack_depot_save_flags+0x3f/0x680
[ 0.296418] ? ___slab_alloc+0x518/0x530
[ 0.296420] alloc_pages_mpol+0x13a/0x3f0
[ 0.296421] ? __pfx_alloc_pages_mpol+0x10/0x10
[ 0.296423] ? _raw_spin_lock_irqsave+0x8a/0xf0
[ 0.296424] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 0.296426] alloc_slab_page+0xc2/0x130
[ 0.296427] allocate_slab+0x77/0x2c0
[ 0.296429] ? syscall_enter_define_fields+0x3bb/0x5f0
[ 0.296430] ___slab_alloc+0x125/0x530
[ 0.296432] ? __trace_define_field+0x252/0x3d0
[ 0.296433] __kmalloc_noprof+0x329/0x630
[ 0.296435] ? syscall_enter_define_fields+0x3bb/0x5f0
[ 0.296436] syscall_enter_define_fields+0x3bb/0x5f0
[ 0.296438] ? __pfx_syscall_enter_define_fields+0x10/0x10
[ 0.296440] event_define_fields+0x326/0x540
[ 0.296441] __trace_early_add_events+0xac/0x3c0
[ 0.296443] trace_event_init+0x24c/0x460
[ 0.296445] trace_init+0x9/0x20
[ 0.296446] start_kernel+0x199/0x3c0
[ 0.296448] x86_64_start_reservations+0x18/0x30
[ 0.296449] x86_64_start_kernel+0xe2/0xf0
[ 0.296451] common_startup_64+0x13e/0x141
[ 0.296453] </TASK>


[ 0.312234] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400f900 pfn=1049572 nr=1
[ 0.312234] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #12 PREEMPT(lazy)
[ 0.312236] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.312236] Call Trace:
[ 0.312237] <TASK>
[ 0.312237] dump_stack_lvl+0x53/0x70
[ 0.312239] __pgalloc_tag_add+0x3a3/0x6e0
[ 0.312240] ? __pfx___pgalloc_tag_add+0x10/0x10
[ 0.312241] ? rmqueue.constprop.0+0x4fc/0x1ce0
[ 0.312243] ? kasan_unpoison+0x27/0x60
[ 0.312244] ? __kasan_unpoison_pages+0x2c/0x40
[ 0.312246] get_page_from_freelist+0xa54/0x1310
[ 0.312248] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.312250] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 0.312253] alloc_slab_page+0x39/0x130
[ 0.312254] allocate_slab+0x77/0x2c0
[ 0.312255] ? alloc_cpumask_var_node+0xc7/0x230
[ 0.312257] ___slab_alloc+0x46d/0x530
[ 0.312259] __kmalloc_node_noprof+0x2fa/0x680
[ 0.312261] ? alloc_cpumask_var_node+0xc7/0x230
[ 0.312263] alloc_cpumask_var_node+0xc7/0x230
[ 0.312264] init_desc+0x141/0x6b0
[ 0.312266] alloc_desc+0x108/0x1b0
[ 0.312267] early_irq_init+0xee/0x1c0
[ 0.312268] ? __pfx_early_irq_init+0x10/0x10
[ 0.312271] start_kernel+0x1ab/0x3c0
[ 0.312272] x86_64_start_reservations+0x18/0x30
[ 0.312274] x86_64_start_kernel+0xe2/0xf0
[ 0.312275] common_startup_64+0x13e/0x141
[ 0.312277] </TASK>

[ 0.312834] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400fc00 pfn=1049584 nr=1
[ 0.312835] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #12 PREEMPT(lazy)
[ 0.312836] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 0.312837] Call Trace:
[ 0.312837] <TASK>
[ 0.312838] dump_stack_lvl+0x53/0x70
[ 0.312840] __pgalloc_tag_add+0x3a3/0x6e0
[ 0.312841] ? __pfx___pgalloc_tag_add+0x10/0x10
[ 0.312842] ? rmqueue.constprop.0+0x4fc/0x1ce0
[ 0.312844] ? kasan_unpoison+0x27/0x60
[ 0.312845] ? __kasan_unpoison_pages+0x2c/0x40
[ 0.312847] get_page_from_freelist+0xa54/0x1310
[ 0.312849] __alloc_frozen_pages_noprof+0x206/0x4c0
[ 0.312851] ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
[ 0.312853] alloc_pages_mpol+0x13a/0x3f0
[ 0.312855] ? __pfx_alloc_pages_mpol+0x10/0x10
[ 0.312856] ? xas_find+0x2d8/0x450
[ 0.312858] ? _raw_spin_lock+0x84/0xe0
[ 0.312859] ? __pfx__raw_spin_lock+0x10/0x10
[ 0.312861] alloc_pages_noprof+0xf6/0x2b0
[ 0.312862] __change_page_attr+0x293/0x850
[ 0.312864] ? __pfx___change_page_attr+0x10/0x10
[ 0.312865] ? _vm_unmap_aliases+0x2d0/0x650
[ 0.312868] ? __pfx__vm_unmap_aliases+0x10/0x10
[ 0.312869] __change_page_attr_set_clr+0x16c/0x360
[ 0.312871] ? spp_getpage+0xbb/0x1e0
[ 0.312872] change_page_attr_set_clr+0x220/0x3c0
[ 0.312873] ? flush_tlb_one_kernel+0xf/0x30
[ 0.312875] ? set_pte_vaddr_p4d+0x110/0x180
[ 0.312877] ? __pfx_change_page_attr_set_clr+0x10/0x10
[ 0.312878] ? __pfx_set_pte_vaddr_p4d+0x10/0x10
[ 0.312881] ? __pfx_mtree_load+0x10/0x10
[ 0.312883] ? __pfx_mtree_load+0x10/0x10
[ 0.312884] ? __asan_memcpy+0x3c/0x60
[ 0.312886] ? set_intr_gate+0x10c/0x150
[ 0.312888] set_memory_ro+0x76/0xa0
[ 0.312889] ? __pfx_set_memory_ro+0x10/0x10
[ 0.312891] idt_setup_apic_and_irq_gates+0x2c1/0x390

and more.
Ok, it's not the only place. Got your point.

Off topic: if we were to handle only alloc_page_ext() specifically, what would be the most straightforward solution in your mind? I'd really appreciate your insight.
I was thinking that if it were the only special case, maybe we could handle it
differently, like we do when we allocate obj_ext vectors for
slabs using __GFP_NO_OBJ_EXT. I haven't found a good solution yet, but
since it's not the only case we would not be able to use it even if I
came up with something...
I think your way is the most straight-forward but please try my
suggestion to see if we can avoid extra overhead.
Thanks,
Suren.
Hi Suren


Hi Hao,

Hi Suren

Thank you for your feedback. After re-examining this issue, I realize my previous focus was misplaced.

On deeper consideration, this is not merely a bug: the warning exposes a gap in our memory profiling mechanism. Specifically, the current implementation appears to miss allocation tracking during the period between buddy allocator initialization and page_ext initialization, so we may not capture all relevant allocation events during this transition phase.
Correct, this limitation exists because memory profiling relies on
some kernel facilities (page_ext, obj_ext) which might not yet be
initialized at the time of allocation.

My approach is to dynamically allocate a codetag_ref when get_page_tag_ref() fails, and to maintain a linked list tracking all buddy allocations that occur before page_ext initialization.

However, this introduces performance concerns:

1. Free path overhead: when freeing these pages, we would need to traverse the entire linked list to locate the corresponding codetag_ref, an O(n) lookup per free operation.

2. Initialization overhead: during init_page_alloc_tagging(), iterating through the linked list to assign each codetag_ref to page_ext adds another traversal.

If the number of pages is substantial, this could incur significant overhead. What are your thoughts on this? I look forward to your suggestions.
My thinking is that these early allocations comprise a small portion
of overall memory consumed by the system. So, instead of trying to
record and handle them in some alternative way, we just accept that
some counters might not be exactly accurate and ignore those early
allocations. See how the early slab allocations are marked with the
CODETAG_FLAG_INACCURATE flag and later reported as inaccurate. I think
that's an acceptable alternative to introducing extra complexity and
performance overhead. IOW, the benefits of accounting for these early
allocations are low compared to the effort required to account for
them. Unless you found a simple and performant way to do that...
I have been exploring possible solutions to this issue over the past few days, but so far I have not come up with a good approach.

I counted the allocations that occur before our page_ext is allocated and initialized, and found that there are actually quite a lot of them.
Interesting... I wonder if it's because deferred_struct_pages defers
page_ext initialization. Can you check whether setting early_page_ext
reduces or eliminates these cases of allocation before page_ext init?
Yes, you are correct. On my 8-core, 16GB virtual machine, I used a global counter to record these allocations. With early_page_ext enabled, there were 130 allocations before page_ext initialization; without early_page_ext, there were 802.


Similarly, I have made the following changes and collected the
corresponding logs.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..6db65b3d52d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,8 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
update_page_tag_ref(handle, &ref);
put_page_tag_ref(handle);
+ } else {
+ pr_warn("__pgalloc_tag_add: get_page_tag_ref failed! page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr);
}
}

@@ -1314,6 +1316,8 @@ void __pgalloc_tag_sub(struct page *page, unsigned int nr)
alloc_tag_sub(&ref, PAGE_SIZE * nr);
update_page_tag_ref(handle, &ref);
put_page_tag_ref(handle);
+ } else {
+ pr_warn("__pgalloc_tag_sub: get_page_tag_ref failed! page=%p pfn=%lu nr=%u\n", page, page_to_pfn(page), nr);
}
}

[ 0.261699] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001000 pfn=1048640 nr=2
[ 0.261711] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001100 pfn=1048644 nr=4
[ 0.261717] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001200 pfn=1048648 nr=4
[ 0.261721] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001300 pfn=1048652 nr=4
[ 0.261893] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001080 pfn=1048642 nr=2
[ 0.261917] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001400 pfn=1048656 nr=4
[ 0.262018] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001500 pfn=1048660 nr=2
[ 0.262024] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001600 pfn=1048664 nr=8
[ 0.262040] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001580 pfn=1048662 nr=1
[ 0.262048] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040015c0 pfn=1048663 nr=1
[ 0.262056] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001800 pfn=1048672 nr=2
[ 0.262064] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001880 pfn=1048674 nr=2
[ 0.262078] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001900 pfn=1048676 nr=2
[ 0.262196] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[ 0.262213] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001980 pfn=1048678 nr=2
[ 0.262220] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001a00 pfn=1048680 nr=4
[ 0.262246] ODEBUG: selftest passed
[ 0.262268] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001b00 pfn=1048684 nr=1
[ 0.262318] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001b40 pfn=1048685 nr=1
[ 0.262368] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001b80 pfn=1048686 nr=1
[ 0.262418] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001bc0 pfn=1048687 nr=1
[ 0.262469] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001c00 pfn=1048688 nr=1
[ 0.262519] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001c40 pfn=1048689 nr=1
[ 0.262569] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001c80 pfn=1048690 nr=1
[ 0.262620] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001cc0 pfn=1048691 nr=1
[ 0.262670] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001d00 pfn=1048692 nr=1
[ 0.262721] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001d40 pfn=1048693 nr=1
[ 0.262771] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001d80 pfn=1048694 nr=1
[ 0.262821] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001dc0 pfn=1048695 nr=1
[ 0.262871] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001e00 pfn=1048696 nr=1
[ 0.262923] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001e40 pfn=1048697 nr=1
[ 0.262974] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001e80 pfn=1048698 nr=1
[ 0.263024] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001ec0 pfn=1048699 nr=1
[ 0.263074] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001f00 pfn=1048700 nr=1
[ 0.263124] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001f40 pfn=1048701 nr=1
[ 0.263174] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001f80 pfn=1048702 nr=1
[ 0.263224] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004001fc0 pfn=1048703 nr=1
[ 0.263275] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002000 pfn=1048704 nr=1
[ 0.263325] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002040 pfn=1048705 nr=1
[ 0.263375] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002080 pfn=1048706 nr=1
[ 0.263427] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002400 pfn=1048720 nr=16
[ 0.263437] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040020c0 pfn=1048707 nr=1
[ 0.263463] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002100 pfn=1048708 nr=1
[ 0.263465] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002140 pfn=1048709 nr=1
[ 0.263467] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002180 pfn=1048710 nr=1
[ 0.263509] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002200 pfn=1048712 nr=4
[ 0.263512] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002800 pfn=1048736 nr=8
[ 0.263524] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040021c0 pfn=1048711 nr=1
[ 0.263536] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002300 pfn=1048716 nr=1
[ 0.263537] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002340 pfn=1048717 nr=1
[ 0.263539] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002380 pfn=1048718 nr=1
[ 0.263604] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004004000 pfn=1048832 nr=128
[ 0.263638] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004003000 pfn=1048768 nr=64
[ 0.263650] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002c00 pfn=1048752 nr=16
[ 0.263655] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040023c0 pfn=1048719 nr=1
[ 0.270582] __pgalloc_tag_sub: get_page_tag_ref failed! page=ffffea00040023c0 pfn=1048719 nr=1
[ 0.270591] ftrace: allocating 52717 entries in 208 pages
[ 0.270592] ftrace: allocated 208 pages with 3 groups
[ 0.270620] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004002a00 pfn=1048744 nr=8
[ 0.270636] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040023c0 pfn=1048719 nr=1
[ 0.270643] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006000 pfn=1048960 nr=1
[ 0.270649] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006040 pfn=1048961 nr=1
[ 0.270658] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004007000 pfn=1049024 nr=64
[ 0.270659] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006080 pfn=1048962 nr=2
[ 0.270722] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006100 pfn=1048964 nr=1
[ 0.270730] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006140 pfn=1048965 nr=1
[ 0.270738] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006180 pfn=1048966 nr=1
[ 0.270777] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040061c0 pfn=1048967 nr=1
[ 0.270786] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006200 pfn=1048968 nr=1
[ 0.270792] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006240 pfn=1048969 nr=1
[ 0.270833] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006300 pfn=1048972 nr=4
[ 0.270891] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006280 pfn=1048970 nr=1
[ 0.270980] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040062c0 pfn=1048971 nr=1
[ 0.271071] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006400 pfn=1048976 nr=1
[ 0.271156] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006440 pfn=1048977 nr=1
[ 0.271185] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006480 pfn=1048978 nr=2
[ 0.271301] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006500 pfn=1048980 nr=1
[ 0.271655] Dynamic Preempt: lazy
[ 0.271662] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006580 pfn=1048982 nr=2
[ 0.271752] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006600 pfn=1048984 nr=4
[ 0.271762] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004010000 pfn=1049600 nr=4
[ 0.271824] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006540 pfn=1048981 nr=1
[ 0.271916] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006700 pfn=1048988 nr=2
[ 0.271964] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006780 pfn=1048990 nr=1
[ 0.272099] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea00040067c0 pfn=1048991 nr=1
[ 0.272138] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006800 pfn=1048992 nr=2
[ 0.272144] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006a00 pfn=1049000 nr=8
[ 0.272249] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006c00 pfn=1049008 nr=8
[ 0.272319] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006880 pfn=1048994 nr=2
[ 0.272351] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006900 pfn=1048996 nr=4
[ 0.272424] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004006e00 pfn=1049016 nr=8
[ 0.272485] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008000 pfn=1049088 nr=8
[ 0.272535] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008200 pfn=1049096 nr=2
[ 0.272600] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008400 pfn=1049104 nr=8
[ 0.272663] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008300 pfn=1049100 nr=4
[ 0.272694] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008280 pfn=1049098 nr=2
[ 0.272708] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008600 pfn=1049112 nr=8

[ 0.272924] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008880 pfn=1049122 nr=2
[ 0.272934] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008900 pfn=1049124 nr=2
[ 0.272952] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008c00 pfn=1049136 nr=4
[ 0.273035] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008980 pfn=1049126 nr=2
[ 0.273062] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008e00 pfn=1049144 nr=8
[ 0.273674] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008d00 pfn=1049140 nr=1
[ 0.273884] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008d80 pfn=1049142 nr=2
[ 0.273943] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009000 pfn=1049152 nr=2
[ 0.274379] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009080 pfn=1049154 nr=2
[ 0.274575] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009200 pfn=1049160 nr=8
[ 0.274617] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009100 pfn=1049156 nr=4
[ 0.274794] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009400 pfn=1049168 nr=2
[ 0.274840] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009480 pfn=1049170 nr=2
[ 0.275057] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009500 pfn=1049172 nr=2
[ 0.275092] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009580 pfn=1049174 nr=2
[ 0.275134] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009600 pfn=1049176 nr=8
[ 0.275211] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009800 pfn=1049184 nr=4
[ 0.275510] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009900 pfn=1049188 nr=2
[ 0.275548] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009980 pfn=1049190 nr=2
[ 0.275976] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009a00 pfn=1049192 nr=8
[ 0.275987] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009c00 pfn=1049200 nr=2
[ 0.276139] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009c80 pfn=1049202 nr=2
[ 0.276152] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004008d40 pfn=1049141 nr=1
[ 0.276242] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009d00 pfn=1049204 nr=1
[ 0.276358] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009d40 pfn=1049205 nr=1
[ 0.276444] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009d80 pfn=1049206 nr=1
[ 0.276526] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009dc0 pfn=1049207 nr=1
[ 0.276615] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009e00 pfn=1049208 nr=1
[ 0.276696] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009e40 pfn=1049209 nr=1
[ 0.276792] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009e80 pfn=1049210 nr=1
[ 0.276827] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009f00 pfn=1049212 nr=2
[ 0.276891] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009ec0 pfn=1049211 nr=1
[ 0.276999] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009f80 pfn=1049214 nr=1
[ 0.277082] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea0004009fc0 pfn=1049215 nr=1
[ 0.277172] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400a000 pfn=1049216 nr=1
[ 0.277257] __pgalloc_tag_add: get_page_tag_ref failed! page=ffffea000400a040 pfn=1049217 nr=1

and so on.


I think your earlier patch can effectively detect these early
allocations and suppress the warnings. We should also mark these
allocations with CODETAG_FLAG_INACCURATE.
Thanks to an excellent AI review, I realized there are issues with my original patch. One problem is the 256-element array; another
Yes, if there are lots of such allocations, it's not appropriate.

is that it involves both allocation and free operations: we need to record entries in __pgalloc_tag_add and remove them in __pgalloc_tag_sub, which introduces noticeable overhead. I'm wondering if we can instead set a bit in the page flags during the early boot stage, which I'll refer to as EARLY_ALLOC_FLAGS.

Then, in __pgalloc_tag_sub, we first check for EARLY_ALLOC_FLAGS. If it is set, we clear the flag and return immediately; otherwise, we perform the actual subtraction of the tag count.

This approach seems somewhat similar to the idea behind mem_profiling_compressed.
That seems doable but let's first check if we can make page_ext
initialization happen before these allocations. That would be the
ideal path. If it's not possible then we can focus on alternatives
like the one you propose.

Yes, the ideal scenario would be to have page_ext initialization complete before these allocations occur. I just did a code walkthrough and found that this resembles the FLATMEM approach: FLATMEM allocates page_ext before buddy allocator initialization, so it doesn't seem to encounter the issue we're facing now.

https://elixir.bootlin.com/linux/v7.0-rc5/source/mm/mm_init.c#L2707
Yes, page_ext_init_flatmem() looks like an interesting option, but it
would not work with sparsemem. TBH I would prefer to find a simple
solution that can identify early init allocations, mark them inaccurate
and suppress the warning, rather than introduce some complex mechanism
to account for them which would work only in some cases (flatmem).
With your original approach I think the only real issue is the size of
the array, which might be too small. The other issue you mentioned, about an
allocated page being freed and then re-allocated after page_ext is
initialized but before clear_page_tag_ref() is called, is not really a
problem. Yes, we will lose that counter's value, but it's similar to
other early allocations which we just treat as inaccurate. We can also
minimize the possibility of this happening by moving
clear_page_tag_ref() into init_page_alloc_tagging().

I don't like the pageflag option you mentioned because it adds an
extra condition check into __pgalloc_tag_sub() which will be executed
even after the init stage is over.
I'll look into this some more tomorrow as it's quite late now.


Hi Suren


Just thought of something. Are all these pages allocated by slab? If
so, I think slab does not use page->lru (need to double-check) and we
could add all these pages allocated during early init into a list and
then set their page_ext reference to CODETAG_EMPTY in
init_page_alloc_tagging().

Got your point.


There will indeed be some non-SLAB memory allocations here, such as the following:


CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy)
[    0.326607] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.326608] Call Trace:
[    0.326608]  <TASK>
[    0.326609]  dump_stack_lvl+0x53/0x70
[    0.326611]  __pgalloc_tag_add+0x407/0x700
[    0.326616]  get_page_from_freelist+0xa54/0x1310
[    0.326618]  __alloc_frozen_pages_noprof+0x206/0x4c0
[    0.326623]  alloc_pages_mpol+0x13a/0x3f0
[    0.326627]  alloc_pages_noprof+0xf6/0x2b0
[    0.326628]  __pmd_alloc+0x743/0x9c0
[    0.326630]  vmap_range_noflush+0xac0/0x10a0
[    0.326637]  ioremap_page_range+0x17c/0x250
[    0.326639]  __ioremap_caller+0x437/0x5c0
[    0.326645]  acpi_os_map_iomem+0x4c0/0x660
[    0.326647]  acpi_tb_verify_temp_table+0x1c0/0x580
[    0.326649]  acpi_reallocate_root_table+0x2ad/0x460
[    0.326655]  acpi_early_init+0x111/0x460
[    0.326657]  start_kernel+0x271/0x3c0
[    0.326659]  x86_64_start_reservations+0x18/0x30
[    0.326660]  x86_64_start_kernel+0xe2/0xf0
[    0.326662]  common_startup_64+0x13e/0x141
[    0.326663]  </TASK>

CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy)
[    0.329167] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.329167] Call Trace:
[    0.329167]  <TASK>
[    0.329167]  dump_stack_lvl+0x53/0x70
[    0.329167]  __pgalloc_tag_add+0x407/0x700
[    0.329167]  get_page_from_freelist+0xa54/0x1310
[    0.329167]  __alloc_frozen_pages_noprof+0x206/0x4c0
[    0.329167]  __alloc_pages_noprof+0x10/0x1b0
[    0.329167]  dup_task_struct+0x163/0x8c0
[    0.329167]  copy_process+0x390/0x4a70
[    0.329167]  kernel_clone+0xe1/0x830
[    0.329167]  kernel_thread+0xcb/0x110
[    0.329167]  kthreadd+0x8a2/0xc60
[    0.329167]  ret_from_fork+0x551/0x720
[    0.329167]  ret_from_fork_asm+0x1a/0x30
[    0.329167]  </TASK>

CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy)
[    0.329167] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.329167] Call Trace:
[    0.329167]  <TASK>
[    0.329167]  dump_stack_lvl+0x53/0x70
[    0.329167]  __pgalloc_tag_add+0x407/0x700
[    0.329167]  get_page_from_freelist+0xa54/0x1310
[    0.329167]  __alloc_frozen_pages_noprof+0x206/0x4c0
[    0.329167]  __alloc_pages_noprof+0x10/0x1b0
[    0.329167]  dup_task_struct+0x163/0x8c0
[    0.329167]  copy_process+0x390/0x4a70
[    0.329167]  kernel_clone+0xe1/0x830
[    0.329167]  kernel_thread+0xcb/0x110
[    0.329167]  kthreadd+0x8a2/0xc60
[    0.329167]  ret_from_fork+0x551/0x720
[    0.329167]  ret_from_fork_asm+0x1a/0x30
[    0.329167]  </TASK>

CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy)
[    0.434265] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.434266] Call Trace:
[    0.434266]  <TASK>
[    0.434266]  dump_stack_lvl+0x53/0x70
[    0.434268]  __pgalloc_tag_add+0x407/0x700
[    0.434272]  get_page_from_freelist+0xa54/0x1310
[    0.434274]  __alloc_frozen_pages_noprof+0x206/0x4c0
[    0.434279]  alloc_pages_exact_nid_noprof+0x10f/0x380
[    0.434283]  init_section_page_ext+0x167/0x370
[    0.434284]  page_ext_init+0x451/0x620
[    0.434287]  page_alloc_init_late+0x553/0x630
[    0.434290]  kernel_init_freeable+0x7be/0xd30
[    0.434294]  kernel_init+0x1f/0x1f0
[    0.434295]  ret_from_fork+0x551/0x720
[    0.434301]  ret_from_fork_asm+0x1a/0x30
[    0.434303]  </TASK>

CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00001-g6392c3a6119e-dirty #31 PREEMPT(lazy)
[    0.346712] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.346713] Call Trace:
[    0.346713]  <TASK>
[    0.346714]  dump_stack_lvl+0x53/0x70
[    0.346715]  __pgalloc_tag_add+0x407/0x700
[    0.346720]  get_page_from_freelist+0xa54/0x1310
[    0.346723]  __alloc_frozen_pages_noprof+0x206/0x4c0
[    0.346729]  __alloc_pages_noprof+0x10/0x1b0
[    0.346731]  alloc_cpu_data+0x96/0x210
[    0.346732]  rb_allocate_cpu_buffer+0xb93/0x1500
[    0.346739]  trace_rb_cpu_prepare+0x21a/0x4f0
[    0.346753]  cpuhp_invoke_callback+0x6db/0x14b0
[    0.346755]  __cpuhp_invoke_callback_range+0xde/0x1d0
[    0.346759]  _cpu_up+0x395/0x880
[    0.346761]  cpu_up+0x1bb/0x210
[    0.346762]  cpuhp_bringup_mask+0xd2/0x150
[    0.346763]  bringup_nonboot_cpus+0x12b/0x170
[    0.346764]  smp_init+0x2f/0x100
[    0.346766]  kernel_init_freeable+0x7a5/0xd30
[    0.346769]  kernel_init+0x1f/0x1f0
[    0.346771]  ret_from_fork+0x551/0x720
[    0.346776]  ret_from_fork_asm+0x1a/0x30
[    0.346778]  </TASK>

and so on...


In fact, I previously conducted extensive and prolonged stress testing on
memory profiling. After we addressed several WARN cases, one remaining
scenario is the warning triggered during early slab cache reclaim, which is
precisely the situation we are encountering now (although I cannot guarantee
that our stress testing covered every edge case). This warning did appear
during the stress testing; however, the current environment triggers KASAN
slab cache reclaim earlier than anticipated.


Although memory allocated before page_ext initialization is relatively
unlikely to be freed later (at least we have not encountered such a case so
far), I am not certain that handling only slab-backed pages leaves no edge
cases overlooked.


Thanks
Hao

Thanks,
Suren.

However, I'm not entirely certain whether SPARSEMEM can guarantee the
same behavior.


I would appreciate your valuable feedback and any better suggestions you
might have.
Thanks for pursuing this! I'll help in any way I can.
Suren.
Thank you so much for your patient guidance and assistance.

I truly appreciate your willingness to share your knowledge and insights.

Thanks,
Hao

Thanks

Hao

Thanks,
Suren.

Thanks

Hao

Thanks.


If the slab cache has no free objects, it falls back
to the buddy allocator to allocate memory. However, at this point page_ext
is not yet fully initialized, so these newly allocated pages have no
codetag set. These pages may later be reclaimed by KASAN, which triggers
the warning when they are freed because their codetag ref is
still empty.
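To make the failure mode concrete, here is a small userspace model (a sketch, not kernel code). Assumptions: the types are simplified stand-ins, and free_would_warn() is a hypothetical helper that roughly mirrors the CONFIG_MEM_ALLOC_PROFILING_DEBUG check on the free path: a ref whose tag pointer is still NULL warns, while one marked with the non-NULL CODETAG_EMPTY sentinel via set_codetag_empty() does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types, not the real definitions. */
struct codetag {
	const char *function;
};

union codetag_ref {
	struct codetag *ct;
};

/* Non-NULL sentinel for "deliberately untagged", modelled on CODETAG_EMPTY. */
#define CODETAG_EMPTY ((void *)1)

static void set_codetag_empty(union codetag_ref *ref)
{
	if (ref)
		ref->ct = CODETAG_EMPTY;
}

/*
 * Hypothetical helper: true if freeing a page with this ref would hit the
 * debug warning, i.e. the ref was neither set by an allocation hook nor
 * explicitly marked empty.
 */
static bool free_would_warn(const union codetag_ref *ref)
{
	return ref != NULL && ref->ct == NULL;
}
```

A page allocated before page_ext exists ends up with a NULL ref, so freeing it warns; marking it empty once page_ext is available is what silences the warning.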

Use a global array to track pages allocated before page_ext is fully
initialized, similar to how kmemleak tracks early allocations.
When page_ext initialization completes, set their codetag
to empty to avoid warnings when they are freed later.
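The tracking approach can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch's elided code: EARLY_ALLOC_PFN_MAX, early_pfns and early_pfn_count mirror names from the patch, but record_early_pfn(), clear_early_pfns(), the early_pfn_dropped overflow counter and the tag_cleared[] array (standing in for calling set_codetag_empty() on each page's ref) are hypothetical. Locking is omitted here; the patch guards the array with early_pfn_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EARLY_ALLOC_PFN_MAX 256 /* capacity used by the patch */
#define SIM_NR_PFNS 1024        /* hypothetical size of the simulated memory map */

static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
static unsigned int early_pfn_count;
static unsigned long early_pfn_dropped; /* hypothetical overflow counter */

/* Stands in for "this page's codetag ref was set to CODETAG_EMPTY". */
static bool tag_cleared[SIM_NR_PFNS];

/* Called from the allocation hook while page_ext is not yet available. */
static void record_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_ALLOC_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
	else
		early_pfn_dropped++; /* array full: this page stays untracked */
}

/* Called once page_ext is initialized: clear every recorded page's ref. */
static unsigned int clear_early_pfns(void)
{
	unsigned int i, n = early_pfn_count;

	for (i = 0; i < n; i++) {
		assert(early_pfns[i] < SIM_NR_PFNS);
		tag_cleared[early_pfns[i]] = true;
	}
	early_pfn_count = 0;
	return n;
}
```

What to do once the array fills is a design choice the quoted hunks do not show; counting drops at least makes the loss visible rather than silent.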

...

--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -74,6 +74,9 @@ static inline void set_codetag_empty(union codetag_ref *ref)

#ifdef CONFIG_MEM_ALLOC_PROFILING

+bool mem_profiling_is_available(void);
+void alloc_tag_add_early_pfn(unsigned long pfn);
+
#define ALLOC_TAG_SECTION_NAME "alloc_tags"

struct codetag_bytes {
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 58991ab09d84..a5bf4e72c154 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -6,6 +6,7 @@
#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
#include <linux/proc_fs.h>
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
@@ -26,6 +27,82 @@ static bool mem_profiling_support;

static struct codetag_type *alloc_tag_cttype;

+/*
+ * State of the alloc_tag
+ *
+ * This is used to describe the states of the alloc_tag during bootup.
+ *
+ * When we need to allocate page_ext to store codetag, we face an
+ * initialization timing problem:
+ *
+ * Due to initialization order, pages may be allocated via the buddy system
+ * before page_ext is fully allocated and initialized. Although these
+ * pages call the allocation hooks, the codetag will not be set because
+ * page_ext is not yet available.
+ *
+ * When these pages are later freed to the buddy system, this triggers
+ * warnings because their codetag is actually empty if
+ * CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
+ *
+ * Additionally, in this situation, we cannot record detailed allocation
+ * information for these pages.
+ */
+enum mem_profiling_state {
+ DOWN, /* No mem_profiling functionality yet */
+ UP /* Everything is working */
+};
+
+static enum mem_profiling_state mem_profiling_state = DOWN;
+
+bool mem_profiling_is_available(void)
+{
+ return mem_profiling_state == UP;
+}
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+#define EARLY_ALLOC_PFN_MAX 256
+
+static unsigned long early_pfns[EARLY_ALLOC_PFN_MAX];
It's unfortunate that this isn't __initdata.

+static unsigned int early_pfn_count;
+static DEFINE_SPINLOCK(early_pfn_lock);
+

...

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1293,6 +1293,13 @@ void __pgalloc_tag_add(struct page *page, struct task_struct *task,
alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
update_page_tag_ref(handle, &ref);
put_page_tag_ref(handle);
+ } else {
This branch can be marked as "unlikely".

+ /*
+ * page_ext is not available yet, record the pfn so we can
+ * clear the tag ref later when page_ext is initialized.
+ */
+ if (!mem_profiling_is_available())
+ alloc_tag_add_early_pfn(page_to_pfn(page));
}
}
All because of this, I believe. Is this fixable?

If we take that `else', we know we're running in __init code, yes? I
don't see how `__init pgalloc_tag_add_early()' could be made to work.
hrm. Something clever, please.
We can have a pointer to a function that is initialized to point to
alloc_tag_add_early_pfn, which is defined as __init and uses
early_pfns which now can be defined as __initdata. After
clear_early_alloc_pfn_tag_refs() is done we reset that pointer to
NULL. __pgalloc_tag_add() instead of calling alloc_tag_add_early_pfn()
directly checks that pointer and if it's not NULL then calls the
function that it points to. This way __pgalloc_tag_add() which is not
an __init function will be invoking alloc_tag_add_early_pfn() __init
function only until we are done with initialization. I haven't tried
this but I think that should work. This also eliminates the need for
mem_profiling_state variable since we can use this function pointer
instead.
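A rough userspace model of the function-pointer idea, with all names hypothetical. In the kernel, early_pfns, early_pfn_count and record_early_pfn() would carry __initdata/__init; here they are ordinary definitions, with comments marking where those annotations would go. likely() is defined locally via GCC/Clang's __builtin_expect, which also reflects the earlier suggestion to mark the early branch as unlikely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define likely(x) __builtin_expect(!!(x), 1)

#define EARLY_PFN_MAX 256

static unsigned long early_pfns[EARLY_PFN_MAX]; /* __initdata in the kernel */
static unsigned int early_pfn_count;            /* __initdata in the kernel */

/* __init in the kernel: discarded once boot finishes. */
static void record_early_pfn(unsigned long pfn)
{
	if (early_pfn_count < EARLY_PFN_MAX)
		early_pfns[early_pfn_count++] = pfn;
}

/*
 * Points at the init-only recorder while early allocations must be
 * tracked, NULL afterwards; this replaces a separate "profiling is up" flag.
 */
static void (*early_pfn_hook)(unsigned long pfn) = record_early_pfn;

/* Hot allocation path (not __init): never calls the recorder directly. */
static void pgalloc_tag_add_model(unsigned long pfn, bool have_page_ext)
{
	if (likely(have_page_ext))
		return; /* normal case: the tag is stored via page_ext */

	if (early_pfn_hook)
		early_pfn_hook(pfn);
}

/* __init in the kernel: runs once page_ext is initialized. */
static unsigned int clear_early_pfn_refs_model(void)
{
	unsigned int n = early_pfn_count;

	/* The kernel would call set_codetag_empty() on each recorded pfn here. */
	early_pfn_count = 0;
	early_pfn_hook = NULL; /* must happen before init memory is freed */
	return n;
}
```

Two caveats, both assumptions worth checking against the real code: the pointer lives in regular data but refers to init text, so modpost may report a section mismatch unless it is annotated (for example with __refdata); and since other CPUs are already allocating when the hook is cleared, READ_ONCE()/WRITE_ONCE() on the pointer would probably be appropriate.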