On Tue, Jun 10, 2025 at 10:27 PM Hao Ge <hao.ge@xxxxxxxxx> wrote:
Hi Hao,
On 2025/6/10 00:39, Suren Baghdasaryan wrote:
On Sun, Jun 8, 2025 at 11:08 PM Hao Ge <hao.ge@xxxxxxxxx> wrote:
On 2025/5/29 15:35, Hao Ge wrote:
From: Hao Ge <gehao@xxxxxxxxxx>
Recently discovered this entry while checking kallsyms on ARM64:
ffff800083e509c0 D _shared_alloc_tag
If ARCH_NEEDS_WEAK_PER_CPU is not defined, there's no need to statically
define the percpu variable alloc_tag_counters.
Therefore, add the relevant macro guards at the appropriate location.
Fixes: 22d407b164ff ("lib: add allocation tagging support for memory allocation profiling")
Signed-off-by: Hao Ge <gehao@xxxxxxxxxx>
---
lib/alloc_tag.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index c7f602fa7b23..d1dab80b70ad 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -24,8 +24,10 @@ static bool mem_profiling_support;
static struct codetag_type *alloc_tag_cttype;
+#ifdef ARCH_NEEDS_WEAK_PER_CPU
DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
EXPORT_SYMBOL(_shared_alloc_tag);
+#endif /* ARCH_NEEDS_WEAK_PER_CPU */
DEFINE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
mem_alloc_profiling_key);
I'm sorry to bother you. As mentioned in my commit message,
in fact, on the ARM64 architecture, the _shared_alloc_tag percpu
variable is not needed.
In my understanding, it will create a copy for each CPU.
The alloc_tag_counters variable will occupy 16 bytes,
and as the number of CPUs increases, more and more memory will be wasted
in this segment.
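For scale, here is a rough userspace sketch of that arithmetic. The struct
below is only a stand-in and assumes alloc_tag_counters holds two 64-bit
counters (which matches the 16 bytes mentioned above);
shared_tag_footprint() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel struct (assumed: two 64-bit counters). */
struct alloc_tag_counters {
	uint64_t bytes;
	uint64_t calls;
};

/* Matches the 16 bytes quoted in the mail. */
static_assert(sizeof(struct alloc_tag_counters) == 16,
	      "expected two 64-bit counters");

/* Hypothetical helper: memory taken by one per-CPU instance across nr_cpus CPUs. */
static size_t shared_tag_footprint(unsigned int nr_cpus)
{
	return sizeof(struct alloc_tag_counters) * nr_cpus;
}
```

With 256 CPUs, shared_tag_footprint(256) comes to 4096 bytes, so the cost
is small per CPU but does grow linearly with the CPU count.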
I realized that this modification was a mistake. It resulted in a build
error, and the link is as follows:
https://lore.kernel.org/all/202506080448.KWN8arrX-lkp@xxxxxxxxx/
After I studied the comments of DECLARE_PER_CPU_SECTION, I roughly
understood why this is the case.
But so far, I haven't come up with a good way to solve this problem. Do
you have any suggestions?
The problem here is that ARCH_NEEDS_WEAK_PER_CPU is not a Kconfig
option. It gets defined on only 2 architectures, and only when building
modules: here https://elixir.bootlin.com/linux/v6.15.1/source/arch/alpha/include/asm/percpu.h#L14
and here https://elixir.bootlin.com/linux/v6.15.1/source/arch/s390/include/asm/percpu.h#L21.
A nicer way to deal with that is to make it a Kconfig option which is
enabled only for alpha and s390 and then do something like this:
#if defined(MODULE) && defined(ARCH_NEEDS_WEAK_PER_CPU)
#define MODULE_NEEDS_WEAK_PER_CPU
#endif
and replace all the usages of ARCH_NEEDS_WEAK_PER_CPU with
MODULE_NEEDS_WEAK_PER_CPU.
Did I explain the idea clearly?
Thanks,
Suren.
Thanks for your guidance.
I understand this train of thought.
I've been thinking about a problem: I only added the ARCH_NEEDS_WEAK_PER_CPU
macro to isolate the definition of _shared_alloc_tag.
Since s390 defines this macro, why did this build error occur?
The problem is that ARCH_NEEDS_WEAK_PER_CPU is not a Kconfig option,
it's just a definition, for s390 it's here:
https://elixir.bootlin.com/linux/v6.15.1/source/arch/s390/include/asm/percpu.h#L21
So, even for s390 if you are building core kernel code (not a module),
ARCH_NEEDS_WEAK_PER_CPU will be undefined, however if you are building
a module on s390 then it is defined. So, your change effectively
results in _shared_alloc_tag being compiled out of the core kernel
while modules still reference it. Therefore, at link time, modules
can't resolve that symbol against the core kernel. Hope this
explains the issue.
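To make the mismatch concrete, here is a small standalone sketch of the two
views. It is not kernel code: the #ifdef MODULE block only imitates what the
s390 and alpha percpu.h headers do, and SHARED_TAG_DEFINED and
shared_tag_defined_here() are hypothetical names used for illustration.
Built as is, it shows the core-kernel view; adding -DMODULE to the compiler
flags shows the module view.

```c
#include <assert.h>
#include <stdbool.h>

/* Imitates the arch percpu.h headers: the macro exists only in module builds. */
#ifdef MODULE
#define ARCH_NEEDS_WEAK_PER_CPU
#endif

/* Imitates the guard the patch put around the definition in lib/alloc_tag.c. */
#ifdef ARCH_NEEDS_WEAK_PER_CPU
#define SHARED_TAG_DEFINED 1
#else
#define SHARED_TAG_DEFINED 0
#endif

/* Compile-time restatement of the two views described above. */
#ifndef MODULE
static_assert(SHARED_TAG_DEFINED == 0,
	      "core kernel build: the definition is compiled out");
#else
static_assert(SHARED_TAG_DEFINED == 1,
	      "module build: the symbol is expected to exist");
#endif

/* Hypothetical helper: does this translation unit see the definition? */
static bool shared_tag_defined_here(void)
{
	return SHARED_TAG_DEFINED;
}
```

lib/alloc_tag.c is always built as core kernel code, so it always gets the
first view, while a module gets the second one and expects a symbol that
was never emitted.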
The way I would fix this is by making ARCH_NEEDS_WEAK_PER_CPU a
Kconfig option and enabling it for s390 and alpha. I would replace the
old definitions at
https://elixir.bootlin.com/linux/v6.15.1/source/arch/s390/include/asm/percpu.h#L21
and https://elixir.bootlin.com/linux/v6.15.1/source/arch/alpha/include/asm/percpu.h#L14
with:
#if defined(MODULE) && defined(ARCH_NEEDS_WEAK_PER_CPU)
#define MODULE_NEEDS_WEAK_PER_CPU
#endif
Then use MODULE_NEEDS_WEAK_PER_CPU instead of ARCH_NEEDS_WEAK_PER_CPU
in all the current places in the kernel code. Lastly, to compile out
_shared_alloc_tag your current patch should work fine because on s390
and alpha ARCH_NEEDS_WEAK_PER_CPU will be defined after all these
changes.
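Put together, the scheme might look roughly like the sketch below. This is
a userspace illustration, not the actual patch: the #define at the top
stands in for what Kconfig would provide on s390 and alpha (in a real tree
the generated symbol would presumably carry a CONFIG_ prefix), a plain
global replaces DEFINE_PER_CPU(), and shared_alloc_tag_standin,
SHARED_TAG_DEFINED and the two helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for the proposed Kconfig option (s390/alpha only). */
#define ARCH_NEEDS_WEAK_PER_CPU 1

/* Derived macro: set only when building a module on such an architecture. */
#if defined(MODULE) && defined(ARCH_NEEDS_WEAK_PER_CPU)
#define MODULE_NEEDS_WEAK_PER_CPU
#endif

/* Userspace stand-in for the kernel struct (assumed: two 64-bit counters). */
struct alloc_tag_counters {
	uint64_t bytes;
	uint64_t calls;
};
static_assert(sizeof(struct alloc_tag_counters) == 16,
	      "expected two 64-bit counters");

/* The guard now depends on the architecture only, not on MODULE. */
#ifdef ARCH_NEEDS_WEAK_PER_CPU
/* Plain global used here in place of DEFINE_PER_CPU(). */
struct alloc_tag_counters shared_alloc_tag_standin;
#define SHARED_TAG_DEFINED 1
#else
#define SHARED_TAG_DEFINED 0
#endif

/* Hypothetical helper: is the shared tag defined in this build? */
static bool shared_tag_defined(void)
{
	return SHARED_TAG_DEFINED;
}

/* Hypothetical helper: are weak per-CPU definitions in effect here? */
static bool weak_percpu_in_effect(void)
{
#ifdef MODULE_NEEDS_WEAK_PER_CPU
	return true;
#else
	return false;
#endif
}
```

Built without -DMODULE (the core-kernel view), the shared tag is defined and
the weak per-CPU handling stays off; with -DMODULE only the second answer
changes, so core kernel and modules agree on whether the symbol exists.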
Does that make sense?
Could you please help explain it again?
Thanks
Best Regards
Hao