Re: [PATCH v3 10/12] x86/mm: Move flush_tlb_info back to the stack

From: Sebastian Andrzej Siewior

Date: Thu Mar 19 2026 - 04:51:21 EST


On 2026-03-19 00:28:19 [+0200], Nadav Amit wrote:
> >> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> >> index 5a3cdc439e38d..4a7f40c7f939a 100644
> >> --- a/arch/x86/include/asm/tlbflush.h
> >> +++ b/arch/x86/include/asm/tlbflush.h
> >> @@ -227,7 +227,7 @@ struct flush_tlb_info {
> >> u8 stride_shift;
> >> u8 freed_tables;
> >> u8 trim_cpumask;
> >> -};
> >> +} __aligned(SMP_CACHE_BYTES);
> >>
> >
> > This would work, but you are likely to encounter the same problem PeterZ hit
> > when I did something similar: in some configurations SMP_CACHE_BYTES is very
> > large.

So if capping the alignment at 64 bytes is an option and does not hurt
performance to the point where someone would complain, why not. But you
did it initially so…

> Further thinking about it and looking at the rest of the series: wouldn’t it be
> simpler to put flush_tlb_info and smp_call_function_many_cond()’s
> cpumask on thread_struct? It would allow to support CONFIG_CPUMASK_OFFSTACK=y
> case by preallocating cpumask on thread creation.
>
> I’m not sure whether the memory overhead is prohibitive.

My Debian config has CONFIG_NR_CPUS=8192, which would add 1KiB per
thread if we embed a plain cpumask_t. An allocation sized by
cpumask_size() would add just 8 bytes (one pointer) to the struct, which
should be fine. We could even stash the mask in the pointer itself for
NR_CPUS <= 64 on 64bit.
On RT it would be desirable to have the memory preallocated rather than
falling back to waiting with preemption disabled if the allocation
fails.

struct flush_tlb_info is around 40 bytes plus alignment. Maybe we could
try the stack first and see if that gets us acceptable performance.

Sebastian