Re: [QUESTION/REGRESSION] Unbound kthreads scheduled on nohz_full CPUs after commit 041ee6f3727a

From: Waiman Long

Date: Sun Mar 22 2026 - 17:18:47 EST


On 3/22/26 1:48 PM, sheviks wrote:
Hi Frederic, Waiman and maintainers,

A quick follow-up on my previous report. After leaving the system idle
for a longer period, I made an observation that pinpoints the issue
more precisely.

The cgroup v2 dynamic isolation does eventually work. I noticed that
the unbound kthreads are eventually migrated to the housekeeping CPU
(CPU 0), but only after they wake up from sleep and enter the running
state. This lazy migration highlights why commit 041ee6f3727a is
causing issues for setups using nohz_full= without isolcpus=:

1. At boot time, because isolcpus= is absent, the HK_TYPE_DOMAIN mask
includes the nohz_full CPUs.

2. When unbound kthreads are initially spawned or have their affinity
set, the new logic relies solely on HK_TYPE_DOMAIN. Consequently, they
are placed on the nohz_full CPUs and immediately go to sleep there.

3. They remain "trapped" on the isolated CPUs until a wake-up event
finally forces the scheduler to migrate them according to the updated
cgroup affinity.

That makes much more sense now. Yes, this is what the new behavior should be.

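
One way to observe the lazy migration described above is to compare, for each kernel thread, the CPU it last ran on with its current allowed CPU list. The following is only a rough sketch: the function names are made up for illustration, the set of isolated CPUs has to be supplied by the caller, and per-CPU kthreads that are legitimately bound to an isolated CPU will also show up in the output.

```python
import os

PF_KTHREAD = 0x00200000  # kernel-thread flag from include/linux/sched.h


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,2-3" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def parse_stat(line):
    """Return (comm, flags, last_cpu) from a /proc/<pid>/stat line."""
    # comm may contain spaces and parentheses, so split on the last ')'.
    head, _, tail = line.rpartition(")")
    comm = head.partition("(")[2]
    fields = tail.split()
    # tail starts at field 3 (state); flags is field 9, processor is field 39.
    return comm, int(fields[6]), int(fields[36])


def kthreads_last_run_on(isolated, proc="/proc"):
    """Yield (pid, comm, last_cpu, allowed) for kthreads last run on an isolated CPU."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            with open(f"{proc}/{pid}/stat") as f:
                comm, flags, last_cpu = parse_stat(f.read())
            with open(f"{proc}/{pid}/status") as f:
                allowed = next(l.split(":", 1)[1].strip()
                               for l in f if l.startswith("Cpus_allowed_list:"))
        except (OSError, StopIteration):
            continue  # task exited while we were reading it
        if flags & PF_KTHREAD and last_cpu in isolated:
            yield int(pid), comm, last_cpu, allowed
```

For example, `list(kthreads_last_run_on(parse_cpu_list("1-3")))` lists the kthreads that last ran on CPUs 1-3. A sleeping unbound kthread would be expected to keep appearing there, even though its allowed list already shows only the housekeeping CPU, until it wakes up and is migrated.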
This brings the focus back to HK_TYPE_KTHREAD vs HK_TYPE_DOMAIN. While
HK_TYPE_DOMAIN might default to all CPUs without isolcpus=,
HK_TYPE_KTHREAD correctly excludes the nohz_full CPUs from the very
beginning.

Is this "spawn on nohz_full and wait for wake-up to migrate" behavior
intended? To prevent these sleeping kthreads from polluting isolated
CPUs before cgroups can intervene, should the initial affinity check
still consider HK_TYPE_KTHREAD alongside or instead of HK_TYPE_DOMAIN?

The plan is to make nohz_full dynamically changeable at run time as well
in the future. We are not there yet. For now, HK_TYPE_KTHREAD is
equivalent to HK_TYPE_DOMAIN.

A CPU cannot be considered fully isolated if it is in either
HK_TYPE_DOMAIN or HK_TYPE_KERNEL_NOISE. Currently, the nohz_full kernel
parameter can be used to put a set of partially isolated CPUs in the
nohz_full reservoir. To fully isolate them, some or all of them will
need to be put in an isolated cpuset partition. BTW, HK_TYPE_MANAGED_IRQ
will be made dynamic too.
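
To illustrate the rule above, here is a toy model in Python that treats the two housekeeping masks as plain sets. It is not kernel code: the function name and the three-way split are made up for illustration, and calling a CPU "housekeeping" only when it is in both masks is an assumption of this sketch.

```python
def classify(all_cpus, hk_domain, hk_kernel_noise):
    """Split CPUs into (housekeeping, partially isolated, fully isolated)."""
    # Fully isolated: present in neither housekeeping mask.
    fully = all_cpus - (hk_domain | hk_kernel_noise)
    # Assumption of this sketch: housekeeping means present in both masks.
    housekeeping = hk_domain & hk_kernel_noise
    # Everything else is in exactly one mask, i.e. only partially isolated.
    partial = all_cpus - fully - housekeeping
    return housekeeping, partial, fully


cpus = {0, 1, 2, 3}

# Boot with nohz_full=1-3 and no isolcpus=: HK_TYPE_DOMAIN still covers
# every CPU, so CPUs 1-3 are only partially isolated.
boot = classify(cpus, hk_domain={0, 1, 2, 3}, hk_kernel_noise={0})

# After CPUs 2-3 are put in an isolated cpuset partition they also leave
# HK_TYPE_DOMAIN and become fully isolated; CPU 1 stays partial.
later = classify(cpus, hk_domain={0, 1}, hk_kernel_noise={0})
```

In this model `boot` has no fully isolated CPUs at all, while `later` has CPUs 2 and 3 fully isolated.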

That is my current view.

Cheers,
Longman