Re: [QUESTION/REGRESSION] Unbound kthreads scheduled on nohz_full CPUs after commit 041ee6f3727a

From: sheviks

Date: Sun Mar 22 2026 - 13:48:29 EST


Hi Frederic, Waiman and maintainers,

A quick follow-up on my previous report. After leaving the system idle
for a longer period, I made an observation that pinpoints the issue
more precisely.

The cgroup v2 dynamic isolation does take effect, but only lazily: the
unbound kthreads are migrated to the housekeeping CPU (CPU 0) only once
they wake up from sleep and enter the running state. This lazy
migration explains why commit 041ee6f3727a is causing issues for
setups that use nohz_full= without isolcpus=:

1. At boot time, because isolcpus= is absent, the HK_TYPE_DOMAIN mask
includes the nohz_full CPUs.

2. When unbound kthreads are initially spawned or have their affinity
set, the new logic relies solely on HK_TYPE_DOMAIN. Consequently, they
are placed on the nohz_full CPUs and immediately go to sleep there.

3. They remain "trapped" on the isolated CPUs until a wake-up event
finally forces the scheduler to migrate them according to the updated
cgroup affinity.

This brings the focus back to HK_TYPE_KTHREAD vs HK_TYPE_DOMAIN. While
HK_TYPE_DOMAIN defaults to all CPUs when isolcpus= is absent,
HK_TYPE_KTHREAD correctly excludes the nohz_full CPUs from the very
beginning.

Is this "spawn on nohz_full and wait for wake-up to migrate" behavior
intended? To prevent these sleeping kthreads from polluting isolated
CPUs before cgroups can intervene, should the initial affinity check
still consider HK_TYPE_KTHREAD alongside or instead of HK_TYPE_DOMAIN?
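To make the suggestion concrete, here is a userspace sketch (not kernel
code) of the mask arithmetic I have in mind: intersecting the domain
cpulist with the kthread-housekeeping cpulist, so the nohz_full CPUs
drop out even without isolcpus=. The cpulist values are illustrative
stand-ins for the two housekeeping masks:

```shell
#!/bin/sh
# Expand a cpulist ("0-7", "0,4") to one CPU id per line.
expand() {
    echo "$1" | tr ',' '\n' | while read -r p; do
        case "$p" in *-*) seq "${p%-*}" "${p#*-}" ;; *) echo "$p" ;; esac
    done
}

# Intersection of two cpulists, mimicking a cpumask AND of the two masks.
intersect_cpulists() {
    first=$(expand "$1")
    for cpu in $(expand "$2"); do
        echo "$first" | grep -qx "$cpu" && echo "$cpu"
    done | tr '\n' ' '
}

# Without isolcpus=, the domain mask covers every CPU, while the kthread
# housekeeping mask already excludes nohz_full=1-7 at boot (illustrative):
domain="0-7"      # stands in for the HK_TYPE_DOMAIN housekeeping mask
kthread_hk="0"    # stands in for the HK_TYPE_KTHREAD housekeeping mask
intersect_cpulists "$domain" "$kthread_hk"
```

With this intersection the initial placement would land on CPU 0 from
the start, closing the window before cgroups can intervene.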

Thanks again for your time.

Best regards,
Sheviks


Waiman Long <longman@xxxxxxxxxx> wrote on Mon, Mar 23, 2026 at 12:04 AM:
>
> On 3/21/26 12:44 AM, sheviks wrote:
> > Hi Frederic and maintainers,
> >
> > I’m reaching out to discuss a change in kthread affinity behavior that
> > seems to be a regression for users relying on dynamic CPU isolation.
> > This started appearing after commit 041ee6f3727a ("kthread: Rely on
> > HK_TYPE_DOMAIN for preferred affinity management").
> >
> > The Problem:
> > In my setup, I use nohz_full but intentionally avoid the deprecated
> > isolcpus= boot parameter. Instead, I use cgroup v2
> > (cpuset.cpus.partition=isolated) to dynamically isolate CPUs after the
> > system has booted.
> >
> > The commit 041ee6f3727a changed kthreads to rely on HK_TYPE_DOMAIN.
> > However, since isolcpus= is not used, HK_TYPE_DOMAIN defaults to all
> > CPUs at boot time. Even after I later configure cgroups to isolate
> > CPUs 1-7, unbound kthreads (including kthreadd) remain on those
> > nohz_full CPUs.
>
> Frederic's patch series is supposed to make the HK_TYPE_DOMAIN cpumask
> dynamically exclude cpuset-isolated CPUs. Those unbound kthreads should
> then have these CPUs removed from their cpumasks. If that is not
> happening, it is a problem we need to look at.
>
> Cheers,
> Longman
>
> >
> > It seems the assumption that "nohz_full implies domain isolation" only
> > holds true if isolation is statically defined at boot via isolcpus=.
> > For dynamic isolation via cgroups, HK_TYPE_KTHREAD and HK_TYPE_DOMAIN
> > no longer cover the same set of CPUs.
> >
> > System Log:
> > Here is the state of my system after setting up the cgroup isolation:
> >
> > $ uname -r
> > 7.0.0-rc4-1-rt
> >
> > # 1. Boot parameters (No isolcpus)
> > $ grep -oe "nohz_full=[^ ]*" -e "rcu_nocbs=[^ ]*" /proc/cmdline
> > nohz_full=1,2,3,4,5,6,7
> > rcu_nocbs=1,2,3,4,5,6,7
> >
> > # 2. Cgroup v2 Isolation is active
> > $ cat /sys/fs/cgroup/isolated1.slice/cpuset.cpus.exclusive
> > 1-7
> > $ cat /sys/fs/cgroup/isolated1.slice/cpuset.cpus.partition
> > isolated
> > $ cat /sys/fs/cgroup/cpuset.cpus.effective
> > 0
> > $ cat /sys/fs/cgroup/cpuset.cpus.isolated
> > 1-7
> >
> > # 3. Unbound kthreads are still "trapped" on isolated/nohz_full CPUs
> > $ ps -eLo cpuid,comm | grep -e COMM -e "^ *[1-7] " | \
> >     grep -ve "/[1-7]$" -e "kworker/[1-7]:" | head
> > CPUID COMMAND
> > 4 pool_workqueue_release
> > 1 pr/legacy
> > 4 rcu_exp_gp_kthread_worker
> > 1 kdevtmpfs
> > 5 oom_reaper
> > 1 ksmd
> > 7 watchdogd
> > 7 kswapd0
> > 6 scsi_eh_0
> >
> > Questions:
> > 1. Is this an intended change that mandates the use of isolcpus= for
> > kthread exclusion?
> >
> > 2. If we prefer dynamic isolation via cgroup v2, is there a
> > recommended way to "refresh" or move these unbound kthreads once the
> > housekeeping mask changes at runtime?
> >
> > 3. Or should HK_TYPE_KTHREAD still be considered separately from
> > HK_TYPE_DOMAIN to account for nohz_full users without isolcpus=?
> >
> > I would appreciate any insights or suggestions you might have.
> >
> > Best regards,
> > Sheviks
> >
>