Re: [PATCH 5/5] cgroup: Defer kill_css_finish() in cgroup_apply_control_disable()

From: Bert Karwatzki

Date: Sun May 31 2026 - 05:25:47 EST


On Friday, 29.05.2026 at 22:08 +0100, Mark Brown wrote:
> On Fri, May 29, 2026 at 07:25:29AM -1000, Tejun Heo wrote:
> > On Wed, May 27, 2026 at 11:45:54AM +0100, Mark Brown wrote:
> > > On Mon, May 04, 2026 at 02:51:21PM -1000, Tejun Heo wrote:
>
> > > with no further output and given that this is a cgroup locking change
> > > this does seem like a plausible commit, though I didn't look into it in
> > > detail. Bisect log and the list of LTP tests we're running in our test
> > > job below. We are running multiple tests in parallel.
>
> > Unfortunately, I can't reproduce this in my environment. Any chance you can
> > try testing on x86 too and see whether it reproduces there?
>
> Not readily sadly, I'll see if I can figure something out. Our rootfs
> images are based on Debian Trixie if that's relevant?

Using Debian unstable (sid/forky) I can at least detect a timeout when running
the LTP controllers test suite:

# LTPROOT=/home/bert/ltp-install/ ./kirk --run-suite controllers
Host information
Hostname: homer
Python: 3.13.12 (main, Feb 4 2026, 15:06:39) [GCC 15.2.0]
Directory: /tmp/kirk.root/tmp092in2yb

Connecting to SUT: default

Suite: controllers
──────────────────
cgroup_core01: pass (0.024s)
cgroup_core02: pass (0.004s)
cgroup_core03: pass (0.017s)
cgroup: skip (2m 41s)
memcg_regression: skip (3.414s)
memcg_test_3: pass (0.090s)
memcg_failcnt: skip (0.019s)
memcg_force_empty: skip (0.015s)
memcg_limit_in_bytes: skip (0.017s)
memcg_stat_rss: skip (0.015s)
memcg_subgroup_charge: skip (0.015s)
memcg_max_usage_in_bytes: skip (0.014s)
memcg_move_charge_at_immigrate: skip (0.014s)
memcg_memsw_limit_in_bytes: skip (0.015s)
memcg_stat: skip (0.015s)
memcg_use_hierarchy: skip (0.015s)
memcg_usage_in_bytes: skip (0.014s)
memcg_stress: pass (30m 4s)
memcg_control: pass (6.058s)
memcontrol01: pass (0.004s)
memcontrol02: pass (0.636s)
memcontrol03: pass (15.983s)
memcontrol04: pass (0.890s)
cgroup_fj_function_debug: skip (0.013s)
cgroup_fj_function_cpuset: skip (0.044s)
cgroup_fj_function_cpu: skip (0.050s)
cgroup_fj_function_cpuacct: pass (0.052s)
cgroup_fj_function_memory: skip (0.042s)
cgroup_fj_function_freezer: pass (0.044s)
cgroup_fj_function_devices: pass (0.066s)
cgroup_fj_function_blkio: skip (0.009s)
cgroup_fj_function_net_cls: pass (0.073s)
cgroup_fj_function_perf_event: pass (0.072s)
cgroup_fj_function_net_prio: Suite 'controllers' timed out after 3600 seconds

Execution time: 1h 33m 13s

Disconnecting from SUT: default

Target information
──────────────────
Kernel: Linux 7.1.0-rc5-next-20260528-master-dirty #480 SMP PREEMPT_RT Thu May 28 19:55:12 CEST 2026
Cmdline: BOOT_IMAGE=/boot/vmlinuz-7.1.0-rc5-next-20260528-master-dirty
root=UUID=3d5cdc5d-1902-40bf-9e16-ca819372d350
ro
quiet
Machine: unknown
Arch: x86_64
RAM: 63439380 kB
Swap: 78125052 kB
Distro: debian

────────────────────────
TEST SUMMARY
────────────────────────
Suite: controllers
Runtime: 33m 13s
Runs: 347

Results:
Passed: 181
Failed: 0
Broken: 0
Skipped: 350
Warnings: 0

Session stopped

In dmesg I get messages about task tst_cgctl hanging:

[ 2212.794669] [ T346] INFO: task tst_cgctl:317896 blocked for more than 122 seconds.
[ 2212.794674] [ T346] Not tainted 7.1.0-rc5-next-20260528-master-dirty #480
[ 2212.794675] [ T346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[...]

[ 3318.721344] [ T346] INFO: task tst_cgctl:317896 blocked for more than 1228 seconds.
[ 3318.721349] [ T346] Not tainted 7.1.0-rc5-next-20260528-master-dirty #480
[ 3318.721351] [ T346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
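
In case it helps anyone comparing logs between kernels, here is a minimal
sketch (not part of LTP or kirk) that summarises hung-task reports like the
ones above. It assumes the "INFO: task <comm>:<pid> blocked for more than <N>
seconds." wording shown in this dmesg excerpt; the function name
longest_hangs and the script name used below are made up for illustration.

```python
import re
import sys
from collections import defaultdict

# Matches hung-task lines in the format seen in the dmesg excerpt, e.g.
# "... INFO: task tst_cgctl:317896 blocked for more than 122 seconds."
# The greedy ".+" lets task names that themselves contain ':' still match.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def longest_hangs(dmesg_text):
    """Return {(comm, pid): longest reported blocked time in seconds}."""
    worst = defaultdict(int)
    for line in dmesg_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            key = (match["comm"], int(match["pid"]))
            worst[key] = max(worst[key], int(match["secs"]))
    return dict(worst)


if __name__ == "__main__":
    # Reads kernel log text from stdin and prints one line per hung task.
    for (comm, pid), secs in sorted(longest_hangs(sys.stdin.read()).items()):
        print(f"{comm}:{pid} blocked for more than {secs}s")
```

Saved as, say, hung_tasks.py (hypothetical name), it can be fed with
"dmesg | python3 hung_tasks.py". Only the longest report per task is kept,
since the kernel repeats the message with a growing time for the same task.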






On 6.19.14 the results of this test run are:

# LTPROOT=/home/bert/ltp-install/ ./kirk --run-suite controllers

[...]

Target information
──────────────────
Kernel: Linux 6.19.14-stable #1238 SMP PREEMPT_RT Sat May 30 17:28:29 CEST 2026
Cmdline: BOOT_IMAGE=/boot/vmlinuz-6.19.14-stable
root=UUID=3d5cdc5d-1902-40bf-9e16-ca819372d350
ro
quiet
Machine: unknown
Arch: x86_64
RAM: 63436188 kB
Swap: 78125052 kB
Distro: debian

────────────────────────
TEST SUMMARY
────────────────────────
Suite: controllers
Runtime: 36m 12s
Runs: 347

Results:
Passed: 1742
Failed: 0
Broken: 0
Skipped: 97
Warnings: 0

Session stopped

With 6.19.14 I also get no hung tasks.

On 7.0.10 the tests also work:

root@homer:/mnt/data/linux-forest/kirk# LTPROOT=/home/bert/ltp-install/ ./kirk --run-suite controllers
Host information
Hostname: homer
Python: 3.13.12 (main, Feb 4 2026, 15:06:39) [GCC 15.2.0]
Directory: /tmp/kirk.root/tmpq32b09g7

Connecting to SUT: default

Suite: controllers
──────────────────
cgroup_core01: pass (0.016s)

[...]

pids_9_100: pass (0.107s)

Execution time: 36m 15s

Disconnecting from SUT: default

Target information
──────────────────
Kernel: Linux 7.0.10-stable #1239 SMP PREEMPT_RT Sun May 31 00:42:41 CEST 2026
Cmdline: BOOT_IMAGE=/boot/vmlinuz-7.0.10-stable
root=UUID=3d5cdc5d-1902-40bf-9e16-ca819372d350
ro
quiet
Machine: unknown
Arch: x86_64
RAM: 63435940 kB
Swap: 78125052 kB
Distro: debian

────────────────────────
TEST SUMMARY
────────────────────────
Suite: controllers
Runtime: 36m 13s
Runs: 347

Results:
Passed: 1742
Failed: 0
Broken: 0
Skipped: 97
Warnings: 0

Session stopped



I'm not sure whether this is related to the problems on arm64, but I'll try bisecting it.

Bert Karwatzki