Re: [PATCH-tip v3] debugobjects: Don't call fill_pool() in early boot non-task context
From: Waiman Long
Date: Wed Jun 03 2026 - 11:08:03 EST
On 6/3/26 3:56 AM, Sebastian Andrzej Siewior wrote:
On 2026-05-20 16:15:09 [-0400], Waiman Long wrote:
Yes, the debug_objects_is_pi_blocked_on() check should cover the system_state check as well.
What about:
When booting a debug PREEMPT_RT kernel on an arm64 system with grace
processor, the following lockdep warning was reported during early boot.
================================
WARNING: inconsistent lock state
7.1.0-rc4-test+ #1 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
:
Call trace:
:
rt_spin_lock+0xa0/0x400
get_from_partial_node+0x74/0xa0
___slab_alloc+0x94/0x4f8
kmem_cache_alloc_noprof+0x2d4/0x598
kmem_alloc_batch+0x54/0x170
fill_pool+0x12c/0x438
debug_objects_fill_pool+0x58/0x60
debug_object_activate+0xfc/0x3d0
add_timer_on+0x250/0x3a0
add_interrupt_randomness+0x2d4/0x340
handle_percpu_devid_irq+0x2e0/0x4e0
handle_irq_desc+0xc0/0x120
generic_handle_domain_irq+0x20/0x40
__gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
gic_handle_irq+0x7c/0xe0
call_on_irq_stack+0x30/0x48
do_interrupt_handler+0x134/0x158
el1_interrupt+0x48/0xb0
:
During early boot, interrupts are getting enabled before the scheduler
is enabled. In this window (before SYSTEM_SCHEDULING is set) interrupts
can fire and attempt to fill the pool from within the hardirq. This can
lead to a deadlock if the interrupt occurs while in the memory allocator.
Reorder the exception rule and forbid this scenario by excluding
allocations from hardirq.
…
Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")…
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
/*
* On RT enabled kernels the pool refill must happen in preemptible
- * context and not enqueued on an rt_mutex -- for !RT kernels we rely
- * on the fact that spinlock_t and raw_spinlock_t are basically the
- * same type and this lock-type inversion works just fine.
+ * context and not enqueued on an rt_mutex or in task context during
+ * early boot before scheduling starts.
+ *
+ * For !RT kernels we rely on the fact that spinlock_t and
+ * raw_spinlock_t are basically the same type and this lock-type
+ * inversion works just fine.
*/
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || system_state < SYSTEM_SCHEDULING ||
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) ||
+ (system_state < SYSTEM_SCHEDULING && in_task()) ||
(preemptible() && !debug_objects_is_pi_blocked_on())) {
/*
* Annotate away the spinlock_t inside raw_spinlock_t warning
I updated the comment to explain in more detail why this and that is
done.
I re-ordered the whole thing, starting with the pi-blocked-on part since
this is always valid. It shouldn't happen during early boot, but I think
it is easier to read that way. Then we restrict it to the preemptible
case, which can be overruled by the SYSTEM_SCHEDULING exception as long
as it is not a hardirq. It looks easier to parse and hopefully brings an
end to this.
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index b18a682fe3da2..2adfe2a79a086 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -736,12 +736,17 @@ static void debug_objects_fill_pool(void)
/*
* On RT enabled kernels the pool refill must happen in preemptible
- * context and not enqueued on an rt_mutex -- for !RT kernels we rely
- * on the fact that spinlock_t and raw_spinlock_t are basically the
- * same type and this lock-type inversion works just fine.
+ * context and not while blocking on a lock which can trigger recursion
+ * during PI. During system boot (before scheduling) preemption is
+ * disabled and the pool gets exhausted. Without scheduling a deadlock
+ * is not possible if allocations from interrupt context are excluded.
+ * For !RT kernels we rely on the fact that spinlock_t and
+ * raw_spinlock_t are basically the same type and this lock-type
+ * inversion works just fine.
*/
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || system_state < SYSTEM_SCHEDULING ||
- (preemptible() && !debug_objects_is_pi_blocked_on())) {
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) ||
+ !debug_objects_is_pi_blocked_on() &&
+ (preemptible() || (system_state < SYSTEM_SCHEDULING && !in_hardirq()))) {
/*
* Annotate away the spinlock_t inside raw_spinlock_t warning
* by temporarily raising the wait-type to LD_WAIT_CONFIG, matching
I guess softirq won't be active during early boot. If in_nmi() is true, we are screwed on non-RT kernels as well. So checking only in_hardirq() should be fine. I will adopt your suggestion and send a new version.
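To sanity-check the reordered condition, here is a minimal user-space sketch that models the inputs of the check as plain booleans. The struct and function names (struct ctx, may_fill_old(), may_fill_new()) are hypothetical and exist only for illustration; the fields stand in for the kernel helpers named in the comments, and this is not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs to the pool refill check. */
struct ctx {
	bool rt;          /* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool early_boot;  /* system_state < SYSTEM_SCHEDULING */
	bool preemptible; /* preemptible() */
	bool pi_blocked;  /* debug_objects_is_pi_blocked_on() */
	bool hardirq;     /* in_hardirq() */
};

/* Condition as it was before this patch. */
static bool may_fill_old(struct ctx c)
{
	return !c.rt || c.early_boot || (c.preemptible && !c.pi_blocked);
}

/* Reordered condition from the suggested diff. */
static bool may_fill_new(struct ctx c)
{
	return !c.rt ||
	       (!c.pi_blocked &&
		(c.preemptible || (c.early_boot && !c.hardirq)));
}
```

For an RT kernel in early boot with a hardirq active (rt, early_boot and hardirq set, everything else clear), may_fill_old() returns true while may_fill_new() returns false, which is the case from the lockdep report. With hardirq clear, both return true, so the task-context refill during early boot is kept.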
Thanks,
Longman
Sebastian