Re: [PATCH] sched: flush plug in schedule_preempt_disabled() to prevent deadlock

From: Michael Wu

Date: Tue May 19 2026 - 06:39:28 EST


The patch you mentioned also solves my problem; I've verified and tested it on my platform.

On 5/15/2026 2:18 PM, Xiaosen wrote:
https://lore.kernel.org/lkml/20260427183848.698551-2-jstultz@xxxxxxxxxx/
The above change resolves the deadlock I reported earlier by setting the
task's state to TASK_RUNNING before the context switch.

A likely alternative fix is below:
https://lore.kernel.org/lkml/20260512025635.2840817-1-jstultz@xxxxxxxxxx/

Regards,
Xiaosen

On 5/13/2026 4:08 PM, Ming Lei wrote:
On Wed, May 13, 2026 at 09:30:39AM +0200, Peter Zijlstra wrote:
On Wed, May 13, 2026 at 10:07:03AM +0800, Ming Lei wrote:
On Tue, May 12, 2026 at 07:16:36AM -1000, Tejun Heo wrote:
Hello, Ming.

On Tue, May 12, 2026 at 11:45:14PM +0800, Ming Lei wrote:
On Tue, May 12, 2026 at 02:40:21PM +0200, Peter Zijlstra wrote:
On Tue, May 12, 2026 at 02:04:32PM +0200, Peter Zijlstra wrote:
On Tue, May 12, 2026 at 04:59:39PM +0800, Ming Lei wrote:
On preemptible kernels, a deadlock can occur when a task with plugged IO
calls schedule_preempt_disabled():

  schedule_preempt_disabled()
    sched_preempt_enable_no_resched()  // preemption now enabled
    schedule()                         // <-- preemption can happen here
      sched_submit_work()
        blk_flush_plug()

After sched_preempt_enable_no_resched() re-enables preemption, the task
can be preempted (e.g., by a higher-priority RT task) before reaching
blk_flush_plug() in sched_submit_work(). Since the task's state is
already TASK_UNINTERRUPTIBLE (set by the mutex/rwsem slowpath caller),
requests in current->plug remain unflushed for an unbounded time.

If another task depends on those plugged requests to make progress (e.g.,
to release a lock the sleeping task needs), a deadlock results:

- Task A (writeback worker): holds plugged IO, preempted before
  flushing, stuck on run queue behind higher-priority work
- Task B: waiting for IO completion from Task A's plug, holds a lock
  that Task A needs to be woken up

My memory is hazy around io_schedule but the above reads really weird to me.
A task, regardless of its current state stays on the runqueue when
preempted, so the condition is temporary. As soon as the preempted task can
get CPU, it should unwind the situation. That's not a deadlock. Is the
problem that there can be preemption-induced delay in flushing the plugs?

IMO, preempting a `!TASK_RUNNING` task can be thought of as an effective sleep,

No it cannot be. Preemption ignores task state.

Yeah, I reached a similar conclusion too, with AI's assistance.

But both reports show that the preempted task isn't switched back in for a
long time. Can you share any ideas for Michael & Xiaosen to investigate
further from the scheduler side?

https://lore.kernel.org/linux-block/20260417082744.30124-1-michael@xxxxxxxxxxxxxxxxx/

https://lore.kernel.org/linux-block/5660795d-87de-46f5-add4-7729a02225ef@xxxxxxxxxxxxxxxx/


Thanks,
Ming

--
Regards,
Michael Wu