[PATCH v2 1/4] workqueue: park kicked worker on pool->kicked_list

From: Breno Leitao

Date: Wed Jun 03 2026 - 09:53:06 EST


kick_pool() picks an idle worker and wakes it, but leaves it WORKER_IDLE
on pool->idle_list until the woken kthread schedules in and runs
worker_leave_idle(). idle_cull_fn() only checks WORKER_IDLE, not the
task state, so a kicked-but-not-yet-scheduled worker is still a valid
cull victim -- the cull can reap it before it consumes the just-enqueued
work, stranding the item. The window is narrow today but later patches
in this series defer the wakeup outside pool->lock, widening it.

Move the picked worker from pool->idle_list to a new pool->kicked_list
under pool->lock so the cull path -- which walks idle_list only --
cannot reach it. worker_leave_idle() already does list_del_init(), so
it correctly removes the worker from kicked_list when it actually runs;
worker_enter_idle() puts it back onto idle_list on completion. No
extra list ops on the worker side.
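
For illustration only, here is a minimal userspace sketch of the list
movement described above. It is not the kernel code: the list helpers
are simplified stand-ins for list.h, and every model_* name, struct
pool, and the two-field struct worker are hypothetical. It assumes, as
in the patch, that the cull walk looks at idle_list only.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_head(struct list_head *e, struct list_head *h)
{
	e->next = h->next;
	e->prev = h;
	h->next->prev = e;
	h->next = e;
}

static void list_unlink_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list(e);
}

/* Hypothetical, stripped-down worker and pool; entry is the first member. */
struct worker { struct list_head entry; int id; };
struct pool { struct list_head idle_list, kicked_list; };

/* Kick: take the first idle worker and park it on kicked_list. */
static struct worker *model_kick(struct pool *p)
{
	struct worker *w;

	if (list_is_empty(&p->idle_list))
		return NULL;
	w = (struct worker *)p->idle_list.next;	/* valid: entry is at offset 0 */
	list_unlink_init(&w->entry);
	list_add_head(&w->entry, &p->kicked_list);
	return w;
}

/* Cull walk: counts the workers it can see, which is idle_list only. */
static int model_cull_candidates(const struct pool *p)
{
	const struct list_head *it;
	int n = 0;

	for (it = p->idle_list.next; it != &p->idle_list; it = it->next)
		n++;
	return n;
}

/* Worker side: unlink from whichever list the worker currently sits on. */
static void model_leave_idle(struct worker *w)
{
	list_unlink_init(&w->entry);
}

/* Worker side: return to idle_list once the work is done. */
static void model_enter_idle(struct pool *p, struct worker *w)
{
	list_add_head(&w->entry, &p->idle_list);
}
```

In this model, a worker returned by model_kick() is invisible to
model_cull_candidates() until model_enter_idle() puts it back, and the
worker-side helpers need no knowledge of which list they unlink from.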

LIFO coalescing of back-to-back kicks onto the same cache-hot worker is
preserved by having first_idle_worker() peek kicked_list before
idle_list: the second kick lands on the already-kicked worker, the
duplicate wakeup is a no-op, and the worker drains both items when it
runs.
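
A rough userspace sketch of the coalescing behaviour, again with
simplified list helpers and hypothetical names (model_first_idle_worker,
model_kick, the wakeups counter); it only models the selection order,
not the real wake_up_process() path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_head(struct list_head *e, struct list_head *h)
{
	e->next = h->next;
	e->prev = h;
	h->next->prev = e;
	h->next = e;
}

static void list_move_head(struct list_head *e, struct list_head *h)
{
	e->prev->next = e->next;	/* unlink from the current list */
	e->next->prev = e->prev;
	list_add_head(e, h);		/* relink at the head of @h */
}

/* Hypothetical, stripped-down worker and pool; entry is the first member. */
struct worker { struct list_head entry; int id; };
struct pool {
	struct list_head idle_list, kicked_list;
	int wakeups;			/* hypothetical: counts wakeup attempts */
};

/* Prefer an already-kicked worker, then fall back to idle_list. */
static struct worker *model_first_idle_worker(struct pool *p)
{
	if (!list_is_empty(&p->kicked_list))
		return (struct worker *)p->kicked_list.next;
	if (list_is_empty(&p->idle_list))
		return NULL;
	return (struct worker *)p->idle_list.next;
}

/* Kick: pick a worker, park it on kicked_list, record the wakeup attempt. */
static struct worker *model_kick(struct pool *p)
{
	struct worker *w = model_first_idle_worker(p);

	if (!w)
		return NULL;
	list_move_head(&w->entry, &p->kicked_list);
	p->wakeups++;			/* a repeat wakeup of the same task is harmless */
	return w;
}

/* Number of workers still sitting on idle_list. */
static int model_idle_count(const struct pool *p)
{
	const struct list_head *it;
	int n = 0;

	for (it = p->idle_list.next; it != &p->idle_list; it = it->next)
		n++;
	return n;
}
```

With two idle workers, two consecutive model_kick() calls return the
same worker and leave the other one on idle_list, which is the
coalescing the paragraph above describes.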

Why not add a new WORKER_KICKED flag instead? I tried that, and the
numbers were worse.

Compared to tagging the worker with a new WORKER_KICKED flag,
list_move() writes to worker->entry (offset 0 of struct worker), which
the producer already dirties when reading the idle_list head; no new
cross-CPU cacheline is introduced. Tagging worker->flags would have put
a producer-side write on an otherwise worker-private cacheline, causing
a coherence bounce on every kick.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
kernel/workqueue.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8df671066dd1..b3f8b86cb52f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -216,12 +216,13 @@ struct worker_pool {
int nr_idle; /* L: currently idle workers */

struct list_head idle_list; /* L: list of idle workers */
+ struct list_head kicked_list; /* L: workers kicked but not yet running */
struct timer_list idle_timer; /* L: worker idle timeout */
struct work_struct idle_cull_work; /* L: worker idle cleanup */

struct timer_list mayday_timer; /* L: SOS timer for workers */

- /* a workers is either on busy_hash or idle_list, or the manager */
+ /* a worker is either on busy_hash, idle_list, kicked_list, or the manager */
DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
/* L: hash of busy workers */

@@ -1031,6 +1032,13 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
/* Return the first idle worker. Called with pool->lock held. */
static struct worker *first_idle_worker(struct worker_pool *pool)
{
+ /*
+ * Prefer an already-kicked worker so back-to-back kicks coalesce
+ * onto the same cache-hot worker (LIFO reuse).
+ */
+ if (!list_empty(&pool->kicked_list))
+ return list_first_entry(&pool->kicked_list, struct worker, entry);
+
if (unlikely(list_empty(&pool->idle_list)))
return NULL;

@@ -1310,6 +1318,16 @@ static bool kick_pool(struct worker_pool *pool)
}
}
#endif
+ /*
+ * Move @worker to pool->kicked_list so a concurrent idle_cull_fn()
+ * (which only walks pool->idle_list) cannot reap it before it
+ * consumes the just-enqueued work. worker_leave_idle() removes the
+ * worker from whichever list it sits on; worker_enter_idle() puts
+ * it back on pool->idle_list on completion. first_idle_worker()
+ * peeks kicked_list first, so back-to-back kicks still coalesce
+ * onto the same cache-hot worker (LIFO reuse).
+ */
+ list_move(&worker->entry, &pool->kicked_list);
wake_up_process(p);
return true;
}
@@ -4896,6 +4914,7 @@ static int init_worker_pool(struct worker_pool *pool)
pool->last_progress_ts = jiffies;
INIT_LIST_HEAD(&pool->worklist);
INIT_LIST_HEAD(&pool->idle_list);
+ INIT_LIST_HEAD(&pool->kicked_list);
hash_init(pool->busy_hash);

timer_setup(&pool->idle_timer, idle_worker_timeout, TIMER_DEFERRABLE);

--
2.54.0