[PATCH v2 4/4] workqueue: defer the worker wakeup outside pool->lock in process_one_work()

From: Breno Leitao

Date: Wed Jun 03 2026 - 09:49:59 EST


process_one_work() kicks the pool to chain execution of the remaining
work items for WORKER_NOT_RUNNING workers (the UNBOUND and CPU_INTENSIVE
ones), and it calls kick_pool() while holding pool->lock. As in the
enqueue path, the wakeup therefore acquires the target rq->lock while
pool->lock is still held.

Use kick_pool_pick() to select and claim the worker under pool->lock,
queueing it on an on-stack wake_q, and let raw_spin_unlock_irq_wake()
drop the lock before the wakeup is issued with wake_up_q().
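
The pick-under-lock, wake-after-unlock pattern can be illustrated with a
minimal userspace sketch. It is an analogue only, not the kernel code:
pthread mutexes stand in for pool->lock and rq->lock, and every name in
it (struct worker, struct pool, pool_pick_locked(), worker_wake(),
pool_queue_and_kick()) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel implementation. */
struct worker {
	pthread_mutex_t lock;		/* stands in for the target rq->lock */
	pthread_cond_t cond;
	bool claimed;			/* set under pool->lock when picked */
	bool woken;
	struct worker *next_idle;
};

struct pool {
	pthread_mutex_t lock;		/* stands in for pool->lock */
	struct worker *idle_list;
	int nr_pending;
};

/* Caller holds pool->lock: claim an idle worker, but do not wake it. */
static struct worker *pool_pick_locked(struct pool *pool)
{
	struct worker *w = pool->idle_list;

	if (!w || !pool->nr_pending)
		return NULL;
	pool->idle_list = w->next_idle;
	w->next_idle = NULL;
	w->claimed = true;
	return w;
}

/* Takes only the worker's lock; must be called without pool->lock. */
static void worker_wake(struct worker *w)
{
	assert(w->claimed);
	pthread_mutex_lock(&w->lock);
	w->woken = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Queue one item and kick the pool; returns the worker woken, if any. */
struct worker *pool_queue_and_kick(struct pool *pool)
{
	struct worker *to_wake;

	pthread_mutex_lock(&pool->lock);
	pool->nr_pending++;
	to_wake = pool_pick_locked(pool);
	pthread_mutex_unlock(&pool->lock);

	/* The wakeup is issued only after pool->lock has been dropped. */
	if (to_wake)
		worker_wake(to_wake);
	return to_wake;
}
```

Because the worker is claimed while the pool lock is held, a concurrent
caller cannot pick the same worker, so moving the wakeup past the unlock
shortens the pool lock hold time without changing who gets woken.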

With both hot paths converted, the results below were measured on a
CONFIG_SMP x86 VM (8 vCPUs) with the in-tree test_workqueue benchmark
(lib/test_workqueue.c), in which each of 8 producers queues 200000 work
items one at a time on a WQ_UNBOUND workqueue and waits for each to
complete. Values are medians of five boots per scope:

  affinity_scope   baseline    patched     tput     p95
                   (items/s)   (items/s)   gain     drop
  --------------   ---------   ---------   ------   ------
  cpu              3,611,591   3,568,433    -1.2%    +4.6%
  smt              3,601,697   3,550,632    -1.4%    +6.1%
  cache_shard        341,913     401,213   +17.3%   -36.8%
  cache              320,607     400,560   +24.9%   -41.9%
  numa               324,909     389,202   +19.8%   -38.0%
  system             314,510     392,278   +24.7%   -37.5%

(p95 drop is the change in the p95 enqueue latency; negative is better.)

cpu/smt use per-CPU pools with no producer/consumer contention and are
essentially unchanged. On the contended scopes the shorter pool->lock
hold time cuts p95 enqueue latency by ~40%, and because this workload is
bound by the producer<->worker round-trip that latency reduction also
lifts throughput by ~20%.
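
The "tput gain" column can be reproduced from the two throughput
columns as (patched - baseline) / baseline. A small sketch of that
arithmetic follows; the struct and function names are illustrative
assumptions, and the figures are copied from the table above.

```c
#include <stddef.h>
#include <stdio.h>

struct scope_result {
	const char *scope;
	double baseline;	/* items/s */
	double patched;		/* items/s */
};

/* Throughput medians copied from the results table. */
static const struct scope_result results[] = {
	{ "cpu",         3611591.0, 3568433.0 },
	{ "smt",         3601697.0, 3550632.0 },
	{ "cache_shard",  341913.0,  401213.0 },
	{ "cache",        320607.0,  400560.0 },
	{ "numa",         324909.0,  389202.0 },
	{ "system",       314510.0,  392278.0 },
};

/* Relative change from baseline to patched, in percent. */
double pct_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}

/* Print one "scope gain" line per affinity scope. */
void print_tput_gains(void)
{
	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%-12s %+.1f%%\n", results[i].scope,
		       pct_change(results[i].baseline, results[i].patched));
}
```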

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
kernel/workqueue.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b4246a801dd8..238b02edd01d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3261,6 +3261,7 @@ __acquires(&pool->lock)
 {
 	struct pool_workqueue *pwq = get_work_pwq(work);
 	struct worker_pool *pool = worker->pool;
+	DEFINE_WAKE_Q(wakeq);
 	unsigned long work_data;
 	int lockdep_start_depth, rcu_start_depth;
 	bool bh_draining = pool->flags & POOL_BH_DRAINING;
@@ -3315,7 +3316,7 @@ __acquires(&pool->lock)
 	 * chain execution of the pending work items for WORKER_NOT_RUNNING
 	 * workers such as the UNBOUND and CPU_INTENSIVE ones.
 	 */
-	kick_pool(pool);
+	kick_pool_pick(pool, &wakeq);
 
 	/*
 	 * Record the last pool and clear PENDING which should be the last
@@ -3326,7 +3327,8 @@ __acquires(&pool->lock)
 	set_work_pool_and_clear_pending(work, pool->id, pool_offq_flags(pool));
 
 	pwq->stats[PWQ_STAT_STARTED]++;
-	raw_spin_unlock_irq(&pool->lock);
+	/* deferred kick_pool_pick() wakeup, issued outside pool->lock */
+	raw_spin_unlock_irq_wake(&pool->lock, &wakeq);
 
 	rcu_start_depth = rcu_preempt_depth();
 	lockdep_start_depth = lockdep_depth(current);

--
2.54.0