Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work
From: Frederic Weisbecker
Date: Tue Mar 17 2026 - 09:35:18 EST
On Sun, Mar 15, 2026 at 03:10:27PM -0300, Leonardo Bras wrote:
> On Fri, Mar 13, 2026 at 10:55:47PM +0100, Frederic Weisbecker wrote:
> > I find this part of the semantic a bit weird. If we eventually queue
> > the work, why do we care about doing a local_lock() locally ?
>
> (Sorry, not sure if I was able to understand the question.)
>
> Local locks make sure a per-cpu procedure happens on the same CPU from
> start to end: they use migrate_disable() plus per-cpu spinlocks on RT,
> and preempt_disable() on non-RT.
>
> In most cases the work is done on the local CPU; only a few procedures
> end up queued remotely, such as remote cache draining.
>
> Even with the new 'local_qpw_lock()', which is faster for cases where we
> are sure the usage is local, with qpw=0 we still have to make qpw_lock()
> a local_lock as well, since the CPU receiving the scheduled work needs to
> run it entirely without moving to a different CPU.
But queue_work_on() already makes sure the work doesn't move to a different CPU
(provided hotplug is correctly handled for the work).
Looks like we are both confused, so let's take a practical example. Suppose
CPU 0 queues a work to CPU 1 which sets a per-cpu variable named A to the value
"1". We want to guarantee that further reads of that per-cpu value by CPU 1
see the new value. With qpw=1, it looks like this:
       CPU 0                                CPU 1
       -----                                -----
    qpw_lock(CPU 1)
        spin_lock(&QPW_CPU1)
    qpw_queue_for(write_A, 1)
        write_A()
            A1 = per_cpu_ptr(&A, 1)
            *A1 = 1
    qpw_unlock(CPU 1)
        spin_unlock(&QPW_CPU1)
                                         read_A()
                                             qpw_lock(CPU 1)
                                                 spin_lock(&QPW_CPU1)
                                             r0 = __this_cpu_read(&A)
                                             qpw_unlock(CPU 1)
                                                 spin_unlock(&QPW_CPU1)

CPU 0 took the spinlock while writing to A, so CPU 1 is guaranteed to further
observe the new value because it takes the same spinlock (r0 == 1).
Now look at the qpw=0 case:
       CPU 0                                CPU 1
       -----                                -----
    qpw_lock(CPU 1)
        local_lock(&QPW_CPU0)
    qpw_queue_for(write_A, 1)
        queue_work_on(write_A, CPU 1)
    qpw_unlock(CPU 1)
        local_unlock(&QPW_CPU0)
                                         // workqueue
                                         write_A()
                                             qpw_lock(CPU 1)
                                                 local_lock(&QPW_CPU1)
                                             A1 = per_cpu_ptr(&A, 1)
                                             *A1 = 1
                                             qpw_unlock(CPU 1)
                                                 local_unlock(&QPW_CPU1)
                                         read_A()
                                             qpw_lock(CPU 1)
                                                 local_lock(&QPW_CPU1)
                                             r0 = __this_cpu_read(&A)
                                             qpw_unlock(CPU 1)
                                                 local_unlock(&QPW_CPU1)
Here CPU 0 queues the work on CPU 1 which writes and reads the new value
(r0 == 1). local_lock() / preempt_disable() makes sure the CPU doesn't change.
But what is the point in doing local_lock(&QPW_CPU0) on CPU 0 ?
> > >
> > > @@ -2840,6 +2840,16 @@ Kernel parameters
> > >
> > > The format of <cpu-list> is described above.
> > >
> > > + qpw= [KNL,SMP] Select a behavior on per-CPU resource sharing
> > > + and remote interference mechanism on a kernel built with
> > > + CONFIG_QPW.
> > > + Format: { "0" | "1" }
> > > + 0 - local_lock() + queue_work_on(remote_cpu)
> > > + 1 - spin_lock() for both local and remote operations
> > > +
> > > + Selecting 1 may be interesting for systems that want
> > > + to avoid interruption & context switches from IPIs.
> >
> > Like Vlastimil suggested, it would be better to just have it off by default
> > and turn it on only if nohz_full= is passed. Then we can consider introducing
> > the parameter later if the need arises.
>
> I agree with having it enabled with isolcpus/nohz_full, but I would
> recommend having this option anyway, as the user could disable qpw if
> wanted, or enable it outside isolcpus scenarios for any reason.
Do you know of any such users? Or suspect a potential use case? If not, we can
still add that option later. It's probably better than sticking with a useless
parameter that we'll have to maintain forever.
> > > +#define qpw_lockdep_assert_held(lock) \
> > > + lockdep_assert_held(lock)
> > > +
> > > +#define queue_percpu_work_on(c, wq, qpw) \
> > > + queue_work_on(c, wq, &(qpw)->work)
> >
> > qpw_queue_work_on() ?
> >
> > Perhaps even better would be qpw_queue_work_for(), leaving some room for
> > mystery about where/how the work will be executed :-)
> >
>
> QPW comes from Queue PerCPU Work.
> Having it called qpw_queue_work_{on,for}() would be repetitive.
Well, qpw_ just becomes the name of the subsystem and its prefix for APIs.
For example qpw_lock() doesn't mean that we queue and lock, it only means we lock.
> But having qpw_on() or qpw_for() would be misleading :)
>
> That's why I went with queue_percpu_work_on(), based on the name of the
> original function, queue_work_on().
That's much more misleading, as it doesn't refer to qpw at all and only
suggests that it queues work on a per-cpu workqueue.
> > Perhaps that too should just be selected automatically by CONFIG_NO_HZ_FULL
> > and, if the need arises in the future, made visible to the user?
> >
>
> I think it would be good to have this, and let whoever is building the
> kernel have the chance to disable QPW if it doesn't work well for their
> machine or workload, without having to add a new boot parameter just to
> keep things working as before after a kernel update.
>
> But that is open to discussion :)
Ok I guess we can stick with the Kconfig at least in the beginning.
Thanks.
--
Frederic Weisbecker
SUSE Labs