Re: [PATCH v5 1/7] s390/percpu: Infrastructure for more efficient this_cpu operations

From: Heiko Carstens

Date: Mon Jun 01 2026 - 09:31:06 EST


On Mon, Jun 01, 2026 at 09:10:47AM +0200, Alexander Gordeev wrote:
> On Tue, May 26, 2026 at 07:56:56AM +0200, Heiko Carstens wrote:
> > Get rid of the conditional branch with the following code sequence:
> >
> > 11a8e6: c0 30 00 d0 c5 0d larl %r3,1b33300
>
> Annotating with similar comments as above would help:
>
> ... <- address of percpu var
> ... <- add percpu offset
>
> > 11a8ec: b9 04 00 43 lgr %r4,%r3
> > 11a8f0: eb 00 43 c0 00 52 mviy 960,4
> > 11a8f6: e3 40 03 b8 00 08 ag %r4,952
> > 11a8fc: eb 52 40 00 00 e8 laag %r5,%r2,0(%r4)
> > 11a902: eb 00 03 c0 00 52 mviy 960,0
> > 11a908: b9 08 00 25 agr %r2,%r5
> > 11a90c 07 fe br %r14

Sure.

> > +static __always_inline void percpu_entry(struct pt_regs *regs)
> > +{
> > + struct lowcore *lc = get_lowcore();
> > +
> > + if (user_mode(regs))
> > + return;
> > + regs->cpu = lc->cpu_nr;
> > + regs->percpu_register = lc->percpu_register;
> > + lc->percpu_register = 0;
>
> This assignment is required when the task is migrated to a new CPU, so
> the lowcore would not be corrupted with the stale percpu_register value.
> Otherwise, it is restored in percpu_exit() if there was no migration.
> Right?

This, but it is also required to have a "clean" starting point for the
new context (exception, irq, nmi), since the new context does not run
in a percpu code section - if that context would be interrupted then
the next context might see that the previous context was running in a
percpu code section, while it was not. Therefore lc->percpu_register
must be saved and set to 0 before the new context may use percpu ops
and/or fault.

> > + if ((insn & 0xff0f) != 0xe300)
>
> Any reason mask/not to check the register number as well?

Just to save code. If there would be a mismatch with the percpu
register, something serious would be wrong..
But that cannot happen(tm) :)

> > + /* Check if process has been migrated to a different CPU. */
> > + if (regs->cpu == lc->cpu_nr)
> > + return;
> > + /* Fixup percpu base register */
> > + regs->gprs[reg] -= __per_cpu_offset[regs->cpu];
> > + regs->gprs[reg] += lc->percpu_offset;
>
> Such one reads bit better:
>
> regs->gprs[reg] -= __per_cpu_offset[regs->cpu];
> regs->gprs[reg] += __per_cpu_offset[reg];

That reads better, but it is wrong. 'reg' is not the index of the
current cpu into __per_cpu_offset[].

It could be changed to

regs->gprs[reg] += __per_cpu_offset[lc->cpu_nr];

but then again that's exactly lc->percpu_offset. I don't see a reason
to create worse code here.

> > + * Inline assemblies making use of this typically have a code sequence like:
> > + *
> > + * MVIY_PERCPU(...) <- start of percpu code section
> > + * AG_ALT(...) <- add percpu offset; must be the second instruction
> > + * atomic_op <- atomic op
> \t here, but should be spaces?

I can't follow? We have tabs in comments all over the place in s390 code.

> Probably it worth noting that no extra instructions between atomic_op
> and MVIY_ALT may exist (e.g. in the future), especially ones that use
> the percpu_register.

That's not true. There may be additional instructions, they may just
not use the same register that is used for the percpu variable.

But that was true before this patch as well, even though it might not
have been "obvious".

> > unsigned long flags;
> > unsigned long last_break;
> > + unsigned int cpu;
> > + unsigned char percpu_register;
>
> May be name it as this_cpu and this_cpu_reg(ister)?

Here I disagree. *If* a task would be migrated then such naming would
be very confusing: regs->this_cpu would contain the cpu number of the
_previous_ cpu where the task was running on, but not the number of
the current cpu.
So you would end up with checks like:

if (regs->this_cpu != this_cpu)

to figure out if a task was migrated. This doesn't look right to me.