Re: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
From: Mark Rutland
Date: Wed Mar 25 2026 - 07:06:22 EST
On Sun, Mar 22, 2026 at 12:25:06AM +0100, Thomas Gleixner wrote:
> On Fri, Mar 20 2026 at 17:31, Mark Rutland wrote:
> > On Fri, Mar 20, 2026 at 05:26:47PM +0100, Thomas Gleixner wrote:
> > I *think* you're saying that because the arch code would manage DAIF
> > early during entry and late during exit, that would all be in one place.
>
> That was my thought, see below.
>
> > However, that doubles the number of times we have to write to DAIF: at
> > entry we'd have to poke it once to unmask everything except IRQs, then
> > again to unmask IRQs, and exit would need the inverse. We'd also have to
> > split the inheritance logic into inherit-everything-but-interrupt and
> > inherit-only-interrupt, which is doable but not ideal. With pseudo-NMI
> > it's even worse, but that's largely because pseudo-NMI is
> > over-complicated today.
>
> Interrupts are not unmasked on interrupt/exception entry ever and I
> don't understand your concerns at all, but as always I might be missing
> something.
I think one problem is that we're using the same words to describe
distinct things, because the terminology is overloaded; I've tried to
clarify some of that below.
I think another is that there are a large number of interacting
constraints, and it's easy to find something that satisfies most but
not all of those. I *think* there's an approach that satisfies all
of our requirements; see below where I say "I *think* what would work
for us ...".
For context, when I said "at entry" and "at exit" I was including
*everything* we have to do before/after the "real" logic to handle an
exception, including parts that surround the generic entry code. I'm
happy to use a different term for those periods, but I can't immediately
think of something better.
For example, for faults handled by arm64's el1_abort(), I was
characterising this as:
/* -------- "entry" begins here -------- */

[[ entry asm ]]
[[ early triage, branch to el1_abort() ]]

static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
{
	unsigned long far = read_sysreg(far_el1);
	irqentry_state_t state;

	state = enter_from_kernel_mode(regs) {
		...
		irqentry_enter(regs);
		...
	}

	local_daif_inherit(regs);	// <----------- might unmask interrupts

	/* -------- "entry" ends here ---------- */
	/* -------- "real logic" begins here --- */

	do_mem_abort(far, esr, regs);

	/* -------- "real logic" ends here ----- */
	/* -------- "exit" begins here --------- */

	local_daif_mask();	// <------------------------- masks interrupts

	exit_to_kernel_mode(regs, state) {
		...
		irqentry_exit(regs);
		...
	}
}

[[ return from el1_abort() ]]
[[ exit asm ]]

/* -------- "exit" ends here ----------- */
Please note, el1_abort() is indicative of what arm64 does for
(most) synchronous exceptions, but there are slight differences for
other exceptions. For anything maskable (including interrupts) we DO NOT
use local_daif_inherit(), and instead unmask specific higher-priority
maskable exceptions via other functions that write to DAIF, etc.
Interrupts are an odd middle ground where we use irqentry_{enter,exit}()
but do not use local_daif_inherit().
> The current sequence on entry is:
>
> // interrupts are disabled by interrupt/exception entry
> enter_from_kernel_mode()
>     irqentry_enter(regs);
>     mte_check_tfsr_entry();
>     mte_disable_tco_entry();
>     daif_inherit(regs);
> // interrupts are still disabled
That last comment isn't quite right: we CAN and WILL enable interrupts
in local_daif_inherit(), if and only if they were enabled in the context
the exception was taken from.
As mentioned above, when handling an interrupt (rather than a
synchronous exception), we don't use local_daif_inherit(), and instead
use a different DAIF function to unmask everything except interrupts.
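To make that concrete, a rough sketch of the two entry flavours (not
the exact arm64 code, details elided):

	/* synchronous exception, e.g. el1_abort() */
	enter_from_kernel_mode(regs);
	local_daif_inherit(regs);	// may unmask IRQs, if and only if
					// the interrupted context had
					// them unmasked

	/* interrupt, e.g. el1_interrupt() */
	enter_from_kernel_mode(regs);
	write_sysreg(DAIF_PROCCTX_NOIRQ, daif);	// unmask D/A/F, keep
						// IRQs masked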
> which then becomes:
>
> // interrupts are disabled by interrupt/exception entry
> irqentry_enter(regs)
>     establish_state();
>     // RCU is watching
>     arch_irqentry_enter_rcu()
>         mte_check_tfsr_entry();
>         mte_disable_tco_entry();
>         daif_inherit(regs);
> // interrupts are still disabled
>
> Which is equivalent versus the MTE/DAIF requirements, no?
As above, we can't use local_daif_inherit() here because we want
different DAIF masking behaviour for entry to interrupts and entry to
synchronous exceptions. While we could pass some token around to
determine the behaviour dynamically, that's less clear, more
complicated, and results in worse code being generated for something we
know at compile time.
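i.e. we'd rather not end up with something like (entirely hypothetical
names):

	irqentry_enter(regs, flavour);	// flavour == ENTRY_SYNC / ENTRY_IRQ,
					// decided at runtime

... when the sync/IRQ distinction is statically known at every call
site and the compiler can emit the right DAIF writes directly.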
If we can leave DAIF masked early on during irqentry_enter(), I don't
see why we can't leave all DAIF exceptions masked until the end of
irqentry_enter().
I *think* what would work for us is to split some of the exit
handling (including involuntary preemption) into a "prepare" step, as we
have for return to userspace. That way, arm64 could handle exit with
something like:
	local_irq_disable();
	irqentry_exit_prepare();	// new, all generic logic
	local_daif_mask();
	arm64_exit_to_kernel_mode() {
		...
		irqentry_exit();	// ideally irqentry_exit_to_kernel_mode().
		...
	}
... and other architectures can use a combined exit_to_kernel_mode() (or
whatever we call that), which does both, e.g.
// either noinstr, __always_inline, or a macro
void irqentry_prepare_and_exit(void)
{
	irqentry_exit_prepare();
	irqentry_exit();
}
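For illustration, a handler on such an architecture might then look
something like the below (names hypothetical, regs/state plumbing
elided as in the sketch above):

	static void noinstr handle_kernel_fault(struct pt_regs *regs)
	{
		irqentry_state_t state = irqentry_enter(regs);

		do_fault(regs);

		irqentry_prepare_and_exit();	// prepare + exit back-to-back,
						// nothing arch-specific in between
	}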
That way:
* There's a clear separation between the "prepare" and subsequent exit
steps, which minimizes the risk that a change subtly breaks arm64's
exception masking.
* This would mirror the userspace return path, and so would be more
consistent overall.
* All of arm64's arch-specific exception masking can live in
arch/arm64/kernel/entry-common.c, explicit in the straight line code
rather than partially hidden behind arch_*() callbacks.
* There's no unnecessary cost to other architectures.
* There's no/minimal maintenance cost for the generic code. There are no
arch_*() callbacks, and we'd have to enforce ordering between the
prepare/exit steps anyhow...
If you don't see an obvious problem with that, I will go and try that
now.
> The current broken sequence vs. preemption on exit is:
>
> // interrupts are disabled
> exit_to_kernel_mode
>     daif_mask();
>     mte_check_tfsr_exit();
>     irqentry_exit(regs, state);
>
> which then becomes:
>
> // interrupts are disabled
> irqentry_exit(regs, state)
>     // includes preemption
>     prepare_for_exit();
>
>     // RCU is still watching
>     arch_irqentry_exit_rcu()
>         daif_mask();
>         mte_check_tfsr_exit();
>
>     if (state.exit_rcu)
>         ct_irq_exit();
As above, I'd strongly prefer if we could pull the "prepare" step out of
irqentry_exit(). Especially since for the entry path we can't push the
DAIF masking into irqentry_enter(), and I'd very strongly prefer that
the masking and unmasking occur in the same logical place, rather than
having one of those hidden behind an arch_*() callback.
> Which is equivalent versus the MTE/DAIF requirements and fixes the
> preempt on exit issue too, no?
>
> That change would be trivial enough for backporting, right?
>
> It also prevents you from staring at the bug reports which are going to
> end up in your mailbox after I merged the patch which moves the
> misplaced rcu_irq_exit_check_preempt() check _before_ the
> preempt_count() check where it belongs.
I intend to fix that issue, so hopefully I'm not staring at those for
long.
Just to check, do you mean that you've already queued that (I didn't
spot it in tip), or that you intend to? I'll happily test/review/ack a
patch adding that, but hopefully we can fix arm64 first.
> I fully agree that ARM64 is special vs. CPU state handling, but it's not
> special enough to justify its own semantically broken preemption logic.
Sure. To be clear, I'm not arguing for broken preemption logic. I'd
asked those initial two questions because I suspected this approach
wasn't quite right.
As above, I think we can solve this in an actually generic way by
splitting out a "prepare to exit" step, and still keep the bulk of the
logic generic.
> Looking at those details made me also look at this magic
> arch_irqentry_exit_need_resched() inline function.
I see per your other reply that you figured out this part was ok:
https://lore.kernel.org/linux-arm-kernel/87se9ph129.ffs@tglx/
... though I agree we can clean that up further.
Mark.