Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()

From: Huacai Chen

Date: Thu Jun 04 2026 - 23:02:20 EST


Hi, Jonas,

On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@xxxxxxxxx> wrote:
>
> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
> marks the CPU offline for the scheduler via set_cpu_online(false) but
> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
> that are now spinning forever with interrupts disabled.
>
> As long as nothing waits for an RCU grace period after smp_send_stop()
> this is harmless, which is why it went unnoticed. Since commit
> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> however, irq_work_sync() calls synchronize_rcu() on architectures without
> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
> issued in the reboot/shutdown path after smp_send_stop() then blocks on
> a grace period that can never complete, hanging the reboot:
>
> WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
> ...
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> rcu: Offline CPU 1 blocking current GP.
> rcu: Offline CPU 2 blocking current GP.
> rcu: Offline CPU 3 blocking current GP.
>
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].
Note that 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
on PREEMPT_RT") has now been backported as far back as the 6.1 LTS branch.

>
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>
> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> CC: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jonas Jelonek <jelonek.jonas@xxxxxxxxx>
>
> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> index 4868e79f3b30..0f28b4a62e72 100644
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -20,6 +20,7 @@
> #include <linux/sched/mm.h>
> #include <linux/cpumask.h>
> #include <linux/cpu.h>
> +#include <linux/rcupdate.h>
> #include <linux/err.h>
> #include <linux/ftrace.h>
> #include <linux/irqdomain.h>
> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
>  	set_cpu_online(smp_processor_id(), false);
>  	calculate_cpu_foreign_map();
>  	local_irq_disable();
> +	rcutree_report_cpu_dead();
I'm not sure, but maybe it would be better to call this before
local_irq_disable()?

Huacai
>  	while (1);
>  }
>
> --
> 2.51.0
>
>