Re: [QUESTION] problems report: rcu_read_unlock_special() called in irq_exit() causes dead loop

From: Qi Xi
Date: Tue Jul 01 2025 - 05:21:06 EST

Next message: Geert Uytterhoeven: "Re: [PATCH v14 5/5] serial: sh-sci: Add support for RZ/T2H SCI"
Previous message: Huacai Chen: "Re: [PATCH v2] LoongArch: KVM: INTC: Add IOCSR MISC register emulation"
Next in thread: Joel Fernandes: "Re: [QUESTION] problems report: rcu_read_unlock_special() called in irq_exit() causes dead loop"

Hello everyone,

Friendly ping about this problem :)

Qi

On 2025/6/6 2:56, Joel Fernandes wrote:

On 6/4/2025 8:26 AM, Paul E. McKenney wrote:

Or just don't send subsequent self-IPIs if we just sent one for the
rdp. Chances are, if we did not get the scheduler's attention during
the first one, we may not in subsequent ones I think. Plus we do send
other IPIs already if the grace period was over extended (from the FQS
loop), maybe we can tweak that?
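
A minimal user-space sketch of the "don't send another self-IPI if one is
already outstanding" idea, for illustration only. It is not the kernel code
and not the actual fix: `struct fake_rdp`, `self_ipi_pending`,
`request_deferred_qs()` and `self_ipi_handler()` are hypothetical names, and
the real per-CPU `rdp` state and IPI/irq_work machinery are assumed away.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU RCU data ("rdp"); not struct rcu_data. */
struct fake_rdp {
    bool self_ipi_pending;   /* assumed flag: a self-IPI was sent and has not run yet */
    unsigned int ipis_sent;  /* counter kept only so the behaviour can be observed */
};

/*
 * Ask for a deferred quiescent state. Returns true only when a new
 * (simulated) self-IPI is actually sent; repeated requests while one is
 * still pending are skipped, so they cannot keep re-triggering.
 */
bool request_deferred_qs(struct fake_rdp *rdp)
{
    if (rdp->self_ipi_pending)
        return false;            /* one is already in flight: do nothing */

    rdp->self_ipi_pending = true;
    rdp->ipis_sent++;            /* stands in for the real self-IPI */
    return true;
}

/* Simulated handler: runs when the self-IPI is delivered and re-arms the flag. */
void self_ipi_handler(struct fake_rdp *rdp)
{
    rdp->self_ipi_pending = false;
}
```

With this shape, any number of requests made between sending the self-IPI
and running its handler results in exactly one send.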

Thanks a lot for your reply. I think it's hard for me to fix this issue as
above without introducing new bugs. I barely understand the RCU code. But I'm
very glad to help test any code modification you need tested. I have
the VM and the syzkaller benchmark which can reproduce the problem.

Sure, I understand. This is already incredibly valuable so thank you again.
I will ask for your testing help soon. I also have a test module now which
can sort of reproduce this. I'll keep you posted!

Oh sorry, I meant to ask: could you provide the full kernel log, and is
there a standalone syzkaller reproducer binary one can run to reproduce it in a VM?

Sorry, I checked with the team that maintains the syzkaller tools. They said
I can't send the syzkaller binary outside the company. Sorry, but I can help to
reproduce it. It's not complicated and not time-consuming.

I found the original log, which used kernel v6.6, but it's not complete.
I then reproduced the problem using the latest kernel.
Both logs are attached.

Looking at both the v6.6 version and Joel's fix, I am forced to conclude
that this bug has been there for a very long time. Thank you for your
testing efforts and Joel for the fix!

Thanks. I am still working on polishing the fix Xiongfeng tested. I hope to have
it out next week for review. As we discussed I will split the context-tracking
API into a separate patch and will also add a separate documentation
comment-patch on why we need the irq_work.

thanks,

- Joel
