Or just don't send subsequent self-IPIs if we just sent one for the
rdp. Chances are, if we did not get the scheduler's attention during
the first one, we may not in subsequent ones, I think. Plus we do send
other IPIs already if the grace period was over-extended (from the FQS
loop), maybe we can tweak that?

Thanks a lot for your reply. I think it's hard for me to fix this issue as
above without introducing new bugs. I barely understand the RCU code. But I'm
very glad to help test any code modifications you need tested. I have
the VM and the syzkaller benchmark which can reproduce the problem.

Sure, I understand. This is already incredibly valuable so thank you again.
Will request your testing help soon. I also have a test module now which
can sort of reproduce this. Will keep you posted!

Oh sorry I meant to ask - could you provide the full kernel log and also is
there a standalone reproducer syzkaller binary one can run to reproduce it in a VM?

Sorry, I talked with the team that maintains the syzkaller tools. They said
I can't send the syzkaller binary out of the company. Sorry, but I can help to
reproduce. It's not complicated and not time-consuming.
I found the original log, which used kernel v6.6. But it's not complete.
Then I reproduced the problem using the latest kernel.
Both logs are attached.

On 6/4/2025 8:26 AM, Paul E. McKenney wrote:
Looking at both the v6.6 version and Joel's fix, I am forced to conclude
that this bug has been there for a very long time. Thank you for your
testing efforts and Joel for the fix!

Thanks. I am still working on polishing the fix Xiongfeng tested. I hope to have
it out next week for review. As we discussed I will split the context-tracking
API into a separate patch and will also add a separate documentation
comment-patch on why we need the irq_work.

thanks,
- Joel