Re: [syzbot] [mm?] kernel BUG in collapse_scan_file
From: David Hildenbrand (Arm)
Date: Thu Mar 19 2026 - 05:00:52 EST
On 3/19/26 09:53, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Mar 19, 2026 at 04:05:38PM +0800, Lance Yang wrote:
>> Ccing Willy
>>
>> IIUC, this is a dup of the earlier report[1], which I looked into back
>> in January. The root cause is the same: collapse_file() calls
>> xas_lock_irq() without resetting the xas state first, tripping the
>> XAS_INVALID() assertion:
>>
>> #define xas_lock_irq(xas) xa_lock_irq(XAS_INVALID(xas)->xa)
>>
>> static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
>> {
>> 	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
>> 	return xas;
>> }
>>
>> Added by commit:
>>
>> commit 43b00759f21b10142094d1ae5ff65cbb368953a3
>> Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>> Date: Sun Dec 14 10:53:31 2025 -0500
>>
>> XArray: Add extra debugging check to xas_lock and friends
>>
>> While tracking down a recent bug, we discovered somewhere that had
>> forgotten to call xas_reset() before calling xas_lock(). Add a debug
>> check to be sure that doesn't happen in future and fix all the places in
>> the test suite which were carelessly doing just this.
>>
>> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>>
>> I posted a HACK fix at the time[2], but David pointed out that Willy
>> had mentioned it likely needs more thought[3].
>
> Hmm we shouldn't leave this bug in place while working for a fancier fix??
>
> Can we get _something_ going as an upstream fix? We can improve whatever we do
> later right?
>
> David, thoughts?
I recall Willy mentioning that the issue is likely a false positive.
IIUC, that commit is not upstream? So it only triggers in linux-next.
Which means:
1) If it's a false positive, upstream is not affected (no XA_NODE_BUG_ON)
2) If it's not a false positive, upstream is affected but does not
   trigger the XA_NODE_BUG_ON
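
For reference, a minimal userspace sketch of what the quoted check tests.
It is a model only, not kernel code: it assumes the sentinel encoding from
include/linux/xarray.h (XAS_RESTART has the low pointer bits set), and
lock_check_would_fire() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model: the names mirror the kernel's, the code does not. */
struct xa_node;				/* opaque for this model */

struct xa_state {
	struct xa_node *xa_node;	/* current node, or a sentinel */
};

/* Assumed sentinel meaning "restart the walk from the top". */
#define XAS_RESTART	((struct xa_node *)3UL)

/* Sentinels have one of the low two pointer bits set; real nodes do not. */
static bool xas_valid(const struct xa_state *xas)
{
	return ((uintptr_t)xas->xa_node & 3) == 0;
}

/* Forget the cached node so the next operation re-walks the tree. */
static void xas_reset(struct xa_state *xas)
{
	xas->xa_node = XAS_RESTART;
}

/*
 * Hypothetical helper: returns true where the quoted XAS_INVALID()
 * would hit XA_NODE_BUG_ON(), i.e. when the lock is taken while the
 * state still holds a node pointer instead of a sentinel.
 */
static bool lock_check_would_fire(const struct xa_state *xas)
{
	return xas_valid(xas);
}
```

In this model the check fires for a state that still caches a node and
stays quiet after xas_reset(). It says nothing about whether the
collapse_file() case is a false positive.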
--
Cheers,
David