Re: [syzbot] [mm?] kernel BUG in collapse_scan_file
From: Lance Yang
Date: Thu Mar 19 2026 - 05:16:32 EST
On 2026/3/19 17:00, David Hildenbrand (Arm) wrote:
On 3/19/26 09:53, Lorenzo Stoakes (Oracle) wrote:
On Thu, Mar 19, 2026 at 04:05:38PM +0800, Lance Yang wrote:
Ccing Willy
IIUC, this is a dup of the earlier report[1], which I looked into back
in January. The root cause is the same: collapse_file() calls
xas_lock_irq() without resetting the xas state first, tripping the
XAS_INVALID() assertion:
#define xas_lock_irq(xas) xa_lock_irq(XAS_INVALID(xas)->xa)

static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
{
	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
	return xas;
}
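
A minimal userspace sketch of why a missing reset trips the check. This is not kernel code: the model_* names and the struct are hypothetical stand-ins, and the validity test (low two pointer bits clear) is an assumption about how xas_valid() distinguishes a real node from the restart marker. The real check calls BUG() rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the restart marker that a reset installs. */
#define MODEL_RESTART ((void *)(uintptr_t)3)

/* Hypothetical, stripped-down model of an xa_state: only the node pointer. */
struct model_xa_state {
	void *xa_node;
};

/* Assumption: a state is "valid" when it points at something node-like,
 * i.e. the low two bits of the pointer are clear. */
static inline bool model_xas_valid(const struct model_xa_state *xas)
{
	return ((uintptr_t)xas->xa_node & 3) == 0;
}

/* Models a reset: forget the current position, restart from the top. */
static inline void model_xas_reset(struct model_xa_state *xas)
{
	xas->xa_node = MODEL_RESTART;
}

/* Models the debug check in the lock path: returns true where the real
 * check would BUG, i.e. when the state is still valid at lock time. */
static inline bool model_lock_would_bug(const struct model_xa_state *xas)
{
	return model_xas_valid(xas);
}

/* Walks through both cases: stale state at lock time, then reset first. */
static inline void model_demo(void)
{
	long fake_node = 0; /* stands in for a node left over from a walk */
	struct model_xa_state xas = { .xa_node = &fake_node };

	assert(model_lock_would_bug(&xas));   /* no reset: check fires */
	model_xas_reset(&xas);
	assert(!model_lock_would_bug(&xas));  /* reset first: check passes */
}
```

In this model, taking the lock with a state left over from an earlier walk is what fires the check, and resetting before locking avoids it.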
Added by commit:
commit 43b00759f21b10142094d1ae5ff65cbb368953a3
Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Date: Sun Dec 14 10:53:31 2025 -0500

    XArray: Add extra debugging check to xas_lock and friends

    While tracking down a recent bug, we discovered somewhere that had
    forgotten to call xas_reset() before calling xas_lock(). Add a debug
    check to be sure that doesn't happen in future and fix all the places in
    the test suite which were carelessly doing just this.

    Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
I posted a HACK fix at the time[2], but David pointed out that Willy
had mentioned it likely needs more thought[3].
Hmm, we shouldn't leave this bug in place while working on a fancier fix, should we?
Can we get _something_ going as an upstream fix? We can improve whatever we do
later, right?
David, thoughts?
I recall Willy mentioning that the issue is likely a false positive.
IIUC, that commit is not upstream? So it only triggers in linux-next.
Right. That does not appear to be in upstream, I only see it in linux-next :)
Which means:
1) If it's a false positive, upstream is not affected (no XA_NODE_BUG_ON)
2) If it's not a false positive, upstream is affected but does not
trigger the XA_NODE_BUG_ON
Yep. So this particular BUG_ON is not affecting upstream directly.
That said, syzbot will likely keep hitting it in linux-next and
generating noise for us until it is addressed there ...