Re: [syzbot] [mm?] kernel BUG in collapse_scan_file
From: Lorenzo Stoakes (Oracle)
Date: Thu Mar 19 2026 - 07:05:22 EST
On Thu, Mar 19, 2026 at 11:56:21AM +0100, Vlastimil Babka wrote:
> On 3/19/26 09:05, Lance Yang wrote:
> > Ccing Willy
> >
> > IIUC, this is a dup of the earlier report[1], which I looked into back
> > in January. The root cause is the same: collapse_file() calls
> > xas_lock_irq() without resetting the xas state first, tripping the
> > XAS_INVALID() assertion:
> >
> > #define xas_lock_irq(xas) xa_lock_irq(XAS_INVALID(xas)->xa)
> >
> > static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
> > {
> > 	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
> > 	return xas;
> > }
> >
> > Added by commit:
> >
> > commit 43b00759f21b10142094d1ae5ff65cbb368953a3
> > Author: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > Date: Sun Dec 14 10:53:31 2025 -0500
> >
> > XArray: Add extra debugging check to xas_lock and friends
> >
> > While tracking down a recent bug, we discovered somewhere that had
> > forgotten to call xas_reset() before calling xas_lock(). Add a debug
> > check to be sure that doesn't happen in future and fix all the
> > places in the test suite which were carelessly doing just this.
> >
> > Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> >
> > I posted a HACK fix at the time[2], but David pointed out that Willy
> > had mentioned it likely needs more thought[3].
> >
> > [1] https://lore.kernel.org/all/69757ea0.a00a0220.33ccc7.0017.GAE@xxxxxxxxxx/
> > [2] https://lore.kernel.org/all/20260125121001.32733-1-lance.yang@xxxxxxxxx/
> > [3] https://lore.kernel.org/all/7bce9231-714c-424a-a4e3-dd42734fb767@xxxxxxxxxx/
>
> That "needs more thought" was Jan 5. 2.5 months later this is still
> messing up linux-next testing due to a known unfixed problem. Completely
> unacceptable. Willy, you need to drop the new bug check until the known
> problem is fixed.
>
> Mark, please drop https://git.infradead.org/users/willy/xarray.git from
> linux-next until it stops breaking linux-next. Thanks.

Thanks. Also, I don't see a Link: tag or any discussion of this patch
anywhere on-list (maybe I missed it?). The only thing a search on lore
brings up is a bug report from Jan 5th [0] about it.

If this is heading for a Linus PR, could we have the patch actually posted to
lore somewhere so there can be some discussion?

And is there a way to ensure this doesn't land in the next merge window unless
it's fixed? Not sure through which tree it's going (Willy's?).

In general I'm very uncomfortable 'just leaving' splatting kernels in the
-next tree.

[0]: https://lore.kernel.org/all/aVvz3tYdu49TGkjI@xxxxxxxxxxxxx/

Thanks, Lorenzo
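
For anyone following along, a rough userspace sketch of what the quoted
check is guarding against may help. This is only a model, not the kernel
code: every model_* name below is hypothetical, and the sentinel encoding
(low pointer bits set meaning "not pointing into the tree") is an assumption
made for illustration. The idea it tries to capture is that the lock helper
refuses a state that still points at a node from an earlier walk, and that
resetting the state first makes the check pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct model_node { int unused; };
struct model_xas { struct model_node *node; };

/* Assumption: a sentinel with low bits set marks "restart the walk". */
#define MODEL_RESTART ((struct model_node *)(uintptr_t)3)

/* A state is "valid" here when it does not hold a sentinel value. */
static bool model_xas_valid(const struct model_xas *xas)
{
	return ((uintptr_t)xas->node & 3) == 0;
}

/* Forget any node cached from a previous walk. */
static void model_xas_reset(struct model_xas *xas)
{
	xas->node = MODEL_RESTART;
}

/* Returns true when taking the lock would pass the debug check. */
static bool model_lock_allowed(const struct model_xas *xas)
{
	return !model_xas_valid(xas);
}
```

In this model, a state left pointing at a node after a lockless walk makes
model_lock_allowed() return false, which corresponds to the BUG in the
report; calling model_xas_reset() before locking makes it return true.
Whether a plain reset is the right fix in collapse_file() is exactly the
open question in the thread, so this says nothing about that.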