Re: [PATCH] ocfs2: revalidate the journal dinode before toggling dirty
From: ZhengYuan Huang
Date: Sun May 10 2026 - 22:59:23 EST
On Sun, May 10, 2026 at 12:02 PM Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 5/9/26 9:52 PM, ZhengYuan Huang wrote:
> > [BUG]
> > A fuzzed OCFS2 image can corrupt the current slot journal dinode while
> > mount is still in progress. The mount path first reports the invalid
> > journal block and then crashes in shutdown:
> >
> > kernel BUG at fs/ocfs2/journal.c:1034!
> > Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
> > RIP: 0010:ocfs2_journal_toggle_dirty+0x2d6/0x340 fs/ocfs2/journal.c:1034
> > Call Trace:
> > ocfs2_journal_shutdown+0x414/0xc30 fs/ocfs2/journal.c:1116
> > ocfs2_mount_volume fs/ocfs2/super.c:1785 [inline]
> > ocfs2_fill_super+0x30a9/0x3cd0 fs/ocfs2/super.c:1083
> > get_tree_bdev_flags+0x38b/0x640 fs/super.c:1698
> > get_tree_bdev+0x24/0x40 fs/super.c:1721
> > ocfs2_get_tree+0x21/0x30 fs/ocfs2/super.c:1184
> > vfs_get_tree+0x9a/0x370 fs/super.c:1758
> > fc_mount fs/namespace.c:1199 [inline]
> > do_new_mount_fc fs/namespace.c:3642 [inline]
> > do_new_mount fs/namespace.c:3718 [inline]
> > path_mount+0x5b8/0x1ea0 fs/namespace.c:4028
> > do_mount fs/namespace.c:4041 [inline]
> > __do_sys_mount fs/namespace.c:4229 [inline]
> > __se_sys_mount fs/namespace.c:4206 [inline]
> > __x64_sys_mount+0x282/0x320 fs/namespace.c:4206
> > ...
> >
> >
> > [CAUSE]
> > ocfs2_journal_toggle_dirty() assumes journal->j_bh still contains the
> > same validated dinode that ocfs2_journal_init() locked earlier, and it
> > uses BUG_ON() when the buffer no longer looks like a dinode. That
> > assumption is too strong. The mount path can force the same current-slot
> > journal inode block back in from disk through
> > ocfs2_read_journal_inode(..., OCFS2_BH_IGNORE_CACHE) while
> > ocfs2_mark_dead_nodes() scans the journal slots. If that reread finds
> > corrupted metadata, mount unwinds through ocfs2_journal_shutdown(),
> > which reuses journal->j_bh and turns the metadata corruption into a
> > kernel BUG.
> >
>
> A bit confused.
> Since the journal dinode is validated first, the image has already been checked.
> With mount still in progress, how could it become corrupted at runtime?
>
> Thanks,
> Joseph
Thanks for taking a look.

Yes, the journal dinode is validated when it is first initialized. My
concern is that later in the mount path, the same journal inode block
can be read again from disk with OCFS2_BH_IGNORE_CACHE when
ocfs2_mark_dead_nodes() scans the journal slots. The buffer used by
ocfs2_journal_shutdown() may therefore no longer hold the contents that
were validated.

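To illustrate the aliasing, here is a simplified userspace model. It is
not kernel code and not the patch itself: model_bh, model_journal,
model_journal_init() and model_reread_ignore_cache() are hypothetical
stand-ins. It assumes, as the [CAUSE] text describes, that the forced
reread refills the same buffer journal->j_bh still points to, and it
checks only the "INODE01" signature that OCFS2_IS_VALID_DINODE() compares.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; these are not OCFS2 kernel APIs. */
#define INODE_SIGNATURE "INODE01"   /* same string as OCFS2_INODE_SIGNATURE */
#define MODEL_BLOCK_SIZE 64

struct model_bh {
	char b_data[MODEL_BLOCK_SIZE];  /* cached copy of one on-disk block */
};

struct model_journal {
	struct model_bh *j_bh;          /* points at the shared cached buffer */
};

/* Signature check only, in the spirit of OCFS2_IS_VALID_DINODE(). */
static bool model_dinode_valid(const struct model_bh *bh)
{
	return memcmp(bh->b_data, INODE_SIGNATURE,
		      sizeof(INODE_SIGNATURE)) == 0;
}

/* Models the init-time check: validate once, then keep the pointer. */
static bool model_journal_init(struct model_journal *j, struct model_bh *bh)
{
	if (!model_dinode_valid(bh))
		return false;
	j->j_bh = bh;
	return true;
}

/*
 * Models a reread that bypasses the cache: the shared buffer is refilled
 * from whatever is on disk now, while j->j_bh keeps pointing at it.
 */
static void model_reread_ignore_cache(struct model_bh *bh,
				      const char disk_block[MODEL_BLOCK_SIZE])
{
	memcpy(bh->b_data, disk_block, MODEL_BLOCK_SIZE);
}
```

After model_journal_init() succeeds, rereading a corrupted block leaves
j.j_bh pointing at the same buffer, and model_dinode_valid(j.j_bh) now
returns false even though the init-time check passed.
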
This does not mean the filesystem itself corrupts the block during
mount. Rather, between the initial validation and the later use, the
on-disk block may change because of unexpected disk corruption or I/O
problems, and the forced reread then pulls that corrupted metadata into
the buffer. In that case, ocfs2_journal_toggle_dirty() should not rely
only on the earlier validation.

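The idea of revalidating in ocfs2_journal_toggle_dirty() can be sketched
with the same kind of userspace model: check the buffer on every call and
return an error instead of hitting BUG_ON(). This is hypothetical and not
the patch; the -EIO return value, the flag byte position
(MODEL_FLAGS_OFFSET) and MODEL_DIRTY_FL (standing in for the journal
dirty flag in the dinode) are assumptions for illustration.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical userspace model; not the actual OCFS2 code or patch. */
#define INODE_SIGNATURE "INODE01"   /* same string as OCFS2_INODE_SIGNATURE */
#define MODEL_BLOCK_SIZE 64
#define MODEL_FLAGS_OFFSET 8        /* assumed location of the flags byte */
#define MODEL_DIRTY_FL 0x1          /* stand-in for the journal dirty flag */

struct model_bh {
	char b_data[MODEL_BLOCK_SIZE];
};

/*
 * Revalidate the buffer on every call instead of trusting the check done
 * at init time, and report corruption with an error code, not BUG_ON().
 */
static int model_toggle_dirty(struct model_bh *j_bh, int dirty)
{
	if (memcmp(j_bh->b_data, INODE_SIGNATURE,
		   sizeof(INODE_SIGNATURE)) != 0)
		return -EIO;            /* assumed errno; buffer left untouched */

	if (dirty)
		j_bh->b_data[MODEL_FLAGS_OFFSET] |= MODEL_DIRTY_FL;
	else
		j_bh->b_data[MODEL_FLAGS_OFFSET] &= ~MODEL_DIRTY_FL;
	return 0;
}
```

With a valid signature the flag is set or cleared and 0 is returned;
once the signature has been overwritten, the call returns -EIO and leaves
the buffer as it was, so the error path can report the corruption rather
than crash.
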
Since this is a cold mount/shutdown error path, the extra validation
should have no meaningful performance impact. I see it as a small
robustness improvement that keeps bad metadata from turning into a
kernel BUG.

Thanks,
ZhengYuan Huang