Re: [EXTERNAL] Re: [PATCH v2] hfs: prevent MDB and bitmap buffer_head aliasing

From: slava

Date: Mon Jun 01 2026 - 14:48:16 EST

On Mon, 2026-06-01 at 22:01 +0800, Sam Sun wrote:
> On Sat, May 30, 2026 at 7:23 AM Viacheslav Dubeyko
> <vdubeyko@xxxxxxxxxx> wrote:
> >
> > On Fri, 2026-05-29 at 14:29 -0700, Viacheslav Dubeyko wrote:
> > > On Fri, 2026-05-29 at 18:34 +0800, Yue Sun wrote:
> > > > hfs_mdb_commit() writes the volume bitmap while HFS_SB(sb)-
> > > > >mdb_bh is
> > > > locked. A crafted image can set drVBMSt so that the bitmap
> > > > block resolves
> > > > to the same buffer_head as the MDB. When writeback later calls
> > > > lock_buffer() for that bitmap block, the task tries to lock
> > > > mdb_bh again
> > > > and self-deadlocks in __lock_buffer().
> > > >
> > > > Reject images whose volume bitmap starts at or before the MDB
> > > > during
> > > > mount. Also guard the bitmap writeback path itself: if the
> > > > bitmap block
> > > > would resolve to mdb_bh, force the filesystem read-only and
> > > > stop bitmap
> > > > writeback before taking the buffer lock. This keeps the
> > > > deadlock fix in
> > > > the MDB commit path and reuses the existing bitmap
> > > > size/writeback logic.
> > > >
> > > > Reported-by: Yue Sun <samsun1006219@xxxxxxxxx>
> > > > Closes:
> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_CAEkJfYMB47v1yOWHB8q2dc8kf-3Duj-2DrLO-3D-2ByMyudwPguJ8Kd3jA-40mail.gmail.com_&d=DwIFaQ&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=2ATrdgpvRdYnJfz2T53Ih3iQv6Y9wogU2Ba19prTYgchh1mh4KI7lo9TYLQnlu3X&s=S8XdWVW0SpxB05CfctrqCJtJEXqtWDfLNdtzPHIZAY8&e=
> > > > Signed-off-by: Yue Sun <samsun1006219@xxxxxxxxx>
> > > > ---
> > > > Changes in v2:
> > > > - Add a commit-time guard before locking bitmap buffer_heads.
> > > > - Replace the mount-time byte-range check with a simple drVBMSt
> > > > check.
> > > > - Reuse the existing bitmap writeback size calculation.
> > > >
> > > > fs/hfs/mdb.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > > index a97cea35ca2e..53cd137892b5 100644
> > > > --- a/fs/hfs/mdb.c
> > > > +++ b/fs/hfs/mdb.c
> > > > @@ -185,6 +185,11 @@ int hfs_mdb_get(struct super_block *sb)
> > > >             sb->s_flags |= SB_RDONLY;
> > > >     }
> > > >
> > > > +   if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
> > >
> > > Technically speaking, if we are trying to check the overlapping
> > > of volume bitmap
> > > with the main MDB record, then we need to check the overlapping
> > > with alternative
> > > MDB record, and with Catalog File and Extents Overflow File.
> > > However, it sounds
> > > like we are trying to add some FSCK logic here. :)
> > >
> > > > +           pr_err("volume bitmap overlaps MDB\n");
> > >
> > > This situation means volume corruption. It makes sense to
> > > recommend to run FSCK.
> > >
> > > > +           return -EIO;
> > >
> > > This code error is wrong because the read operation was OK. But
> > > we have
> > > corrupted volume. Even if we have overlapping of volume bitmap
> > > with MDB record,
> > > then we cannot reject mount operation. We must mount in READ-ONLY
> > > mode because,
> > > potentially, the rest of metadata could be completely OK. We
> > > simply cannot mount
> > > in RW mode.
> > >
> > > > +   }
> > > > +
> > > >     /* TRY to get the alternate (backup) MDB. */
> > > >     sect = part_start + part_size - 2;
> > > >     bh = sb_bread512(sb, sect, mdb2);
> > > > @@ -341,6 +346,11 @@ void hfs_mdb_commit(struct super_block
> > > > *sb)
> > > >             size = (HFS_SB(sb)->fs_ablocks + 7) / 8;
> > > >             ptr = (u8 *)HFS_SB(sb)->bitmap;
> > > >             while (size) {
> > > > +                   if (unlikely(block == HFS_SB(sb)->mdb_bh-
> > > > >b_blocknr)) {
> > > > +                           pr_err("volume bitmap overlaps MDB,
> > > > forcing read-only\n");
> > > > +                           sb->s_flags |= SB_RDONLY;
> > > > +                           break;
> > > > +                   }
> > >
> > > At this point, we already wrote main MDB and alternative MDB to
> > > the volume.
> > > Theoretically, it is possible to imagine that if size of volume
> > > bitmap is big
> > > enough, then we could partially process the bitmap too. Probably,
> > > we need to
> > > check the overlapping at the beginning of the method and reject
> > > the whole
> > > superblock commit.
> > >
> > > Initial issue was the deadlock. Could we implement some check
> > > that buffer_heads
> > > don't overlap before trying to lock? Does it make sense to you?
> > >
> >
> > By the way, I can see the deadlock in hfs_mdb_commit() even for
> > completely valid
> > HFS volume during generic/013 test run. This issue takes place
> > because folio
> > lock related issue. I remember that somebody has sent the patch
> > related to this
> > issue but we haven't finished the discussion with some reasonable
> > solution. I
> > think we need to rework the locking scheme:
> >
> > diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
> > index 97e8d1f96d6d..919798eda0f8 100644
> > --- a/fs/hfs/hfs_fs.h
> > +++ b/fs/hfs/hfs_fs.h
> > @@ -64,6 +64,7 @@ struct hfs_inode_info {
> > * The HFS-specific part of a Linux (struct super_block)
> > */
> > struct hfs_sb_info {
> > +       struct mutex mdb_lock;                  /* MDB operations
> > lock */
> >         struct buffer_head *mdb_bh;             /* The hfs_buffer
> >                                                    holding the real
> >                                                    superblock (aka
> > VIB
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index a97cea35ca2e..94497c155d29 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -291,9 +291,9 @@ void hfs_mdb_commit(struct super_block *sb)
> >         if (sb_rdonly(sb))
> >                 return;
> >
> > -       lock_buffer(HFS_SB(sb)->mdb_bh);
> >         if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)-
> > >flags)) {
> >                 /* These parameters may have been modified, so
> > write them back
> > */
> > +               lock_buffer(HFS_SB(sb)->mdb_bh);
> >                 mdb->drLsMod = hfs_mtime();
> >                 mdb->drFreeBks = cpu_to_be16(HFS_SB(sb)-
> > >free_ablocks);
> >                 mdb->drNxtCNID =
> > @@ -304,6 +304,7 @@ void hfs_mdb_commit(struct super_block *sb)
> >                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > file_count));
> >                 mdb->drDirCnt =
> >                         cpu_to_be32((u32)atomic64_read(&HFS_SB(sb)-
> > > folder_count));
> > +               unlock_buffer(HFS_SB(sb)->mdb_bh);
> >
> >                 /* write MDB to disk */
> >                 mark_buffer_dirty(HFS_SB(sb)->mdb_bh);
> > @@ -360,7 +361,6 @@ void hfs_mdb_commit(struct super_block *sb)
> >                         size -= len;
> >                 }
> >         }
> > -       unlock_buffer(HFS_SB(sb)->mdb_bh);
> > }
> >
> > void hfs_mdb_close(struct super_block *sb)
> > diff --git a/fs/hfs/super.c b/fs/hfs/super.c
> > index a466c401f6bb..5d4caf3ddda6 100644
> > --- a/fs/hfs/super.c
> > +++ b/fs/hfs/super.c
> > @@ -35,7 +35,11 @@ MODULE_LICENSE("GPL");
> > static int hfs_sync_fs(struct super_block *sb, int wait)
> > {
> >         is_hfs_cnid_counts_valid(sb);
> > +
> > +       mutex_lock(&HFS_SB(sb)->mdb_lock);
> >         hfs_mdb_commit(sb);
> > +       mutex_unlock(&HFS_SB(sb)->mdb_lock);
> > +
> >         return 0;
> > }
> >
> > @@ -68,7 +72,9 @@ static void flush_mdb(struct work_struct *work)
> >
> >         is_hfs_cnid_counts_valid(sb);
> >
> > +       mutex_lock(&sbi->mdb_lock);
> >         hfs_mdb_commit(sb);
> > +       mutex_unlock(&sbi->mdb_lock);
> > }
> >
> > void hfs_mark_mdb_dirty(struct super_block *sb)
> > @@ -339,9 +345,13 @@ static int hfs_fill_super(struct super_block
> > *sb, struct
> > fs_context *fc)
> >         sb->s_op = &hfs_super_operations;
> >         sb->s_xattr = hfs_xattr_handlers;
> >         sb->s_flags |= SB_NOATIME | SB_NODIRATIME;
> > +       mutex_init(&sbi->mdb_lock);
> >         mutex_init(&sbi->bitmap_lock);
> >
> > +       mutex_lock(&sbi->mdb_lock);
> >         res = hfs_mdb_get(sb);
> > +       mutex_unlock(&sbi->mdb_lock);
> > +
> >         if (res) {
> >                 if (!silent)
> >                         pr_warn("can't find a HFS filesystem on dev
> > %s\n",
> >
> > What do you think?
> >
> > Thanks,
> > Slava.
> >
>
> Thanks for the detailed comments. I agree that my patch was too
> narrow. It only handled the corrupted-layout case without fixing the
> more general locking problem in hfs_mdb_commit(), including the case
> you mentioned where a valid HFS volume can still deadlock during
> xfstests.
>
> I also agree with your main direction that the MDB buffer lock should
> not be used as a transaction lock for the whole MDB commit. However,
> one detail in the locking sketch worries me, though. After narrowing
> the mdb_bh lock, the HFS_FLG_ALT_MDB_DIRTY path still modifies the
> primary MDB buffer through hfs_inode_write_fork():
> hfs_inode_write_fork(..., mdb->drXTExtRec, &mdb->drXTFlSize, NULL);
> hfs_inode_write_fork(..., mdb->drCTExtRec, &mdb->drCTFlSize, NULL);
>
> hfs_inode_write_fork() writes to the extent record and size pointers
> that
> are passed to it, so these are still direct writes into mdb_bh-
> >b_data.
> The new HFS-level mdb_lock serializes HFS MDB commits, but generic
> buffer writeback does not take that filesystem-private mutex. It only
> synchronizes with the buffer_head lock. So I think these primary MDB
> updates still need to be covered by lock_buffer(mdb_bh), even if
> HFS_FLG_MDB_DIRTY itself is not set.

I don't quite follow what you mean here. If you take a look into the
hfs_inode_write_fork():

void hfs_inode_write_fork(struct inode *inode, struct hfs_extent *ext,
__be32 *log_size, __be32 *phys_size)
{
memcpy(ext, HFS_I(inode)->first_extents,
sizeof(hfs_extent_rec));

if (log_size)
*log_size = cpu_to_be32(inode->i_size);
if (phys_size)
*phys_size = cpu_to_be32(HFS_I(inode)->alloc_blocks *
HFS_SB(inode->i_sb)-
>alloc_blksz);
}

You can see that it is simply extracting values into buffers and
nothing more. No mdb_bh->b_data operations are involved.

>
> The approach I am considering is to split the fix into two parts:
>
> 1. Rework the locking in hfs_mdb_commit().
>    Add an HFS-level mdb_lock and take it inside hfs_mdb_commit(), so
> all
>    MDB commits are serialized at the HFS layer. Then narrow the
> mdb_bh
>    buffer lock so it only covers direct updates of the primary MDB
>    buffer, including the hfs_inode_write_fork() calls from the
>    HFS_FLG_ALT_MDB_DIRTY path. mark_buffer_dirty(mdb_bh) is done in
> that
>    short critical section. The alternate MDB sync and bitmap
> writeback
>    happen after mdb_bh has been unlocked.
>
> 2. Handle the corrupted volume-bitmap layout separately.
>    Instead of trying to do broader fsck-style validation, only check
> the
>    condition relevant to this report: whether the volume bitmap's HFS
>    512-byte sector range overlaps the main MDB sector. If this is
> found
>    during mount, warn and keep the filesystem read-only, rather than
>    returning -EIO. In hfs_mdb_commit(), check the same condition
> before
>    clearing dirty bits or doing any writeback, force read-only, and
>    return. I also keep a flag for this condition so a later remount-
> rw
>    attempt leaves the filesystem read-only as well.
>
> Draft patches are attached to this email. What do you think?
>
>

I don't know how to review attachments. If you would like to share some
code for review, then, please, include the diff into the body of email.

Thanks,
Slava.