Re: [PATCH] ocfs2: fix UBSAN array-index-out-of-bounds in ocfs2_sum_rightmost_rec

From: Joseph Qi

Date: Mon Jun 01 2026 - 21:47:30 EST




On 6/1/26 11:30 PM, Ian Bridges wrote:
> [BUG]
> On-disk corruption setting l_next_free_rec to 0 in an inode's inline
> extent list triggers a UBSAN panic on the next write to that file.
>
> [CAUSE]
> ocfs2_sum_rightmost_rec() computes
> i = le16_to_cpu(el->l_next_free_rec) - 1
> and accesses el->l_recs[i] without validating i. When l_next_free_rec
> is 0, i becomes -1; when l_next_free_rec exceeds l_count, i falls
> past the end of the array. Either case violates the
> __counted_by_le(l_count) annotation on l_recs[] and triggers UBSAN.
>
> [FIX]
> Read l_count once into a local variable to eliminate a TOCTOU between
> the bounds check and the __counted_by_le-generated check at the array
> access. Validate i against both bounds before dereferencing l_recs[].
> On an out-of-bounds index call ocfs2_error() since both cases indicate
> filesystem corruption.
>
> Update the signature to accept a struct ocfs2_caching_info *ci and
> return int, with the cluster sum returned through a u32 out-parameter.
> Update both callers to pass et->et_ci and propagate the error.
>
> Fixes: 2f26f58df041 ("ocfs2: annotate flexible array members with __counted_by_le()")
> Reported-by: syzbot+be16e33db01e6644db7a@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=be16e33db01e6644db7a
> Signed-off-by: Ian Bridges <icb@xxxxxxxxxxxx>
> ---
> This patch contains a proposed fix for a crash reported by syzbot
> in ocfs2_grow_tree().
>
> The file names and offsets in this description are from commit
> 7cb1c5b32a2bfde961fff8d5204526b609bcb30a from this repo:
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
>
> I also have a small test harness that reproduces the original panic,
> which I can make available as well.
>
> The Bug
>
> The ocfs2_extent_list structure (fs/ocfs2/ocfs2_fs.h:458) contains
> two fields relevant to this bug: l_count, the total number of extent
> record slots in the list, and l_next_free_rec, the index of the next
> unused slot. The extent record array l_recs is annotated
> __counted_by_le(l_count) (fs/ocfs2/ocfs2_fs.h:472), which instructs
> UBSAN to bounds-check every access to l_recs against l_count.
>
> ocfs2_sum_rightmost_rec() (fs/ocfs2/alloc.c:1106) is a helper used
> by both ocfs2_add_branch() and ocfs2_shift_tree_depth() to compute
> the logical cluster position just past the rightmost occupied extent
> record. It derives the index of that record as:
>
> i = le16_to_cpu(el->l_next_free_rec) - 1; /* alloc.c:1110 */
>
> and then accesses el->l_recs[i] (fs/ocfs2/alloc.c:1112) without
> first checking that i is a valid index. This is the root cause of
> the bug.
>
> The syzbot report shows index -1 at the crash site, which means
> l_next_free_rec was 0 at the point of the crash. The crash occurs
> inside ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375), which is
> reached when ocfs2_find_branch_target() returns 1. That return value
> is only produced when l_next_free_rec equals l_count
> (fs/ocfs2/alloc.c:1530). Since l_next_free_rec is 0, l_count must
> also be 0 for this condition to hold.
>
> The normal inode read path, ocfs2_validate_inode_block()
> (fs/ocfs2/inode.c:1423), does not validate the inline extent list
> fields l_count or l_next_free_rec. A filesystem image with these
> fields set to 0 in an inode's inline extent list therefore passes
> read-time validation without error.
>

So why not move the check into ocfs2_validate_inode_block()?
Seems that is the right place to do this kind of work.

Thanks,
Joseph

> The syzbot console log shows a separate validation error at mount
> time — "Invalid dinode #17058: Corrupt state (nlink = 0 or mode =
> 0)" — indicating that the mounted filesystem contained at least one
> other corrupted inode. No reproducer was available in the report, so
> the exact mechanism by which the corruption was introduced is not
> known.
>
> Here is a breakdown of how the crash is triggered:
>
> 1. A write syscall eventually calls into ocfs2_insert_extent() to
> record the newly allocated cluster in the inode's extent tree.
>
> 2. ocfs2_insert_extent() determines that the extent tree has no room
> for a new record and calls ocfs2_grow_tree() (fs/ocfs2/alloc.c:1550).
>
> 3. ocfs2_grow_tree() calls ocfs2_find_branch_target()
> (fs/ocfs2/alloc.c:1561) to walk the tree and find a node with a
> free slot. Because l_tree_depth is 0, the traversal loop
> (fs/ocfs2/alloc.c:1491) does not execute. At
> fs/ocfs2/alloc.c:1530, the condition
>
> el->l_next_free_rec == el->l_count /* 0 == 0 */
>
> evaluates to true, so the function returns 1, indicating the tree
> must grow by shifting its depth.
>
> 4. Back in ocfs2_grow_tree(), shift is 1, so ocfs2_shift_tree_depth()
> is called (fs/ocfs2/alloc.c:1581).
>
> 5. ocfs2_shift_tree_depth() (fs/ocfs2/alloc.c:1375) allocates a new
> extent block and copies the root extent list into it. At
> fs/ocfs2/alloc.c:1420, it sets:
>
> eb_el->l_next_free_rec = root_el->l_next_free_rec; /* = 0 */
>
> The copy loop at fs/ocfs2/alloc.c:1421 runs zero iterations since
> l_next_free_rec is 0, so eb_el is left with an empty extent list.
>
> 6. ocfs2_shift_tree_depth() then calls
> ocfs2_sum_rightmost_rec(eb_el) (fs/ocfs2/alloc.c:1433) to determine
> the cluster count for the new root record.
>
> 7. Inside ocfs2_sum_rightmost_rec(), i is computed as:
>
> i = le16_to_cpu(eb_el->l_next_free_rec) - 1 = 0 - 1 = -1
>
> The subsequent access to el->l_recs[-1] (fs/ocfs2/alloc.c:1112)
> violates the __counted_by_le(l_count) annotation on l_recs[]
> (fs/ocfs2/ocfs2_fs.h:472). __counted_by_le is a macro defined in
> include/linux/compiler_types.h that expands to __counted_by, which
> is the GCC/Clang attribute UBSAN uses for array bounds checking. The
> UBSAN error message reports __counted_by rather than __counted_by_le
> because the check is performed against the expanded attribute. This
> triggers a UBSAN array-index-out-of-bounds panic.
>
> The Proposed Fix
>
> The fix modifies ocfs2_sum_rightmost_rec() with three changes.
>
> First, l_count is read once into a local variable before the bounds
> check. The __counted_by_le(l_count) annotation causes the compiler
> to emit a separate read of el->l_count at the array access site for
> the UBSAN check. Without the local variable, there are two
> independent reads of l_count — one for the guard and one at the
> array access site, where __counted_by_le causes the compiler to
> re-read it for the UBSAN check — creating a TOCTOU between them.
> Reading l_count once before the guard reduces this window to the
> minimum.
>
> Second, the index is checked against both bounds before dereferencing
> l_recs[]. Both checks are sound: i < 0 compares against the constant
> 0, requiring no trust in any on-disk value; i >= count checks the
> invariant l_next_free_rec <= l_count, which holds on any valid
> filesystem independent of the actual field values. Neither check can
> fire on a valid extent list. Corruption that keeps l_next_free_rec
> within [1, l_count] will pass the check and produce an incorrect
> result, but this is pre-existing behaviour — not a regression — and
> cannot be avoided without an external ground truth for l_count, which
> I do not believe exists for the inode's inline extent list.
>
> Third, on an out-of-bounds index, ocfs2_error() is called rather than
> silently returning a usable value.
>
> To accommodate these changes the function signature is updated: a
> struct ocfs2_caching_info *ci parameter is added for superblock
> access, the return type changes from u32 to int, the cluster sum is
> returned via a new u32 out-parameter, and the inline qualifier is
> removed since the function is no longer a trivial helper. Both
> callers, ocfs2_add_branch() and ocfs2_shift_tree_depth(), already
> have status variables and bail labels, and are updated to pass
> et->et_ci, check the return value, and propagate any error up their
> respective call chains.