[PATCH] ceph: bound num_split_inos and num_split_realms in ceph_handle_snap()

From: Michael Bommarito

Date: Tue May 19 2026 - 07:54:31 EST


A peer that can deliver a CEPH_MSG_CLIENT_SNAP to the kernel CephFS
client (a compromised or malicious MDS, or an attacker who has
forged/replayed a cephx session on the cluster network) can cause an
out-of-bounds slab read in ceph_update_snap_trace() by sending
num_split_inos or num_split_realms as a small negative __le32.
ceph_handle_snap() parses both counts into signed int and then
advances the decode pointer with `p += sizeof(u64) * num_split_inos`;
the multiplication is in size_t, so the signed operand is widened
modulo 2**64 and a wire value like -32 produces an attacker-chosen
byte offset that walks p backwards into the slab. The subsequent
ceph_decode_need(&p, e, sizeof(*ri), bad) passes (end - p is huge),
ri = p, and the next 4-byte read inside ceph_update_snap_trace() is
performed from attacker-positioned memory. The same arithmetic and
the same pointer hand-off exist in the non-split branch.

Promote num_split_inos and num_split_realms to u32 to match the
on-wire __le32 fields, compute each array's byte length with
array_size() so a size_t overflow saturates to SIZE_MAX instead of
wrapping, sum the two lengths with check_add_overflow(), and verify
the total against the remaining front-buffer length before any
pointer bump. Re-use the validated byte counts for the bumps in both
the split and non-split branches.

The MDS is an authenticated peer under cephx, but the kernel client
is still expected to validate metadata it accepts over the wire;
this hardens the input-validation boundary that snap-message decode
crosses.

Fixes: 963b61eb041e8 ("ceph: snapshot management")
Cc: stable@xxxxxxxxxxxxxxx
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@xxxxxxxxx>
---
Reproduced on x86_64 QEMU/KVM, KASAN_INLINE generic, two ways:

- In-tree harness that allocates an upstream struct ceph_msg via
ceph_msg_new(), writes num_split_inos = (u32)-32,
num_split_realms = 0, op = CEPH_SNAP_OP_UPDATE into the front
buffer, and calls ceph_handle_snap(&mdsc, &session, msg)
directly.

- End-to-end over a real TCP connection from a real ceph-mds
daemon to the kernel CephFS client, with num_split_inos
rewritten to (u32)-32 in the front buffer and the messenger
v1 footer.front_crc recomputed so the kernel libceph receive
path accepts the message. The KASAN report fires from the
tcp_recvmsg softirq path through ceph_handle_snap+0x345 into
ceph_update_snap_trace+0x23bf, confirming the bug is reached
via the normal MDS->client receive path and not only by
direct harness invocation.

A stock v7.1-rc3 kernel produces:

BUG: KASAN: slab-out-of-bounds in ceph_update_snap_trace+0x23bf/0x31a0
Read of size 4 at addr ffff8880012be1f8 by task init/1
ceph_update_snap_trace+0x23bf/0x31a0
? ceph_handle_snap+0x312/0x900
ceph_handle_snap+0x345/0x900
The buggy address is located 248 bytes to the right of
allocated 256-byte region [ffff8880012be000, ffff8880012be100)

With this patch applied, the same trigger (both via the harness
and via the wire path) hits the new validator's goto bad path,
logs "corrupt snap message from mds0", calls ceph_msg_dump(), and
returns cleanly with no KASAN report. Harness and wire-injection
scripts available on request.

The kernel ships no fs/ceph selftests and no ceph KUnit module that
exercises ceph_handle_snap, so no in-tree selftest delta to report.

fs/ceph/snap.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 52b4c2684f922..7c4487eb2708a 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -1027,9 +1027,10 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
void *p = msg->front.iov_base;
void *e = p + msg->front.iov_len;
struct ceph_mds_snap_head *h;
- int num_split_inos, num_split_realms;
+ u32 num_split_inos, num_split_realms;
__le64 *split_inos = NULL, *split_realms = NULL;
- int i;
+ size_t split_inos_bytes, split_realms_bytes, split_bytes;
+ u32 i;
int locked_rwsem = 0;
bool close_sessions = false;

@@ -1048,6 +1049,24 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
trace_len = le32_to_cpu(h->trace_len);
p += sizeof(*h);

+ /*
+ * Validate that the two MDS-supplied counts cannot wrap when
+ * multiplied by sizeof(u64), and that the two arrays together
+ * fit in the remaining front buffer before any of the pointer
+ * bumps below. Without this, a malformed (or malicious) snap
+ * message can cause 'p += sizeof(u64) * num_split_inos' to land
+ * at an attacker-chosen offset via the size_t * int widening,
+ * bypassing ceph_decode_need() and making the subsequent
+ * 'ri = p; ri->created' read out of bounds.
+ */
+ split_inos_bytes = array_size(num_split_inos, sizeof(u64));
+ split_realms_bytes = array_size(num_split_realms, sizeof(u64));
+ if (split_inos_bytes == SIZE_MAX || split_realms_bytes == SIZE_MAX ||
+ check_add_overflow(split_inos_bytes, split_realms_bytes,
+ &split_bytes) ||
+ (size_t)(e - p) < split_bytes)
+ goto bad;
+
doutc(cl, "from mds%d op %s split %llx tracelen %d\n", mds,
ceph_snap_op_name(op), split, trace_len);

@@ -1064,9 +1083,9 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
* child.
*/
split_inos = p;
- p += sizeof(u64) * num_split_inos;
+ p += split_inos_bytes;
split_realms = p;
- p += sizeof(u64) * num_split_realms;
+ p += split_realms_bytes;
ceph_decode_need(&p, e, sizeof(*ri), bad);
/* we will peek at realm info here, but will _not_
* advance p, as the realm update will occur below in
@@ -1144,8 +1163,8 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc,
* positioned at the start of realm info, as expected by
* ceph_update_snap_trace().
*/
- p += sizeof(u64) * num_split_inos;
- p += sizeof(u64) * num_split_realms;
+ p += split_inos_bytes;
+ p += split_realms_bytes;
}

/*
--
2.53.0