Re: [PATCH v2] bpf, sockmap: keep sk_msg copy state in sync
From: Han Guidong
Date: Tue May 19 2026 - 23:32:48 EST
On Wed, May 20, 2026 at 11:13 AM Jiayuan Chen <jiayuan.chen@xxxxxxxxx> wrote:
>
>
> On 5/17/26 8:16 PM, Zhang Cen wrote:
> > SK_MSG uses msg->sg.copy as per-scatterlist-entry provenance. Entries
> > with this bit set are copied before data/data_end are exposed to SK_MSG
> > BPF programs for direct packet access.
> >
> > bpf_msg_pull_data(), bpf_msg_push_data() and bpf_msg_pop_data() rewrite
> > the sk_msg scatterlist ring by collapsing, splitting and shifting
> > entries. These operations move msg->sg.data[] entries, but the parallel
> > copy bitmap can be left behind or stale in slots that no longer contain
> > the original entry. A copied entry can therefore later occupy a slot whose
> > copy bit is clear and be exposed as directly writable packet data.
> >
> > Keep msg->sg.copy synchronized with scatterlist entry moves, preserve the
> > copy bit when an entry is split, clear it when a helper replaces an entry
> > with a private page, and clear every slot vacated by pull-data
> > compaction.
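To make the bookkeeping concrete, here is a simplified userspace model in C. It is not the kernel's sk_msg code. struct model_msg, shift_right(), insert_private(), split_entry() and the scenario helpers are hypothetical names. Slots are addressed by index rather than by byte offset, and there are no ring-overflow checks. The model shows a push-style shift with and without moving the copy bit, and a split whose tail piece keeps the bit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SLOTS 8 /* hypothetical ring size, not the kernel's value */

/* Hypothetical model of an sk_msg ring: entry[] stands in for
 * msg->sg.data[] (0 = empty slot) and copy[] for the msg->sg.copy
 * bitmap. No overflow checks: callers must leave a free slot. */
struct model_msg {
	int entry[MODEL_SLOTS];
	bool copy[MODEL_SLOTS];
	int end; /* one past the last used slot; the ring starts at slot 0 */
};

static int ring_next(int i) { return (i + 1) % MODEL_SLOTS; }
static int ring_prev(int i) { return (i + MODEL_SLOTS - 1) % MODEL_SLOTS; }

/* Shift slots [at, end) one position right to open a hole at 'at'.
 * synced == false moves only the entry (stale bits left behind);
 * synced == true moves the copy bit together with the entry. */
static void shift_right(struct model_msg *m, int at, bool synced)
{
	for (int i = m->end; i != at; i = ring_prev(i)) {
		m->entry[i] = m->entry[ring_prev(i)];
		if (synced)
			m->copy[i] = m->copy[ring_prev(i)];
	}
	m->end = ring_next(m->end);
}

/* Push-style insert of a freshly allocated private page: it is safe to
 * expose directly, so its copy bit is cleared. */
static void insert_private(struct model_msg *m, int at, int id, bool synced)
{
	shift_right(m, at, synced);
	m->entry[at] = id;
	m->copy[at] = false;
}

/* Split the entry at 'at'; the tail piece lands in the next slot and
 * inherits the original entry's copy bit. */
static void split_entry(struct model_msg *m, int at, int tail_id)
{
	int nxt = ring_next(at);

	shift_right(m, nxt, true);
	m->entry[nxt] = tail_id;
	m->copy[nxt] = m->copy[at];
}

/* Slot 0 holds a private page (id 1), slot 1 a page that must be copied
 * before direct access (id 2). Insert private page 9 at slot 1 and
 * return the copy bit of the slot entry 2 moved into (slot 2). */
static bool push_scenario_moved_copy_bit(bool synced)
{
	struct model_msg m = { .entry = { 1, 2 }, .copy = { false, true },
			       .end = 2 };

	insert_private(&m, 1, 9, synced);
	assert(m.entry[2] == 2);
	return m.copy[2];
}

/* Split the must-copy entry in slot 1 and return the tail's copy bit. */
static bool split_scenario_tail_copy_bit(void)
{
	struct model_msg m = { .entry = { 1, 2 }, .copy = { false, true },
			       .end = 2 };

	split_entry(&m, 1, 3);
	assert(m.entry[2] == 3);
	return m.copy[2];
}
```

With synced set to false, entry 2 ends up in a slot whose copy bit is clear, so in this model a must-copy entry would be treated as directly writable; with synced set to true the bit follows the entry.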
> >
> > Fixes: 015632bb30da ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
> > Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
> > Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Co-developed-by: Han Guidong <2045gemini@xxxxxxxxx>
> > Signed-off-by: Han Guidong <2045gemini@xxxxxxxxx>
> > Signed-off-by: Zhang Cen <rollkingzzc@xxxxxxxxx>
> > ---
> > v2:
> > Sashiko-bot pointed out that bpf_msg_pull_data() could leave stale copy
> > bits on collapsed tail entries.
> >
> > Clear msg->sg.copy for every entry consumed by bpf_msg_pull_data()
> > before compacting the scatterlist ring.
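A similarly hypothetical model of the pull-data compaction described in the v2 note follows. Again, this is not kernel code. pull_compact() and the scenario helpers are made-up names, and the pulled range is given as slot indices rather than the helper's byte offsets. Consumed slots lose their copy bits, surviving entries carry their bits as they move down, and slots vacated at the tail are cleared.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SLOTS 8 /* hypothetical ring size, not the kernel's value */

/* Hypothetical model: entry[] stands in for msg->sg.data[] (0 = empty)
 * and copy[] for the msg->sg.copy bitmap. */
struct model_msg {
	int entry[MODEL_SLOTS];
	bool copy[MODEL_SLOTS];
	int end; /* one past the last used slot; the ring starts at slot 0 */
};

static int ring_next(int i) { return (i + 1) % MODEL_SLOTS; }

/* Collapse slots [first, last] into one private page 'new_id' at 'first',
 * then compact the remaining entries toward it. */
static void pull_compact(struct model_msg *m, int first, int last, int new_id)
{
	int old_end = m->end;
	int to, from;

	/* Every consumed slot loses its copy bit: the bytes now live in the
	 * private page, which is safe to expose directly. */
	for (int i = first;; i = ring_next(i)) {
		m->entry[i] = 0;
		m->copy[i] = false;
		if (i == last)
			break;
	}
	m->entry[first] = new_id;

	/* Move the surviving entries down, carrying their copy bits. */
	to = ring_next(first);
	for (from = ring_next(last); from != old_end; from = ring_next(from)) {
		m->entry[to] = m->entry[from];
		m->copy[to] = m->copy[from];
		to = ring_next(to);
	}
	m->end = to;

	/* Clear every slot vacated by the compaction. */
	for (int i = to; i != old_end; i = ring_next(i)) {
		m->entry[i] = 0;
		m->copy[i] = false;
	}
}

/* Ids 1, 2 and 4 must be copied before direct access; slots 0..1 are
 * pulled into private page 9. */
static struct model_msg pull_scenario(void)
{
	struct model_msg m = { .entry = { 1, 2, 3, 4 },
			       .copy = { true, true, false, true }, .end = 4 };

	pull_compact(&m, 0, 1, 9);
	return m;
}

static int scenario_entry(int slot) { struct model_msg m = pull_scenario(); return m.entry[slot]; }
static bool scenario_copy(int slot) { struct model_msg m = pull_scenario(); return m.copy[slot]; }
static int scenario_end(void) { struct model_msg m = pull_scenario(); return m.end; }
```

After the pull, slot 0 holds the private page with a clear bit, entries 3 and 4 sit in slots 1 and 2 with their own bits, and the vacated slot 3 is empty with a clear bit.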
> >
> > While researching recent page cache bugs, we discovered this bug.
> > We confirmed it allows overwriting the page cache of read-only files
> > via splice(). We haven't attempted to write an exploit, but the
> > corruption primitive is verified. PoC available upon request.
> > We recommend fixing this ASAP.
>
> I think only "splice() + KTLS + sockmap" is vulnerable, right?
>
> I dug into this a lot but didn't find any other combo.
>
> Actually, plain TCP/UDP with splice() does not go through sockmap
> (not supported yet).

Hi Jiayuan,

Thanks for digging into this. Yes, our PoC relies on exactly the
splice() + KTLS + sockmap combo.

We haven't exhaustively audited all other potential paths, so we can't
say for sure whether it is the only vulnerable combination, but it is
the one we used and verified.

Thanks.