Re: [BUG io_uring] Failed RECVSEND_BUNDLE can persistently shrink non-INC pbuf ring len and affect later READ operations

From: Jens Axboe

Date: Sun Jun 07 2026 - 15:07:30 EST


On 6/7/26 5:41 AM, Federico Brasili wrote:
> Hi,
>
> I found a reproducible io_uring provided-buffer ring issue on Ubuntu
> kernel 7.0.0-22-generic.
>
> A failed IORING_RECVSEND_BUNDLE receive on a non-INC provided-buffer
> ring can persistently shrink the user-visible buffer descriptor
> length. The modified length is not rolled back when the receive fails
> with -EAGAIN/no data, and a later unrelated io_uring operation, such
> as IORING_OP_READ from a pipe, consumes the corrupted length.
>
> This is not a demonstrated privilege escalation. The demonstrated
> impact is deterministic unprivileged provided-buffer ring metadata
> corruption across unrelated io_uring operations.
>
> Tested kernel:
>
> Linux ubuntu 7.0.0-22-generic #22-Ubuntu SMP PREEMPT_DYNAMIC Mon May
> 25 15:54:34 UTC 2026 x86_64 GNU/Linux
>
> Summary:
>
> Create an io_uring instance as an unprivileged user.
>
> Register a non-INC provided-buffer ring with two buffers:
>
> entry0.len = 4096
>
> entry1.len = 4096
>
> Submit IORING_OP_RECV with:
>
> IOSQE_BUFFER_SELECT
>
> IORING_RECVSEND_BUNDLE
>
> req_len = 1
>
> MSG_DONTWAIT
>
> empty AF_UNIX SOCK_DGRAM socket
>
> The receive fails with -EAGAIN, but entry0.len is changed from 4096 to 1.
>
> Submit a later unrelated IORING_OP_READ from a pipe using the same
> provided-buffer group with req_len = 4096.
>
> The READ returns only 1 byte, because it uses the previously corrupted
> entry0.len.
>
> A second READ then consumes entry1 normally and returns 4096 bytes,
> showing that head/bid accounting remains coherent and the corruption
> is localized to the poisoned descriptor.
>
> Observed output from clean unprivileged reproduction:
>
> [INIT] uid=1002 entry0.len=4096 entry1.len=4096 tail=2
> [STEP1] RECV BUNDLE on empty socket, req_len=1, expected CQE=-EAGAIN
> [CQE_RECV_BUNDLE] res=-11 flags=0x0 user=0x1111
> [AFTER_RECV_BUNDLE] entry0.len=1 entry1.len=4096 changed_buf0=0
> changed_buf1=0 guard_before=0 guard_after=0
> [STEP2] write pipe bytes=4096, then IORING_OP_READ req_len=4096 using
> same pbuf group
> [CQE_READ1] res=1 flags=0x1 user=0x6666
> [AFTER_READ1] entry0.len=1 entry1.len=4096 changed_buf0=1
> changed_buf1=0 guard_before=0 guard_after=0
> [STEP3] write second pipe bytes=4096, then second IORING_OP_READ
> req_len=4096 without republish
> [CQE_READ2] res=4096 flags=0x10001 user=0x7777
> [AFTER_READ2] entry0.len=1 entry1.len=4096 changed_buf0=1
> changed_buf1=4096 guard_before=0 guard_after=0
> [RESULT] PASS: unprivileged RECV_BUNDLE -EAGAIN poisoned pbuf len and
> later IORING_OP_READ consumed the corrupted len.
>
> Why this looks like a bug:
>
> The failed receive should not persistently alter the provided-buffer
> descriptor in a way that affects future unrelated operations. In this
> case, a no-data/-EAGAIN RECV_BUNDLE changes entry0.len from 4096 to 1,
> and that corrupted length is later consumed by IORING_OP_READ from a
> pipe.
>
> The suspected root cause is in the non-INC provided-buffer ring BUNDLE
> selection path:
>
> io_ring_buffers_peek()
> 	if (len > arg->max_len) {
> 		len = arg->max_len;
> 		if (!(bl->flags & IOBL_INC)) {
> 			arg->partial_map = 1;
> 			if (iov != arg->iovs)
> 				break;
> 			WRITE_ONCE(buf->len, len);
> 		}
> 	}
>
> The descriptor length is modified during buffer selection/peek before
> the receive operation has completed successfully. If the receive later
> fails with -EAGAIN/no data, the buffer is recycled but the modified
> buf->len is not restored.
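
The suspected mechanism can be sketched as a small userspace model in C.
This is not kernel code and not the reporter's reproducer: struct
model_buf, struct model_result, model_peek_first(), model_recv_empty(),
model_read() and model_scenario() are hypothetical names invented for
illustration. The model assumes the analysis in the report is accurate
and covers only the first-buffer truncation on a non-INC ring.

```c
#include <stdio.h>

/* Simplified stand-in for one provided-buffer ring entry (hypothetical,
 * not the kernel's struct io_uring_buf). */
struct model_buf {
	unsigned int len;
	unsigned short bid;
};

/* Values observed at the end of the modeled scenario. */
struct model_result {
	int recv_res;
	unsigned int entry0_len, entry1_len, read1, read2;
};

/* Models the non-INC truncation step: if the first buffer is longer
 * than the request, its descriptor length is shrunk in place. */
static unsigned int model_peek_first(struct model_buf *buf, unsigned int max_len)
{
	unsigned int len = buf->len;

	if (len > max_len) {
		len = max_len;
		buf->len = len;	/* mirrors WRITE_ONCE(buf->len, len) */
	}
	return len;
}

/* Models a receive on an empty socket: the buffer is peeked, no data
 * arrives, and nothing restores the original descriptor length. */
static int model_recv_empty(struct model_buf *ring, unsigned int req_len)
{
	(void)model_peek_first(&ring[0], req_len);
	return -11;	/* -EAGAIN */
}

/* Models a later read: transfer is capped by the descriptor length. */
static unsigned int model_read(const struct model_buf *buf,
			       unsigned int req_len, unsigned int avail)
{
	unsigned int n = buf->len < req_len ? buf->len : req_len;

	return n < avail ? n : avail;
}

/* Replays the reported sequence against the model and prints it. */
struct model_result model_scenario(void)
{
	struct model_buf ring[2] = { { 4096, 0 }, { 4096, 1 } };
	struct model_result r;

	r.recv_res = model_recv_empty(ring, 1);
	r.entry0_len = ring[0].len;
	r.entry1_len = ring[1].len;
	r.read1 = model_read(&ring[0], 4096, 4096);
	r.read2 = model_read(&ring[1], 4096, 4096);
	printf("recv res=%d entry0.len=%u entry1.len=%u read1=%u read2=%u\n",
	       r.recv_res, r.entry0_len, r.entry1_len, r.read1, r.read2);
	return r;
}
```

Calling model_scenario() from a small main() replays the three steps of
the report against the model. Whether the real peek and recycle paths
behave this way is exactly what the actual reproducer should confirm.
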
>
> Additional observations:
>
> The issue reproduces as an unprivileged user.
>
> The effect crosses io_uring operations: RECV affects a later READ.
>
> The effect crosses subsystems: socket receive affects pipe read.
>
> The second READ correctly uses entry1 and returns 4096 bytes, so this
> does not appear to be a head/bid desync in the tested case.
>
> No kernel crash, OOB write, UAF, or privilege escalation has been demonstrated.
>
> Expected behavior:
>
> If IORING_RECVSEND_BUNDLE fails with -EAGAIN/no data, the
> provided-buffer ring descriptor should not be persistently modified,
> or the original len should be restored during recycle/rollback.
>
> Actual behavior:
>
> The failed BUNDLE receive leaves entry0.len shortened to the requested
> length, and later unrelated operations using the same provided-buffer
> group consume that corrupted length.
>
> I can provide the minimal C reproducer and full output if useful.

Please do, no point in me recreating one for it. Then it can also get
turned into a regression test for liburing. A reproducer is also worth
more than a thousand words in an email: it tells us exactly what is
being run and what is going wrong. Or in some cases, what the wrong
expectations are.

--
Jens Axboe