Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring

From: Mina Almasry
Date: Mon May 26 2025 - 13:51:57 EST



On Mon, May 26, 2025 at 7:47 AM dongchenchen (A)
<dongchenchen2@xxxxxxxxxx> wrote:
>
>
> > On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
> >> On 2025/5/23 14:45, Dong Chenchen wrote:
> >>
> >>> static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
> >>> {
> >>> + bool in_softirq;
> >>> int ret;
> >> int -> bool?
> >>
> >>> /* BH protection not needed if current is softirq */
> >>> - if (in_softirq())
> >>> - ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>> - else
> >>> - ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> >>> -
> >>> - if (!ret) {
> >>> + in_softirq = page_pool_producer_lock(pool);
> >>> + ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>> + if (ret)
> >>> recycle_stat_inc(pool, ring);
> >>> - return true;
> >>> - }
> >>> + page_pool_producer_unlock(pool, in_softirq);
> >>>
> >>> - return false;
> >>> + return ret;
> >>> }
> >>>
> >>> /* Only allow direct recycling in special circumstances, into the
> >>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
> >>>
> >>> static int page_pool_release(struct page_pool *pool)
> >>> {
> >>> + bool in_softirq;
> >>> int inflight;
> >>>
> >>> page_pool_scrub(pool);
> >>> inflight = page_pool_inflight(pool, true);
> >>> + /* Acquire producer lock to make sure producers have exited. */
> >>> + in_softirq = page_pool_producer_lock(pool);
> >>> + page_pool_producer_unlock(pool, in_softirq);
> >> Is a compiler barrier needed to ensure compiler doesn't optimize away
> >> the above code?
> >>
> > I don't want to derail this conversation too much, and I suggested a
> > similar fix to this initially, but now I'm not sure I understand why
> > it works.
> >
> > Why is the existing barrier not working and acquiring/releasing the
> > producer lock fixes this issue instead? The existing barrier is the
> > producer thread incrementing pool->pages_state_release_cnt, and
> > page_pool_release() is supposed to block the freeing of the page_pool
> > until it sees the
> > `atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
> > producer thread. Any idea why this barrier is not working? AFAIU it
> > should do the exact same thing as acquiring/dropping the producer
> > lock.
>
> Hi, Mina
> As previously mentioned:
> page_pool_recycle_in_ring
>   ptr_ring_produce
>     spin_lock(&r->producer_lock);
>     WRITE_ONCE(r->queue[r->producer++], ptr)
>       //recycle last page to pool, producer + release_cnt = hold_cnt

This is not right. release_cnt != hold_cnt at this point.

release_cnt is only incremented by the producer _after_ the
spin_unlock and the recycle_stat_inc have completed. The full call
stack on the producer thread is:

page_pool_put_unrefed_netmem
  page_pool_recycle_in_ring
    ptr_ring_produce(&pool->ring, (__force void *)netmem);
      spin_lock(&r->producer_lock);
      __ptr_ring_produce(r, ptr);
      spin_unlock(&r->producer_lock);
    recycle_stat_inc(pool, ring);
  recycle_stat_inc(pool, ring_full);
  page_pool_return_page
    atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
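
To make the ordering in the call stack above concrete, here is a rough
single-threaded user-space model of the pre-patch producer path. It is
only a sketch: struct model, trace(), model_recycle_in_ring() and
model_put() are made-up names, the "lock" is just a logged event, and
nothing here is the real page_pool code. It records the order in which
the steps run for a put that fits in the ring and for one that does not.

```c
#include <stddef.h>

#define RING_SIZE  2   /* hypothetical tiny ring so the "full" path is easy to hit */
#define MAX_EVENTS 16

enum event {
	EV_LOCK, EV_PRODUCE, EV_UNLOCK,
	EV_STAT_RING, EV_STAT_RING_FULL, EV_RELEASE_CNT_INC,
};

struct model {
	void *ring[RING_SIZE];
	int producer;              /* number of occupied slots */
	int release_cnt;           /* stands in for pages_state_release_cnt */
	enum event log[MAX_EVENTS];
	int nlog;
};

/* Append one step to the event log (silently drops events once full). */
static void trace(struct model *m, enum event e)
{
	if (m->nlog < MAX_EVENTS)
		m->log[m->nlog++] = e;
}

/* Models ptr_ring_produce() followed by recycle_stat_inc(pool, ring). */
static int model_recycle_in_ring(struct model *m, void *page)
{
	int ok = 0;

	trace(m, EV_LOCK);
	if (m->producer < RING_SIZE) {
		m->ring[m->producer++] = page;
		trace(m, EV_PRODUCE);
		ok = 1;
	}
	trace(m, EV_UNLOCK);

	if (ok)
		trace(m, EV_STAT_RING);   /* runs after the unlock */
	return ok;
}

/* Models the caller: fall back to returning the page when the ring is full. */
static void model_put(struct model *m, void *page)
{
	if (!model_recycle_in_ring(m, page)) {
		trace(m, EV_STAT_RING_FULL);
		m->release_cnt++;         /* the only place the model bumps release_cnt */
		trace(m, EV_RELEASE_CNT_INC);
	}
}
```

Calling model_put() three times on a zeroed struct model fills the two
slots and then takes the ring-full path once. In the log, the counter
increment is always the last event of a put, after EV_UNLOCK.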

The atomic_inc_return_relaxed happens after all the lines that could
cause the UAF have already executed. Is the problem that, because
we're using the _relaxed version of the atomic operation, the compiler
can reorder it to happen before the spin_unlock(&r->producer_lock) and
before the recycle_stat_inc...?
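
For reference, this is how I picture the ordering question in
user-space C11 terms. It is an analogy only, not kernel code:
ring_slot, release_cnt, producer(), wait_and_read() and run_model()
are made-up names, and memory_order_release/memory_order_acquire stand
in for whatever ordered primitives the kernel side would use. With
release on the increment and acquire on the read, the waiter is
guaranteed to see everything the producer wrote before the increment.
With memory_order_relaxed on both sides that guarantee goes away.

```c
#include <pthread.h>
#include <stdatomic.h>

static int ring_slot;            /* plain data written before the counter bump */
static atomic_int release_cnt;   /* hypothetical stand-in for the release counter */

/* Producer: do the "work", then publish it by bumping the counter. */
static void *producer(void *arg)
{
	(void)arg;
	ring_slot = 42;
	/* release: the write to ring_slot cannot be ordered after this increment */
	atomic_fetch_add_explicit(&release_cnt, 1, memory_order_release);
	return NULL;
}

/* Waiter: spin until the counter moves, then read the producer's data. */
static int wait_and_read(void)
{
	/* acquire: pairs with the release above, so ring_slot is visible */
	while (atomic_load_explicit(&release_cnt, memory_order_acquire) == 0)
		;
	return ring_slot;
}

/* Run one producer thread against the waiter; returns the value seen, -1 on error. */
int run_model(void)
{
	pthread_t t;
	int seen;

	ring_slot = 0;
	atomic_store(&release_cnt, 0);
	if (pthread_create(&t, NULL, producer, NULL) != 0)
		return -1;
	seen = wait_and_read();
	pthread_join(t, NULL);
	return seen;
}
```

If both operations were memory_order_relaxed, the read of ring_slot in
wait_and_read() would race with the producer's write, and the counter
could no longer be relied on as a "producer is done" signal.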

--
Thanks,
Mina