Re: [RFC PATCH v2 0/4] mm/zsmalloc: reduce zs_free() latency on swap release path

From: Wenchao Hao

Date: Sun Apr 26 2026 - 23:10:57 EST

On Sun, Apr 26, 2026 at 4:50 PM Xueyuan Chen <xueyuan.chen21@xxxxxxxxx> wrote:
>
>
> On Sun, Apr 26, 2026 at 12:13:02PM +0800, Wenchao Hao wrote:
>
> [...]
>
> >2. Per-cpu deferred free with lockless buffer swap
> >
> >Defer zs_free() to per-cpu dynamically-allocated buffers (~2048 entries).
> >Enqueue: one array write + WRITE_ONCE under preempt_disable — no lock,
> >no atomic. When buffers full, schedule a drain worker; overflow falls back
> >to sync zs_free().
> >
> >Drain: allocate a fresh buffer, swap it in, reset count. Since
> >the producer stops writing at count==SIZE, the handoff is
> >race-free without any lock.
> >
> >Pseudo-code:
> >
> > /* enqueue - hot path */
> > def = get_cpu_ptr(pool->deferred);
> > if (def->count < SIZE) {
> > def->handles[def->count] = handle;
> > WRITE_ONCE(def->count, def->count + 1);
> > if (def->count == SIZE)
> > schedule_work(&pool->drain_work);
> > } else {
> > zs_free(pool, handle); /* fallback */
> > }
> > put_cpu_ptr(pool->deferred);
> >
> > /* drain - worker */
> > for_each_possible_cpu(cpu) {
> > def = per_cpu_ptr(pool->deferred, cpu);
> > if (def->count < SIZE)
> > continue;
> > new_buf = kvmalloc_array(SIZE, sizeof(long));
> > old_buf = def->handles;
> > old_count = def->count;
> > def->handles = new_buf;
> > WRITE_ONCE(def->count, 0);
> > /* now drain old_buf[0..old_count-1] */
> > ...
> > kvfree(old_buf);
> > }
> >
>
> Hi Wenchao,
>
> I suspect there is a memory ordering issue here:
>
> def->handles = new_buf;
> WRITE_ONCE(def->count, 0);
>
> Since there are no explicit memory barriers, we cannot guarantee the
> order of these stores. If def->count is cleared to 0 first, an enqueue
> might end up operating on the old_buf.
>
> This race condition is more likely to be triggered when the size is
> smaller. Perhaps we should consider using smp_store_release() to enforce
> the ordering?
>

Hi Xueyuan,

Good catch! You are right — there is a memory ordering issue between
the handles pointer swap and the count reset.

I'll fix this in the next version by using smp_store_release() /
smp_load_acquire() pairs:

/* drain - worker */
def->handles = new_buf;
smp_store_release(&def->count, 0);

/* enqueue - producer */
count = smp_load_acquire(&def->count);
if (count < SIZE) {
def->handles[count] = handle;
smp_store_release(&def->count, count + 1);
}

This ensures the producer always observes the new handles pointer
before it sees count reset to 0. Will include this fix when posting
the formal patch series.

Thanks,
Wenchao

> Thanks
> Xueyuan