Re: [PATCH] locking/rwsem: Remove reader optimistic lock stealing

From: Peng Wang

Date: Fri May 22 2026 - 06:08:49 EST


On Fri, May 22, 2026 at 10:55:13AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2026 at 10:08:58PM -0400, Waiman Long wrote:
> > On 5/21/26 5:59 AM, Peng Wang wrote:
> > > Reader optimistic lock stealing, introduced by commit 1a728dff855a
> > > ("locking/rwsem: Enable reader optimistic lock stealing") and made more
> > > aggressive by commit 617f3ef95177 ("locking/rwsem: Remove reader
> > > optimistic spinning"), allows a reader entering the slowpath to bypass
> > > the wait queue and acquire the lock directly when WRITER_LOCKED and
> > > HANDOFF bits are not set.
> > >
> > > This causes severe writer starvation in workloads where readers hold
> > > the lock for extended periods, such as Direct I/O operations which
> > > hold inode->i_rwsem for the entire duration of iomap_dio_rw(). A
> > > common example is log-structured storage where one thread appends via
> > > DIO writes while another thread tails the log via DIO reads -- a
> > > pattern seen in database redo-log replay and shared-storage
> > > replication.
> >
> > It is generally assume that reader lock critical section is shorter than
> > that of writer. In this particular case, does the reader critical section
> > run longer than the writer's one?
>
> Well, that and writers are assumed to be rare. Reader-writer setups
> where writers are common or even dominant make little sense. And that
> seems to be exactly this. Then again, it isn't unreasonable to expect it
> to not perform significantly worse than an exclusive lock.
>
> > Reader lock stealing should only happen if the previous lock owner is a
> > writer. So readers and writer should at most alternately own the lock if
> > there are many readers waiting. Of course, if a reader own the lock, it will
> > wake up the remaining readers in the wait queue.
>
> Anyway, IIRC I've mentioned phase change locks many times before. And
> what we have here is an asymmetric phase change. The timeout causes a
> change to writers, but any one writer completing then switches back to
> reader dominance.
>
> Perhaps look at evening out the phase change. Retain the 'no-steal'
> phase for an equal duration.
>
> Also, 4ms is an eternity, that might need tweaking too.

Hi Peter,

I agree this is an asymmetric phase change. Looking at the code, the
asymmetry appears to come only from the steal path. That path is
reachable only when RWSEM_FLAG_WAITERS sends a reader into the slowpath
(READ_FAILED_MASK includes WAITERS), and the steal condition then
requires only WRITER_LOCKED=0 and HANDOFF=0.
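
To make the bit logic above concrete, here is a minimal user-space
sketch. The bit values are my assumption of what kernel/locking/rwsem.c
uses, RWSEM_FLAG_READFAIL is left out, and read_fastpath_ok() and
reader_may_steal() are hypothetical helper names, not kernel functions.
It also omits the other checks the real slowpath makes before stealing
(for example the previous-owner check Waiman mentions).

```c
#include <stdbool.h>

/* Assumed count-word layout, simplified from kernel/locking/rwsem.c. */
#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_WAITERS	(1UL << 1)
#define RWSEM_FLAG_HANDOFF	(1UL << 2)
#define RWSEM_READER_BIAS	(1UL << 8)
/* RWSEM_FLAG_READFAIL (top bit) omitted for brevity. */
#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED | RWSEM_FLAG_WAITERS | \
				 RWSEM_FLAG_HANDOFF)

/*
 * Hypothetical helper: the reader fast path succeeds only if, after
 * adding the reader bias, none of the failure bits are set.
 */
static bool read_fastpath_ok(unsigned long count)
{
	return !((count + RWSEM_READER_BIAS) & RWSEM_READ_FAILED_MASK);
}

/*
 * Hypothetical helper: the steal condition as described above. WAITERS
 * is deliberately not checked, so a queued writer can be bypassed.
 */
static bool reader_may_steal(unsigned long count)
{
	return !(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF));
}
```

With only RWSEM_FLAG_WAITERS set, read_fastpath_ok() is false while
reader_may_steal() is true, which is the window the patch closes.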

With steal removed, the wait queue with phase-fair batch wakeup seems to
provide naturally symmetric transitions without explicit phase
management:
- Writer at queue head -> writer runs
- Reader at queue head -> all readers batch-woken in parallel
- Alternation governed by arrival order
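
The wakeup rules in the list can be modelled as below. This is a toy
model, not the kernel's rwsem_mark_wake(): wake_from_head() and the enum
are hypothetical names, and the real code caps the number of readers
woken per pass and does additional count bookkeeping. Note that in this
model, as in my reading of the kernel, a reader queued behind a writer
is woken together with a reader at the head.

```c
#include <stdbool.h>
#include <stddef.h>

enum waiter_type { WAITER_WRITE, WAITER_READ };

/*
 * Hypothetical model of one wakeup pass over a queue in arrival order.
 * Sets woken[i] for each waiter woken and returns how many were woken.
 */
static size_t wake_from_head(const enum waiter_type *queue, size_t n,
			     bool *woken)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		woken[i] = false;
	if (n == 0)
		return 0;

	/* Writer at queue head: only that writer runs. */
	if (queue[0] == WAITER_WRITE) {
		woken[0] = true;
		return 1;
	}

	/* Reader at queue head: wake every queued reader as one batch. */
	for (i = 0; i < n; i++) {
		if (queue[i] == WAITER_READ) {
			woken[i] = true;
			count++;
		}
	}
	return count;
}
```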

A read-heavy mmap benchmark (16 threads, short critical sections) showed
no measurable difference with steal removed (308k vs 306k ops/sec),
since readers succeed via the fast path when there is no contention.

Regarding the timeout: agreed, 4ms feels too long for fast storage.
With steal removed, handoff would only matter as a backstop when a
writer enters the queue while readers are already active.

Would the simple removal be acceptable, or would you prefer a more
structured phase-change approach?

Best regards,
Peng