Re: [PATCH v6 2/2] mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking

From: Jeff Layton

Date: Sat May 09 2026 - 06:19:55 EST

On Fri, 2026-05-08 at 17:20 -0700, Andrew Morton wrote:
> On Tue, 05 May 2026 20:59:49 +0200 Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> > The IOCB_DONTCACHE writeback path in generic_write_sync() calls
> > filemap_flush_range() on every write, submitting writeback inline in
> > the writer's context. perf lock contention profiling shows that the
> > performance problem is not lock contention but the writeback submission
> > work itself: walking the page tree and submitting I/O blocks the writer
> > for milliseconds, inflating p99.9 latency from 23 ms (buffered) to 93 ms
> > (dontcache).
> >
> > Replace the inline filemap_flush_range() call with a flusher kick that
> > drains dirty pages in the background. This moves writeback submission
> > completely off the writer's hot path.
> >
> > ...
> >
> >                       Before  After  Change
> > seq-write/dontcache      298    897   +201%
> > rand-write/dontcache     131    236    +80%
> >
> > Tail latency improvements (seq-write/dontcache):
> > p99: 135,266 us -> 23,986 us (-82%)
> > p99.9: 8,925,479 us -> 28,443 us (-99.7%)
> >
> > Multi-writer (4 jobs, sequential write):
> >                              Before  After  Change
> > dontcache aggregate (MB/s)    2,529  4,532    +79%
> > dontcache p99 (us)            8,553  1,002    -88%
> > dontcache p99.9 (us)        109,314  1,057    -99%
> >
> > 32-file write (Axboe test):
> >                              Before  After  Change
> > dontcache aggregate (MB/s)    1,548  3,499   +126%
> > dontcache p99 (us)           10,170    602    -94%
> > Peak dirty pages (MB)         1,837    213    -88%
> >
> > Dontcache now reaches 81% of buffered throughput (was 35%).
> >
> > Competing writers (dontcache vs buffered, separate files):
> >                   Before (MB/s)  After (MB/s)
> > buffered writer              868           433
> > dontcache writer             415           433
> > Aggregate                  1,284           866
> >
> > ...
> >
> > The dontcache writer's p99.9 latency collapsed from 119 ms to
> > 33 ms (-73%), eliminating the severe periodic stalls seen in the
> > baseline. Both writers now share identical latency profiles,
> > matching the buffered-vs-buffered pattern.
> >
> > The per-bdi_writeback dirty tracking dramatically reduces peak dirty
> > pages in dontcache workloads, with the 32-file test dropping from
> > 1.8 GB to 213 MB. Dontcache sequential write throughput triples and
> > multi-writer throughput reaches parity with buffered I/O, with tail
> > latencies collapsing by 1-2 orders of magnitude.
>
> Geeze, is that the best you can do ;)
>
> Sashiko seems to have found more stuff:
> https://sashiko.dev/#/patchset/20260505-dontcache-v6-0-66463805dd6a@xxxxxxxxxx

I saw those after I sent the last set and have been working on
addressing them. I've also found a couple more via dueling Gemini and
Claude reviews.

I'll post a v7 early next week.

Thanks,
--
Jeff Layton <jlayton@xxxxxxxxxx>