Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end
From: Bernd Schubert
Date: Wed Mar 18 2026 - 17:52:25 EST
Hi Joanne,
On 3/18/26 22:19, Joanne Koong wrote:
> On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@xxxxxxxxxxxxx> wrote:
>>
>> Hi Joanne,
>>
>> I wonder, would something like this help for large folios?
>
> Hi Horst,
>
> I don't think it's likely that the pages backing the userspace buffer
> are large folios, so I think this may actually add extra overhead with
> the extra folio_test_dirty() check.
>
> From what I've seen, the main cost that dwarfs everything else for
> writes/reads is the actual IO, the context switches, and the memcpys.
> I think compared to these things, the set_page_dirty_lock() cost is
> negligible and pretty much undetectable.
A little bit of background here. We see in CPU flame graphs that the spin
lock taken in lock_request() and unlock_request() takes about the same
amount of CPU time as the memcpy. Interestingly, only on Intel, but not on
AMD CPUs. Note that we are running with our custom page pinning, which
just takes the pages from an array, so iov_iter_get_pages2() is not used.
The reason for that unlock/lock is documented at the end of
Documentation/filesystems/fuse/fuse.rst as the Kamikaze file system. Well, we
don't have that, so for now these checks are modified in our branches to
avoid the lock, although that is not upstreamable. The right solution
here is to extract an array of pages and do that unlock/lock per pagevec.
Next in the flame graph is set_page_dirty_lock(), which also
takes as much CPU time as the memcpy. Again, Intel CPUs only.
In combination with the above pagevec method, I think the right solution
is to iterate over the pages, store the last folio and then set it
dirty once per folio.
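To make the "once per folio" idea concrete, here is a rough userspace sketch, not kernel code. The sketch_* types and helpers are hypothetical stand-ins for struct folio, struct page and the set_page_dirty_lock() path; it only assumes that pages of the same folio are adjacent in the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real kernel struct folio / struct page. */
struct sketch_folio {
	int dirty;		/* set once the folio has been marked dirty */
	int dirty_calls;	/* how often the expensive dirtying path ran */
};

struct sketch_page {
	struct sketch_folio *folio;	/* folio this page belongs to */
};

/* Stand-in for the costly set_page_dirty_lock()-style call. */
static void sketch_mark_dirty(struct sketch_folio *folio)
{
	folio->dirty = 1;
	folio->dirty_calls++;
}

/*
 * Walk the page array, remember the last folio seen and dirty it only
 * when the walk moves on to a different folio (or reaches the end).
 * Returns the number of dirtying calls that were made.
 */
static size_t sketch_dirty_pages(struct sketch_page *pages, size_t nr)
{
	struct sketch_folio *last = NULL;
	size_t calls = 0;

	assert(pages != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		struct sketch_folio *folio = pages[i].folio;

		if (folio == last)
			continue;	/* same folio as before, nothing to do yet */
		if (last) {
			sketch_mark_dirty(last);
			calls++;
		}
		last = folio;
	}
	if (last) {
		sketch_mark_dirty(last);	/* flush the final folio */
		calls++;
	}
	return calls;
}
```

With five pages backed by two folios this does two dirtying calls instead of five. With order-0 folios only it degenerates to one call per page, so it should not be worse than today apart from the pointer compare.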
Also, I disagree that the userspace buffers are unlikely to be large
folios, see commit
59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio
with SPLICE_F_MOVE". Especially Horst persistently runs into it when
running xfstests with recent kernels. I think the issue first came up
with 3.18ish.
One can further enforce that by setting
"/sys/kernel/mm/transparent_hugepage/enabled" to 'always', which is what I did
when I tested the above commit. And actually that points out that
libfuse allocations should do the madvise. I'm going to do that in
the next few days, maybe tomorrow.
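For illustration only, a minimal sketch of what such an allocation could look like, assuming "the madvise" means madvise(MADV_HUGEPAGE) on the request buffer. sketch_alloc_thp_buffer() and SKETCH_HUGE_ALIGN are made-up names, not libfuse API, and the 2 MiB alignment is an assumption that matches the usual x86-64 THP size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumed THP size (2 MiB on x86-64); other architectures may differ. */
#define SKETCH_HUGE_ALIGN (2UL * 1024 * 1024)

/*
 * Hypothetical helper, not part of libfuse: allocate a buffer that is
 * aligned and rounded up to SKETCH_HUGE_ALIGN and hint to the kernel
 * that it may be backed by transparent huge pages.
 * Returns NULL on allocation failure; release with free().
 */
static void *sketch_alloc_thp_buffer(size_t size)
{
	void *buf = NULL;
	size_t len;

	assert(size > 0);

	/* Round up so the whole range can be covered by huge pages. */
	len = (size + SKETCH_HUGE_ALIGN - 1) & ~(SKETCH_HUGE_ALIGN - 1);

	if (posix_memalign(&buf, SKETCH_HUGE_ALIGN, len) != 0)
		return NULL;

#ifdef MADV_HUGEPAGE
	/* Advisory only: a failure here (e.g. THP not built in) is ignored. */
	(void)madvise(buf, len, MADV_HUGEPAGE);
#endif
	return buf;
}
```

With the sysfs knob set to 'madvise' instead of 'always', only ranges hinted like this would be eligible for THP, so other allocations in the daemon are not affected.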
Thanks,
Bernd