[PATCH] fuse: allow server to increase max_readahead via FUSE_INIT reply
From: Jim Harris
Date: Tue Jun 02 2026 - 17:15:00 EST

A FUSE server that advertises a large max_pages and max_write (e.g.
max_pages=256, max_write=1MB) cannot currently obtain matching
FUSE_READ request sizes from the kernel. Buffered sequential writes
arrive at the server at the negotiated max_write size, but buffered
sequential reads remain capped at the kernel's default readahead
window (VM_READAHEAD_PAGES, 128KB; doubled to 256KB for files marked
POSIX_FADV_SEQUENTIAL). A 1MB application read() therefore turns
into four sequential 256KB FUSE_READ round-trips instead of one.

This is because process_init_reply() processes the server's
max_readahead response as:

    ra_pages = arg->max_readahead / PAGE_SIZE;
    fm->sb->s_bdi->ra_pages = min(fm->sb->s_bdi->ra_pages, ra_pages);

Since the kernel sends its current bdi->ra_pages as
init_in->max_readahead, and bdi->ra_pages is the default
VM_READAHEAD_PAGES at this point, the server can only ever decrease
the readahead window -- never increase it. Even if the server
replies with max_readahead=1MB, the min() clamps it back to 128KB.

This clamp dates to commit 9cd684551124 ("[PATCH] fuse: fix async
read for legacy filesystems"), which introduced max_readahead at FUSE
protocol 7.6 and used min() to preserve legacy (<7.6) filesystem
behaviour. Modern filesystems that explicitly advertise a larger
max_readahead are silently overridden.

Other filesystems set ra_pages or io_pages directly from negotiated
server/device capabilities: cifs sets ra_pages from rsize/rasize,
ceph from rasize/rsize mount options, 9p from maxdata, and nfs sets
io_pages from rpages.

Use the server's max_readahead response directly, bounded by
fc->max_pages (which is itself bounded by fc->max_pages_limit and,
for virtio-fs, by the virtqueue descriptor count):

    fm->sb->s_bdi->ra_pages = min_t(unsigned int, ra_pages,
                                    fc->max_pages);

This is backward compatible:
- Servers that echo init_in->max_readahead back unchanged see the
  same effective readahead as today.
- Servers that reply with a smaller value still reduce ra_pages.
- Servers that do not negotiate FUSE_MAX_PAGES see no change, since
  fc->max_pages defaults to FUSE_DEFAULT_MAX_PAGES_PER_REQ (32),
  which equals VM_READAHEAD_PAGES with 4KB pages.
- Only servers that both negotiate FUSE_MAX_PAGES and advertise a
  larger max_readahead see the new behaviour, and in that case
  fc->max_pages already gates per-request data size.

Signed-off-by: Jim Harris <jim.harris@xxxxxxxxxx>
Assisted-by: Cursor:claude-opus-4.7
---
Notes on AI assistance:
The code analysis (tracing the readahead negotiation in
process_init_reply(), confirming the behaviour of ractl_max_pages()
in mm/readahead.c, and surveying how other filesystems set
ra_pages/io_pages) and the bulk of this changelog were drafted with
an AI coding assistant (see Assisted-by trailer). The one-line code
change was reviewed by me. The motivating performance observation
(a 1MB application read producing four 256KB FUSE_READ requests
against a server advertising max_pages=256 and max_write=1MB) was
observed by me on a real virtio-fs workload prior to any AI
involvement, and verification of patched and unpatched behaviour
was performed by me.
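
For reviewers, the clamp change and the compatibility cases listed in
the changelog can be sketched as a small userspace model. This is only
an illustration, not kernel code: ra_pages_old(), ra_pages_new(),
min_uint() and the MODEL_* constants are hypothetical names introduced
here, and 4KB pages are assumed.

```c
/* Userspace model of the ra_pages computation; not kernel code. */

#define MODEL_PAGE_SIZE          4096u  /* assumption: 4KB pages */
#define MODEL_VM_READAHEAD_PAGES 32u    /* 128KB / 4KB */

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Pre-patch behaviour: the server's reply can only lower the
 * bdi's current ra_pages, never raise it.
 */
unsigned int ra_pages_old(unsigned int bdi_ra_pages,
			  unsigned int reply_max_readahead)
{
	unsigned int ra_pages = reply_max_readahead / MODEL_PAGE_SIZE;

	return min_uint(bdi_ra_pages, ra_pages);
}

/*
 * Post-patch behaviour: the server's reply is used directly,
 * bounded only by the negotiated max_pages.
 */
unsigned int ra_pages_new(unsigned int reply_max_readahead,
			  unsigned int max_pages)
{
	unsigned int ra_pages = reply_max_readahead / MODEL_PAGE_SIZE;

	return min_uint(ra_pages, max_pages);
}
```

With these definitions, ra_pages_old(32, 1 << 20) yields 32 pages
(128KB), while ra_pages_new(1 << 20, 256) yields 256 pages (1MB). A
server that does not negotiate FUSE_MAX_PAGES stays at 32 pages in
both models.
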
fs/fuse/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index deddfffb037f..272026f11a34 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1494,7 +1494,7 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			init_server_timeout(fc, timeout);
 
 		fm->sb->s_bdi->ra_pages =
-				min(fm->sb->s_bdi->ra_pages, ra_pages);
+				min_t(unsigned int, ra_pages, fc->max_pages);
 		fc->minor = arg->minor;
 		fc->max_write = arg->minor < 5 ? 4096 : arg->max_write;
 		fc->max_write = max_t(unsigned, 4096, fc->max_write);
--
2.43.0