Re: [PATCH v3] iomap: add allocation cache for iomap_dio

From: changfengnan

Date: Tue Mar 17 2026 - 04:34:22 EST



> From: "Vlastimil Babka (SUSE)"<vbabka@xxxxxxxxxx>
> Date:  Tue, Mar 17, 2026, 16:28
> Subject:  Re: [PATCH v3] iomap: add allocation cache for iomap_dio
> To: "changfengnan"<changfengnan@xxxxxxxxxxxxx>
> Cc: "Dave Chinner"<david@xxxxxxxxxxxxx>, "Harry Yoo"<harry.yoo@xxxxxxxxxx>, "Hao Li"<hao.li@xxxxxxxxx>, "guzebing"<guzebing1612@xxxxxxxxx>, <brauner@xxxxxxxxxx>, <djwong@xxxxxxxxxx>, <hch@xxxxxxxxxxxxx>, <linux-xfs@xxxxxxxxxxxxxxx>, <linux-fsdevel@xxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>, <guzebing@xxxxxxxxxxxxx>, <syzbot@xxxxxxxxxxxxxxxxxxxxxxxxx>, <linux-mm@xxxxxxxxx>
> On 3/17/26 08:28, changfengnan wrote:
>
> >> That suggests in that test you used larger capacity than the automatically
> >> calculated.
> > The 10% improvement is due to every cache having sheaves.
> > When I tested 256-byte objects, the default sheaf_capacity is 26. Allocating
> > and freeing 32 objects did not show a noticeable difference, but allocating
> > and freeing 128 objects resulted in a significant improvement: about 3-4x in
> > a multithreaded environment and about 12% in a single thread.
> 
> Great!
> 
> >>  
> >> > I'm thinking that maybe these improvements may not be significant enough to
> >> > see the effect in the io flow.
> >> > Using a simple list seems to be the most efficient approach.
> >> 
> >> I think the question is, what improvement do you now see with your added
> >> pcpu cache vs kmalloc() when 7.0-rc4 is used as the baseline?
>
> > On 7.0-rc4: pcpu gets 1.20M IOPS, kmalloc gets 1.19M IOPS, and the new cache with sheaf_capacity set to 256 gets 1.19M IOPS.
> > On 6.19: pcpu gets 1.20M IOPS, kmalloc gets 1.17M IOPS, and the new cache with sheaf_capacity set to 256 gets 1.19M IOPS.
> 
> Thanks a lot for that data. My conclusion is that kmalloc before sheaves did
> indeed do worse and the custom pcpu cache improved it relatively more. Kmalloc with
> sheaves does better, and the improvement of custom pcpu cache is smaller.
> Also the default sheaf capacity seems to be enough for this workload.
Agree.
> 
> IO is not my area but getting from 1.19M to 1.20M doesn't look like it's
> worth the custom code? (possibly from 1.17M to 1.20M it also wasn't).
Yes, at least for now, there’s no need for a per-CPU cache.
It might be better to replace kmalloc with a new cache, but my tests so far
haven’t shown any performance improvement. I’ll look into it further.
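
For reference, the relative gains implied by the IOPS figures quoted above can
be worked out with a quick sketch. The dictionary layout and the helper name
gain_pct are only illustrative, and the values are the rounded numbers from
this thread, so the percentages are approximate:

```python
# IOPS figures (in millions) as quoted earlier in this thread.
# "sheaf_256" is a made-up label for the new cache with sheaf_capacity 256.
results = {
    "7.0-rc4": {"pcpu": 1.20, "kmalloc": 1.19, "sheaf_256": 1.19},
    "6.19": {"pcpu": 1.20, "kmalloc": 1.17, "sheaf_256": 1.19},
}


def gain_pct(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Gain of the custom pcpu cache over plain kmalloc, per kernel version.
gains = {
    kernel: round(gain_pct(r["pcpu"], r["kmalloc"]), 2)
    for kernel, r in results.items()
}

for kernel, pct in gains.items():
    print(f"{kernel}: pcpu vs kmalloc = +{pct}%")
```

With these inputs the pcpu cache comes out under 1% ahead of kmalloc on
7.0-rc4 and roughly 2.6% ahead on 6.19, which matches the conclusion above.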
