Re: [PATCH 10/17] jbd2: replace __get_free_pages() with kmalloc()

From: Matthew Wilcox

Date: Thu Jun 04 2026 - 11:08:23 EST


I'm hoping you'll take my "Remove special jbd2 slabs" patch instead of
this one, but answering here anyway ...

On Thu, Jun 04, 2026 at 10:05:52AM -0400, Theodore Tso wrote:
> On Thu, Jun 04, 2026 at 09:14:57AM +0300, Mike Rapoport wrote:
> > There's no memory overhead when order == 1.
> > As for the CPU overhead, the difference for the fast path allocations is
> > not measurable and for the slow path it is anyway determined by the amount
> > of reclaim involved rather than by what allocator is used.
>
> Thanks for confirming!
>
> > Larger allocations (> PAGE_SIZE * 2) go straight to the page allocator.

That is a detail subject to change. I have some ideas ...

What users are guaranteed is that kmalloc returns physically contiguous
memory, and that a power-of-two-sized allocation is naturally aligned.
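
To make "naturally aligned" concrete, here is a minimal userspace
sketch of the property: an allocation of a power-of-two size starts at
an address that is a multiple of that size. The helper names are
hypothetical and are not kernel APIs; this only illustrates the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if size is a nonzero power of two. */
static bool is_power_of_two(size_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/*
 * True if addr is a multiple of size. Hypothetical helper for
 * illustration; size must be a power of two.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
	assert(is_power_of_two(size));
	return (addr & (size - 1)) == 0;
}
```

So an 8KB buffer at address 0x2000 is naturally aligned, while the same
buffer at 0x1000 is not.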

> Another question: Today, we can either use kmalloc() (or
> __get_free_pages, previously) or vmalloc(). Is there a way a file
> system can say, "give me physically contiguous pages if possible, but
> if it's too hard --- with some TBD to specify what 'too hard' means or
> can be specified --- fall back to a vmalloc-style approach, with the
> page table / TLB overhead that this might imply"?
>
> I suppose we could do it with kmalloc() with some flags which to
> prevent forced reclaim / compaction, and if that fails, then fall back
> to vmalloc(). Is there a better way?

I think we'd like to avoid doing that. A lot of code has various
workarounds for deficiencies in the memory allocator (some of which have
been fixed and thus the workarounds only complicate matters). If the
memory allocator(s) aren't providing what you need (be it performance
under load, fragmentation avoidance or whatever), it's best to get that
fixed rather than having fallback paths.

There have been people who have suggested "What if folios could be
physically discontiguous", and sometimes I've humoured them, but the
simplifications enabled by requiring folios to be contiguous are quite
immense.

We've been trying to move in the direction of exposing more high-level
APIs so people can say "I want to allocate 10MB of memory but it doesn't
need to be contiguous" and have the allocator either fail the whole
thing up front or make efforts to ensure that you get the whole 10MB.
It's a lot more efficient than calling __get_free_page() 2500 times
and possibly having reclaim run a dozen different times.
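
For reference, the arithmetic behind that round number can be sketched
as below. The 4096-byte page size is an assumption (it depends on the
architecture and configuration), and pages_needed() is a hypothetical
helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of single-page allocations needed to cover 'bytes',
 * rounding up. Hypothetical helper for illustration only.
 */
static size_t pages_needed(size_t bytes, size_t page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}
```

With 4096-byte pages, pages_needed(10 * 1024 * 1024, 4096) comes to
2560, so "2500" above is a round figure for one page-sized call per
page.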

(anyone else try to create a brd that's actually larger than system ram?
;-)