Re: [PATCH v2 2/4] mm: replace exec_folio_order() with generic preferred_exec_order()
From: Usama Arif
Date: Thu Mar 26 2026 - 08:48:07 EST
On 20/03/2026 17:42, Jan Kara wrote:
> On Fri 20-03-26 06:58:52, Usama Arif wrote:
>> Replace the arch-specific exec_folio_order() hook with a generic
>> preferred_exec_order() that dynamically computes the readahead folio
>> order for executable memory. It targets min(PMD_ORDER, 2M) as the
>> maximum, which optimally gives the right answer for contpte (arm64),
>> PMD mapping (x86, arm64 4K), and architectures with smaller PMDs
>> (s390 1M). It adapts at runtime based on:
>>
>> - VMA size: caps the order so folios fit within the mapping
>> - Memory pressure: steps down the order when the local node's free
>> memory is below the high watermark for the requested order
>>
>> This avoids over-allocating on memory-constrained systems while still
>> requesting the optimal order when memory is plentiful.
>>
>> Since exec_folio_order() is no longer needed, remove the arm64
>> definition and the generic default from pgtable.h.
>>
>> Signed-off-by: Usama Arif <usama.arif@xxxxxxxxx>
> ...
>> +static unsigned int preferred_exec_order(struct vm_area_struct *vma)
>> +{
>> + int order;
>> + unsigned long vma_len = vma_pages(vma);
>> + struct zone *zone;
>> + gfp_t gfp;
>> +
>> + if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>> + return 0;
>> +
>> + /* Cap at min(PMD_ORDER, 2M) */
>> + order = min(HPAGE_PMD_ORDER, ilog2(SZ_2M >> PAGE_SHIFT));
>> +
>> + /* Don't request folios larger than the VMA */
>> + order = min(order, ilog2(vma_len));
>
Hi Jan,
Thanks for the feedback and sorry for the late reply! I was travelling
during the week.
> Hum, as far as I'm checking page_cache_ra_order() used in
> do_sync_mmap_readahead(), ra->order is the preferred order but it will be
> trimmed down to fit both within the file and within ra->size. And ra->size
> is set for the readahead to fit within the vma so I don't think any order
> trimming based on vma length is needed in this place?
Ack, yes, that makes sense.
>
>> + /* Step down under memory pressure */
>> + gfp = mapping_gfp_mask(vma->vm_file->f_mapping);
>> + zone = first_zones_zonelist(node_zonelist(numa_node_id(), gfp),
>> + gfp_zone(gfp), NULL)->zone;
>> + if (zone) {
>> + while (order > 0 &&
>> + !zone_watermark_ok(zone, order,
>> + high_wmark_pages(zone), 0, 0))
>> + order--;
>> + }
>
> It looks wrong for this logic to be here. Trimming order based on memory
> pressure makes sense (and we've already got reports that on memory limited
> devices large order folios in the page cache have too big memory overhead
> so we'll likely need to handle that for page cache allocations in general)
> but IMHO it belongs to page_cache_ra_order() or some other common place
> like that.
>
> Honza
So I have been thinking about this. readahead_gfp_mask() already sets
__GFP_NORETRY, so we won't try aggressive reclaim/compaction to satisfy
the allocation, and page_cache_ra_order() falls through to the fallback
path of faulting in an order-0 page when the large allocation is not
satisfied. So the allocator already naturally steps down under memory
pressure, and the explicit zone_watermark_ok() loop might be redundant?
What are your thoughts on just setting
ra->order = min(HPAGE_PMD_ORDER, ilog2(SZ_2M >> PAGE_SHIFT))?
We can do the higher-order allocation with gfp &= ~__GFP_RECLAIM
for the VM_EXEC case.