Re: [PATCH 1/3] mm/mremap: correct invalid map count check

From: Pedro Falcato

Date: Fri Mar 27 2026 - 05:20:05 EST


On Wed, Mar 11, 2026 at 05:24:36PM +0000, Lorenzo Stoakes (Oracle) wrote:
> When moving a VMA in mremap(), we currently check whether doing so might
> violate the vm.max_map_count sysctl limit.
>
> This was introduced in the mists of time prior to 2.6.12.
>
> Then, as now, the move_vma() operation would copy the VMA (+1 mapping if
> not merged), then potentially split the source VMA upon unmap.
>
> Prior to commit 659ace584e7a ("mmap: don't return ENOMEM when mapcount is
> temporarily exceeded in munmap()"), a VMA split would first check whether
> mm->map_count >= sysctl_max_map_count before proceeding.
>
> On unmap of the source VMA, if we are moving a partial VMA, we might split
> the VMA twice.
>
> This would mean, on invocation of split_vma() (as was), we'd check whether
> mm->map_count >= sysctl_max_map_count with a map count elevated by one,
> then again with a map count elevated by two, ending up with a map count
> elevated by three.
>
> At this point we'd reduce the map count on unmap.
>
> At the start of move_vma(), there was a check, which has remained
> throughout mremap()'s history, of mm->map_count >=
> sysctl_max_map_count - 3 (which implies mm->map_count + 4 >
> sysctl_max_map_count - that is, we must have headroom for 4 additional
> mappings).
>
> After mm->map_count is elevated by 3, it is decremented by one once the
> unmap completes. The mmap write lock is held, so nothing else will observe
> mm->map_count > sysctl_max_map_count.
>
> It appears this check was always incorrect - it should have been either
> 'mm->map_count > sysctl_max_map_count - 3' or 'mm->map_count >=
> sysctl_max_map_count - 2'.
>
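
FWIW, the off-by-one described above can be checked with a small userspace
model. This is only a sketch of the arithmetic in the commit message, not
kernel code: the function names are made up for illustration, and the model
assumes the copied VMA is never merged and that both splits happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Each predicate returns true when the move would be refused up front. */
typedef bool (*precheck_fn)(int map_count, int max_map_count);

/* The check as it has historically stood at the start of move_vma(). */
static bool precheck_old(int map_count, int max_map_count)
{
	return map_count >= max_map_count - 3;
}

/* One of the two equivalent forms the quoted text says it should have been. */
static bool precheck_expected(int map_count, int max_map_count)
{
	return map_count >= max_map_count - 2;
}

/*
 * Hypothetical model of the sequence before commit 659ace584e7a: the copy
 * adds one mapping (assumed not merged), then each of the two splits first
 * checks map_count >= max_map_count and, if allowed, adds one more.
 */
static bool historical_move_succeeds(int map_count, int max_map_count)
{
	map_count += 1;
	for (int split = 0; split < 2; split++) {
		if (map_count >= max_map_count)
			return false;
		map_count += 1;
	}
	return true;
}

/* Count starting map counts the precheck refuses although the move fits. */
static int count_false_refusals(precheck_fn precheck, int max_map_count)
{
	int refusals = 0;

	assert(max_map_count >= 3);
	for (int map_count = 0; map_count <= max_map_count; map_count++) {
		if (precheck(map_count, max_map_count) &&
		    historical_move_succeeds(map_count, max_map_count))
			refusals++;
	}
	return refusals;
}
```

With a limit of 65530, count_false_refusals(precheck_old, 65530) reports
exactly one such value (map_count == 65527, i.e. the limit minus 3), while
count_false_refusals(precheck_expected, 65530) reports none.
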
> After commit 659ace584e7a ("mmap: don't return ENOMEM when mapcount is
> temporarily exceeded in munmap()"), the map count check on split is
> eliminated in the newly introduced __split_vma(), which the unmap path
> uses, and that path instead checks whether mm->map_count >=
> sysctl_max_map_count.
>
> This is valid since, net, an unmap can only cause an increase in map count
> of 1 (split both sides, unmap middle).
>
> Since we only copy a VMA and (if MREMAP_DONTUNMAP is not set) unmap
> afterwards, the maximum number of additional mappings that will actually be
> subject to any check will be 2.
>
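
The "at most 2" figure above can be sanity-checked the same way. Again, this
is a rough userspace sketch rather than the code in the patch: the helper
names are hypothetical, the copy is assumed not to merge, and the model
assumes no split of the source VMA happens when MREMAP_DONTUNMAP is set.

```c
#include <stdbool.h>

/*
 * Net number of additional mappings after a move, under the stated
 * assumptions. split_low/split_high say whether the moved range leaves a
 * remainder of the source VMA below/above it.
 */
static int extra_mappings(bool dontunmap, bool split_low, bool split_high)
{
	int extra = 1;	/* the copied VMA, assumed not merged */

	if (!dontunmap) {
		/* each remainder costs one split of the source VMA */
		extra += (split_low ? 1 : 0) + (split_high ? 1 : 0);
		/* the moved range itself is then unmapped */
		extra -= 1;
	}
	return extra;
}

/* True if map_count can grow by 'extra' without exceeding the limit. */
static bool has_headroom(int map_count, int max_map_count, int extra)
{
	return map_count + extra <= max_map_count;
}
```

Moving the middle of a VMA gives extra_mappings(false, true, true) == 2,
moving a whole VMA gives 0, and the MREMAP_DONTUNMAP case gives 1, so
headroom for 2 covers every case in this model.
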
> Therefore, update the check to assert this corrected value. Additionally,
> update the check introduced by commit ea2c3f6f5545 ("mm,mremap: bail out
> earlier in mremap_to under map pressure") to account for this.
>
> While we're here, clean up the comment prior to that.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@xxxxxxxxxx>

Reviewed-by: Pedro Falcato <pfalcato@xxxxxxx>

--
Pedro