Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

From: Ackerley Tng
Date: Fri May 16 2025 - 18:43:31 EST


Ackerley Tng <ackerleytng@xxxxxxxxxx> writes:

> <snip>
>
> Here are some remaining issues/TODOs:
>
> 1. Memory error handling such as machine check errors have not been
> implemented.
> 2. I've not looked into preparedness of pages, only zeroing has been
> considered.
> 3. When allocating HugeTLB pages, if two threads allocate indices
> mapping to the same huge page, the utilization in guest_memfd inode's
> subpool may momentarily go over the subpool limit (the requested size
> of the inode at guest_memfd creation time), causing one of the two
> threads to get -ENOMEM. Suggestions to solve this are appreciated!
> 4. max_usage_in_bytes statistic (cgroups v1) for guest_memfd HugeTLB
> pages should be correct but needs testing and could be wrong.
> 5. memcg charging (charge_memcg()) for cgroups v2 for guest_memfd
> HugeTLB pages after splitting should be correct but needs testing and
> could be wrong.
> 6. Page cache accounting: When a hugetlb page is split, guest_memfd will
> incur page count in both NR_HUGETLB (counted at hugetlb allocation
> time) and NR_FILE_PAGES stats (counted when split pages are added to
> the filemap). Is this aligned with what people expect?
>

For people who might be testing this series with non-Coco VMs (heads up,
Patrick and Nikita!): the series currently splits a huge folio as soon as
any index within it has its shareability set to shared, which is probably
unnecessary for non-Coco VMs?

IIUC core-mm doesn't support mapping at 1G, but from a cursory reading it
seems like the faulting function that calls kvm_gmem_fault_shared() might
be able to map a 1G page at 4K granularity.

Looks like we might need another flag like
GUEST_MEMFD_FLAG_SUPPORT_CONVERSION, which will gate initialization of
the shareability maple tree/xarray.

If shareability is NULL for the entire hugepage range, then no splitting
will occur.

For Coco VMs, this should be safe: if this flag is not set,
kvm_gmem_fault_shared() will never be able to fault (the shareability
value will be NULL).
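
To make the intended behaviour concrete, here is a rough userspace model
of the idea, not kernel code. Only the flag name comes from the proposal
above; the bit value, the struct, the enum and the model_*() helpers are
all made up for illustration, and a plain array stands in for the maple
tree/xarray. It covers only the split decision and the shared-fault
check; how non-Coco VMs would fault without the flag is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name is from the mail; the bit value is made up for this model. */
#define GUEST_MEMFD_FLAG_SUPPORT_CONVERSION (1u << 0)

/* Hypothetical per-index states; MODEL_NONE stands in for a NULL entry. */
enum model_shareability { MODEL_NONE = 0, MODEL_PRIVATE, MODEL_SHARED };

/* Toy stand-in for a guest_memfd inode; not a kernel structure. */
struct gmem_model {
	unsigned int flags;
	size_t nr_pages;
	const enum model_shareability *table; /* NULL if never initialized */
};

/* Lookup yields MODEL_NONE when conversion support was not requested. */
enum model_shareability model_lookup(const struct gmem_model *g, size_t index)
{
	assert(index < g->nr_pages);
	if (!(g->flags & GUEST_MEMFD_FLAG_SUPPORT_CONVERSION) || !g->table)
		return MODEL_NONE;
	return g->table[index];
}

/* Split only if some index in [start, start + nr) is marked shared. */
bool model_needs_split(const struct gmem_model *g, size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++)
		if (model_lookup(g, i) == MODEL_SHARED)
			return true;
	return false;
}

/* A shared fault succeeds only on an index explicitly marked shared. */
bool model_can_fault_shared(const struct gmem_model *g, size_t index)
{
	return model_lookup(g, index) == MODEL_SHARED;
}
```

With the flag clear, every lookup in this model returns the NULL-like
value, so model_needs_split() is false for any range and
model_can_fault_shared() is false for any index.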

> Here are some optimizations that could be explored in future series:
>
> 1. Pages could be split from 1G to 2M first and only split to 4K if
> necessary.
> 2. Zeroing could be skipped for Coco VMs if hardware already zeroes the
> pages.
>
> <snip>