Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

From: Sean Christopherson
Date: Fri May 16 2025 - 13:52:05 EST


On Fri, May 16, 2025, Rick P Edgecombe wrote:
> On Fri, 2025-05-16 at 06:11 -0700, Vishal Annapurve wrote:
> > Google internally uses 1G hugetlb pages to achieve high bandwidth IO,
> > lower memory footprint using HVO and lower MMU/IOMMU page table memory
> > footprint among other improvements. These percentages carry a
> > substantial impact when working at the scale of large fleets of hosts
> > each carrying significant memory capacity.
>
> There must have been a lot of measuring involved in that. But the numbers I was
> hoping for were how much does *this* series help upstream.

...

> I asked this question assuming there were some measurements for the 1GB part of
> this series. It sounds like the reasoning is instead that this is how Google
> does things, which is backed by way more benchmarking than kernel patches are
> used to getting. So it can just be reasonably assumed to be helpful.
>
> But for upstream code, I'd expect something a bit more concrete than "we
> believe" and "substantial impact". It seems like I'm in the minority here
> though. So if no one else wants to pressure test the thinking in the usual way,
> I guess I'll just have to wonder.

From my perspective, 1GiB hugepage support in guest_memfd isn't about improving
CoCo performance, it's about achieving feature parity on guest_memfd with respect
to existing backing stores so that it's possible to use guest_memfd to back all
VM shapes in a fleet.

Let's assume there is significant value in backing non-CoCo VMs with 1GiB pages,
unless you want to re-litigate the existence of 1GiB support in HugeTLBFS.

If we assume 1GiB support is mandatory for non-CoCo VMs, then it becomes mandatory
for CoCo VMs as well, because it's the only realistic way to run CoCo VMs and
non-CoCo VMs on a single host. Mixing 1GiB HugeTLBFS with any other backing store
for VMs simply isn't tenable due to the nature of 1GiB allocations. E.g. grabbing
sub-1GiB chunks of memory for CoCo VMs quickly fragments memory to the point where
HugeTLBFS can't allocate memory for non-CoCo VMs.

Teaching HugeTLBFS to play nice with TDX and SNP isn't happening, which leaves
adding 1GiB support to guest_memfd as the only way forward.

Any boost to TDX (or SNP) performance is purely a bonus.