Re: [PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead

From: Chia-I Wu

Date: Thu May 21 2026 - 17:12:36 EST


On Thu, May 21, 2026 at 8:49 AM Chia-I Wu <olvaffe@xxxxxxxxx> wrote:
>
> On Thu, May 21, 2026 at 1:51 AM Boris Brezillon
> <boris.brezillon@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, 21 May 2026 15:05:21 +0800
> > Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> >
> > > On 5/21/26 1:06 AM, Chia-I Wu wrote:
> > > > On Wed, May 20, 2026 at 3:04 AM Baolin Wang
> > > > <baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
> > > >>
> > > >> CC Kairui,
> > > >>
> > > >> On 5/20/26 12:31 PM, Chia-I Wu via B4 Relay wrote:
> > > >>> From: Chia-I Wu <olvaffe@xxxxxxxxx>
> > > >>>
> > > >>> swap_cluster_readahead can allocate folios for other mappings. If the
> > > >>> gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
> > > >>> PROT_MTE, we can end up with false KASAN errors such as
> > > >>>
> > > >>> BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
> > > >>> Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
> > > >>> Pointer tag: [f5], memory tag: [f9]
> > > >>>
> > > >>> In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
> > > >>> both pointer tag and memory tag to 0xf5 when swap_cluster_readahead
> > > > >>> allocated the folio. But userspace had already set the memory tag to
> > > > >>> 0xf9 before the page was swapped out. arch_swap_restore restored the
> > > > >>> memory tag to 0xf9, leading to the mismatch.
> > > >>>
> > > >>> Signed-off-by: Chia-I Wu <olvaffe@xxxxxxxxx>
> > > >>> ---
> > > >>> Changes in v2:
> > > >>> - set __GFP_SKIP_KASAN for shmem instead of drm/panthor
> > > >>> - Link to v1: https://patch.msgid.link/20260512-panthor-kasan-v1-1-d8d3e275d71b@xxxxxxxxx
> > > >>> ---
> > > >>> mm/shmem.c | 5 +++++
> > > >>> 1 file changed, 5 insertions(+)
> > > >>>
> > > >>> diff --git a/mm/shmem.c b/mm/shmem.c
> > > >>> index 3b5dc21b323c2..db9130a8c5b76 100644
> > > >>> --- a/mm/shmem.c
> > > >>> +++ b/mm/shmem.c
> > > >>> @@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> > > > >>>  	pgoff_t ilx;
> > > > >>>  	struct folio *folio;
> > > > >>>
> > > > >>> +	/* swap_cluster_readahead might cross the mapping boundary and
> > > > >>> +	 * allocate pages for other mappings. We have to skip KASAN.
> > > > >>> +	 */
> > > > >>> +	gfp |= __GFP_SKIP_KASAN;
> > > > >>> +
> > > > >>>  	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> > > > >>>  	folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
> > > > >>>  	mpol_cond_put(mpol);
> > > >>
> > > > >> If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
> > > > >> explicitly should NOT have the flag? Your v1 link already mentions
> > > > >> this scenario.
> > > > We lose the benefits of kasan hw tags (other modes are not affected)
> > > > by forcing the flag.
> > > >
> > > > The other mappings swap_cluster_readahead can affect are anon
> > > > mappings, regular shmem mappings, or gpu shmem mappings. I think only
> > > > gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
> > > > intentional, because gpu shmem mappings pick GFP_HIGHUSER over
> > > > GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
> > > > __GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.
> > >
> > > It sounds like the right approach would be to explicitly set
> > > __GFP_SKIP_KASAN for GPU shmem mappings, no? I think having users
> > > explicitly set __GFP_SKIP_KASAN makes the implications clearer than
> > > having shmem core set it implicitly.
> >
> > > It's a bit of a shame that we have to explicitly set this
> > > __GFP_SKIP_KASAN flag when we select GFP_HIGHUSER though (it means a lot
> > > of patching in drivers/gpu/drm/, because basically every driver relying
> > > on shmem for its buffer allocation uses this flag).
> >
> > Also, it feels like KASAN poisoning for these pages would be interesting
> > > to have since we know we won't allow PROT_MTE on userspace mappings
> > anyway. Oh, and some buffers might even be kernel only (no mmap()
> > allowed), which makes them even better candidates for poisoning.
> >
> > >
> > > We could also consider adding a VM_WARN in shmem_swapin_cluster() to
> > > detect any mappings missing the __GFP_SKIP_KASAN flag.
> >
> > If the general consensus is that all shmem-backed allocation must have
> > __GFP_SKIP_KASAN, yes, it'd make sense to add a VM_WARN.
> >
> > >
> > > > I guess what I am trying to say is these are all user pages. We have
> > > > to skip kasan when user pages can be mapped PROT_MTE. The
> > >
> > > Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
> > > while GPU shmem mappings are a special case.
> >
> > They are not that special, they are just not MOVABLE because the GPU
> > might also access the same pages under the hood. If it's assumed that
> > any page being exposed through mmap() must have __GFP_SKIP_KASAN, why
> > does GFP_HIGHUSER not have that flag too?
> It is also about whether PROT_MTE is allowed. This becomes a problem
> when both kernel and userspace want to modify the tags stored in MTE.
>
> Another way to achieve the same effect as this patch, but more
> explicitly, is to have
>
> #define GFP_HIGHUSER_SWAPPABLE (GFP_HIGHUSER | __GFP_SKIP_KASAN)
> #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER_SWAPPABLE | __GFP_MOVABLE)
>
> GPU drivers that can swap should use GFP_HIGHUSER_SWAPPABLE. shmem
> core can warn about missing __GFP_SKIP_KASAN.
>
> >
> > >
> > > > justification for gpu shmem mappings is that they cannot be mapped
> > > > PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> > > > we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
> >
> > I'm no MM expert, so it's probably me not understanding how this
> > swap-readahead logic is supposed to work, but the whole idea of using
> > different flags from those that were requested by the f_mapping seems
> > > fragile. I mean, this comment [1] proves it's not the first time the
> > > problem is considered, and I'm wondering why __GFP_SKIP_KASAN should be
> > > treated differently from zones. Yes, that's an extra copy if the
> > > SKIP_KASAN flags don't match but the zones do, but in practice, won't
> > > we have GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in different zones? Or is
> > > the problem that, even with a copy, it's already too late to restore
> > > the flags because they have been overwritten during KASAN unpoisoning?
> arch_swap_restore is called just before shmem_replace_folio. It is a
> bit too late right now but I guess it is fixable.
>
> But shmem is not just a victim. It is also an offender to anon
> mappings. We would need a similar replacement logic in do_swap_page
> for anon mappings.
Come to think of it, that's not how things work.

Regular shmems and anon mappings set __GFP_SKIP_KASAN because they can
be mapped PROT_MTE. The page allocator then calls page_kasan_tag_reset
on the pages.

GPU shmems omit __GFP_SKIP_KASAN because they can't be mapped
PROT_MTE. The page allocator then calls kasan_unpoison_pages on the
pages.

With swap readahead, no one can expect the right function to have been
called anymore. The question is whether we can detect the mismatch and
call page_kasan_tag_reset/kasan_unpoison_pages to make things right again
in places such as do_swap_page and shmem_swapin_folio.
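
To make the mismatch concrete, here is a userspace toy model, not kernel
code. Every model_* name is made up for illustration, and the tags 0xf5 and
0xf9 are taken from the KASAN report in the commit message. It assumes 0xff
is the match-all pointer tag and that a fixup only has to reset the page
tag. What a real fixup in do_swap_page or shmem_swapin_folio would need to
do is still the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed match-all tag: a pointer carrying it is never tag-checked. */
#define MODEL_TAG_MATCH_ALL 0xff

/* Hypothetical stand-in for one page; not a kernel structure. */
struct model_page {
	uint8_t page_tag; /* tag the kernel puts in pointers to this page */
	uint8_t mem_tag;  /* tag stored in memory for this page */
};

/* Models the allocation-time choice driven by __GFP_SKIP_KASAN. */
static void model_alloc(struct model_page *p, bool skip_kasan, uint8_t kasan_tag)
{
	assert(kasan_tag != MODEL_TAG_MATCH_ALL);
	if (skip_kasan) {
		/* Like page_kasan_tag_reset: the pointer tag becomes match-all. */
		p->page_tag = MODEL_TAG_MATCH_ALL;
		p->mem_tag = 0; /* memory tag is left to userspace; 0 for determinism */
	} else {
		/* Like kasan_unpoison_pages: pointer and memory tags agree. */
		p->page_tag = kasan_tag;
		p->mem_tag = kasan_tag;
	}
}

/* Models arch_swap_restore: the tag saved at swap-out comes back. */
static void model_swap_restore(struct model_page *p, uint8_t saved_tag)
{
	p->mem_tag = saved_tag;
}

/* A kernel access is fine if the pointer tag is match-all or equals the memory tag. */
static bool model_access_ok(const struct model_page *p)
{
	return p->page_tag == MODEL_TAG_MATCH_ALL || p->page_tag == p->mem_tag;
}

/* Hypothetical fixup for a page whose owner expects __GFP_SKIP_KASAN. */
static void model_fixup(struct model_page *p, bool owner_skips_kasan)
{
	if (owner_skips_kasan && p->page_tag != MODEL_TAG_MATCH_ALL)
		p->page_tag = MODEL_TAG_MATCH_ALL;
}

/* Readahead allocates with the caller's flags, then the owner's tag is restored. */
static bool model_readahead_access_ok(bool caller_skips_kasan, bool fixup)
{
	struct model_page p;

	model_alloc(&p, caller_skips_kasan, 0xf5);
	model_swap_restore(&p, 0xf9);
	if (fixup)
		model_fixup(&p, true);
	return model_access_ok(&p);
}
```

In this model, model_readahead_access_ok(false, false) is the reported case:
the pointer tag stays 0xf5 while the memory tag goes back to 0xf9. Setting
either argument to true avoids the mismatch. The opposite direction (a GPU
shmem page allocated by someone who set __GFP_SKIP_KASAN) is not modeled.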

>
> >
> > [1]https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112