Re: [PATCH v4 6/8] string: introduce memcpy_streaming() helpers

From: Borislav Petkov

Date: Sun Jun 07 2026 - 15:08:56 EST

On Wed, Jun 03, 2026 at 04:01:50PM +0800, Li Zhe wrote:
> Introduce a generic memcpy_streaming() interface for write-once copy
> sites that can fall back to memcpy() when no architecture-specific
> optimization is available, or when an architecture-specific backend
> cannot safely handle a given transfer.
>
> Add memcpy_streaming_drain() alongside it so callers can separate the
> copy primitive from any required ordering point. On x86, use
> memcpy_flushcache() and sfence only for aligned transfers that can stay
> entirely on the non-temporal store path; otherwise fall back to memcpy()

So you're throwing "streaming", "non-temporal" and "flush-cache" wildly around
here, and this adds confusion where there shouldn't be any. I'd suggest you
stick to "non-temporal", which you can abbreviate short'n'sweet to "nt", and
that's it. Keep it simple.

> so the generic API does not expose flushcache semantics on cached
> head/tail fragments.
>
> Callers are responsible for invoking memcpy_streaming_drain() before
> later normal stores that must be ordered after the streaming copy.
>
> Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> ---
> arch/x86/include/asm/string_64.h | 32 ++++++++++++++++++++++++++++++++
> include/linux/string.h | 20 ++++++++++++++++++++
> 2 files changed, 52 insertions(+)
>
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 4635616863f5..aee63108577f 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h

There's arch/x86/include/asm/string.h. Why are these helpers here, in the _64
variant?

> @@ -4,6 +4,7 @@
>
> #ifdef __KERNEL__
> #include <linux/jump_label.h>
> +#include <linux/align.h>
>
> /* Written 2002 by Andi Kleen */
>
> @@ -100,6 +101,37 @@ static __always_inline void memcpy_flushcache(void *dst, const void *src, size_t
> }
> __memcpy_flushcache(dst, src, cnt);
> }
> +
> +/*
> + * Only map memcpy_streaming() to memcpy_flushcache() when the destination
> + * is already 8-byte aligned and the size can be handled without cached
> + * head/tail fragments in __memcpy_flushcache().
> + */
> +static __always_inline bool memcpy_flushcache_nt_safe(const void *dst,
> + size_t cnt)

This is checking alignment. Then call it that.

> +{
> + unsigned long d = (unsigned long)dst;

Useless local variable.

> +
> + return cnt && IS_ALIGNED(d, 8) && IS_ALIGNED(cnt, 4);
> +}

AFAICT, this helper is used only once. Zap it completely.
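
IOW, something along these lines. This is an untested userspace sketch, only
meant to show the check open-coded at its single call site. Assumptions:
IS_ALIGNED() is a simplified local stand-in for the one in <linux/align.h>,
nt_backend() and nt_path_hits are made-up names standing in for
memcpy_flushcache() so that the path taken is observable, and memcpy_nt() is
the rename suggested further down, not an existing function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified userspace stand-in for the kernel's IS_ALIGNED(). */
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/* Hypothetical counter: records how often the "NT" path was taken. */
static unsigned long nt_path_hits;

/* Hypothetical stand-in for memcpy_flushcache(); does a plain copy here. */
static void nt_backend(void *dst, const void *src, size_t cnt)
{
	/* The caller below only gets here with an 8-byte aligned dst. */
	assert(IS_ALIGNED((uintptr_t)dst, 8));

	nt_path_hits++;
	memcpy(dst, src, cnt);
}

static inline void memcpy_nt(void *dst, const void *src, size_t cnt)
{
	if (!cnt)
		return;

	/* Same condition as the quoted helper, folded into its only user. */
	if (IS_ALIGNED((uintptr_t)dst, 8) && IS_ALIGNED(cnt, 4))
		nt_backend(dst, src, cnt);
	else
		memcpy(dst, src, cnt);
}
```

No separate helper, no local variable, and the condition sits right next to
the two paths it selects between.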

> +
> +#define __HAVE_ARCH_MEMCPY_STREAMING 1
> +static __always_inline void memcpy_streaming(void *dst, const void *src,

memcpy_nt()

> + size_t cnt)
> +{
> + if (!cnt)
> + return;
> +
> + if (memcpy_flushcache_nt_safe(dst, cnt))

That branch can cost. Why is that alignment checking so necessary? Why can't
you simply DTRT by handling the misaligned parts like __memcpy_flushcache()
does?

What does this bring you? None of that is explained in the commit message so
why do I want this patch at all?

The commit message is basically telling me what the patch does but I can kinda
read that from the diff itself. What it is not telling me is *why* it exists.
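
To make the head/tail question above concrete, here is a rough userspace
sketch of the kind of split I mean: ordinary copy until dst is 8-byte aligned,
the bulk in whole 4-byte units, then an ordinary copy for the remainder.
Assumptions: memcpy_nt_split() and struct nt_split are hypothetical names,
plain memcpy() stands in for the non-temporal stores, and whatever the real
code has to do about the cachelines touched by the head and tail copies is
left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of how one copy was split; illustration only. */
struct nt_split {
	size_t head;	/* bytes copied normally until dst is 8-byte aligned */
	size_t body;	/* bytes which could go through NT stores */
	size_t tail;	/* leftover bytes, copied normally */
};

static struct nt_split memcpy_nt_split(void *dst, const void *src, size_t cnt)
{
	struct nt_split s = { 0, 0, 0 };
	unsigned char *d = dst;
	const unsigned char *p = src;

	/* Head: ordinary copy up to the next 8-byte boundary of dst. */
	s.head = (size_t)(-(uintptr_t)d & 7);
	if (s.head > cnt)
		s.head = cnt;
	memcpy(d, p, s.head);

	/* Body: whole 4-byte units; memcpy() stands in for the NT stores. */
	s.body = (cnt - s.head) & ~(size_t)3;
	memcpy(d + s.head, p + s.head, s.body);

	/* Tail: the remaining 0..3 bytes, ordinary copy again. */
	s.tail = cnt - s.head - s.body;
	memcpy(d + s.head + s.body, p + s.head + s.body, s.tail);

	/* The three pieces must cover the request exactly. */
	assert(s.head + s.body + s.tail == cnt);
	return s;
}
```

For a dst which sits 3 bytes past an 8-byte boundary and cnt == 19, that gives
a 5-byte head, a 12-byte body and a 2-byte tail, with no up-front decision
between two completely different copy routines.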

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette