Re: [PATCH v3] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation

In reply to: Demian Shulhan: "[PATCH v3] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation"

From: Eric Biggers

Date: Sun Mar 29 2026 - 16:44:51 EST

On Sun, Mar 29, 2026 at 07:43:38AM +0000, Demian Shulhan wrote:
> Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON
> Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR
> software implementation is slow, which creates a bottleneck in NVMe and
> other storage subsystems.
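PMULL is a carry-less (GF(2) polynomial) multiply: two 64-bit operands produce a 128-bit product where partial products are combined with XOR instead of addition. The portable C below is a slow, bitwise illustration of what one PMULL on the low 64-bit lanes computes. It is not code from the patch, and the names `clmul64` and `struct u128_pair` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for a 128-bit carry-less product. */
struct u128_pair {
	uint64_t lo;
	uint64_t hi;
};

/*
 * Bitwise carry-less multiply of two 64-bit polynomials over GF(2).
 * Illustration of PMULL's arithmetic only; real code uses the instruction.
 */
static struct u128_pair clmul64(uint64_t a, uint64_t b)
{
	struct u128_pair r = { 0, 0 };

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			/* XOR in a copy of a shifted by i: no carries in GF(2). */
			r.lo ^= a << i;
			/* Bits shifted out of the low half land in the high half.
			 * Skip i == 0, since a shift by 64 is undefined in C. */
			if (i)
				r.hi ^= a >> (64 - i);
		}
	}
	return r;
}
```

For example, `clmul64(3, 3)` yields `lo == 5`, because (x + 1)^2 = x^2 + 1 when coefficients are taken mod 2.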
>
> The acceleration is implemented using C intrinsics (<arm_neon.h>) rather
> than raw assembly for better readability and maintainability.
>
> Key highlights of this implementation:
> - Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency
> spikes on large buffers.
> - Pre-calculates and loads fold constants via vld1q_u64() to minimize
> register spilling.
> - Benchmarks show the break-even point against the generic implementation
> is around 128 bytes. The PMULL path is enabled only for len >= 128.
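The dispatch described in these bullets (generic code below 128 bytes, PMULL in chunks of at most 4 KiB, each chunk in its own SIMD section) can be sketched in portable C as follows. This is a sketch under stated assumptions, not the patch. `crc64_nvme_simd_hypothetical` is a hypothetical stand-in that just calls the bitwise reference, and `simd_sections_entered` is a hypothetical counter marking where `scoped_ksimd()` would open a section. The polynomial and the all-ones init/final XOR are the standard CRC-64/NVME parameters. How the real code handles the sub-128-byte tail is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC64-NVMe polynomial 0xAD93D23594C93659. */
#define CRC64_NVME_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/* Thresholds taken from the patch description. */
#define PMULL_MIN_LEN  128
#define SIMD_CHUNK_LEN 4096

/* Bitwise reference update on the raw CRC state, LSB-first (reflected CRC). */
static uint64_t crc64_nvme_update_bitwise(uint64_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC64_NVME_POLY_REFLECTED : 0);
	}
	return crc;
}

/* Hypothetical stand-in for the PMULL folding routine. */
static uint64_t crc64_nvme_simd_hypothetical(uint64_t crc, const uint8_t *p, size_t len)
{
	return crc64_nvme_update_bitwise(crc, p, len);
}

/* Hypothetical counter: one increment per SIMD section entered. */
static unsigned simd_sections_entered;

static uint64_t crc64_nvme_update(uint64_t crc, const uint8_t *p, size_t len)
{
	/* Below the break-even length, the generic path is faster. */
	if (len < PMULL_MIN_LEN)
		return crc64_nvme_update_bitwise(crc, p, len);

	while (len >= PMULL_MIN_LEN) {
		size_t n = len < SIMD_CHUNK_LEN ? len : SIMD_CHUNK_LEN;

		/* In the kernel, this call would sit inside scoped_ksimd(),
		 * so preemption is only blocked for one chunk at a time. */
		simd_sections_entered++;
		crc = crc64_nvme_simd_hypothetical(crc, p, n);
		p += n;
		len -= n;
	}
	/* Remaining tail shorter than PMULL_MIN_LEN bytes. */
	return crc64_nvme_update_bitwise(crc, p, len);
}

/* Full CRC64-NVMe: initial value and final XOR are both all-ones. */
static uint64_t crc64_nvme(const void *buf, size_t len)
{
	return ~crc64_nvme_update(~0ULL, buf, len);
}
```

With a 10000-byte buffer, this sketch enters three SIMD sections (4096 + 4096 + 1808 bytes). Because the chunk loop carries the raw CRC state forward, splitting the input this way does not change the result.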
>
> Performance results (kunit crc_benchmark on Cortex-A72):
> - Generic (len=4096): ~268 MB/s
> - PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)
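The "nearly 6x" figure follows directly from the two quoted throughputs. The per-buffer times below assume MB means 10^6 bytes, which the description does not state.

```latex
\text{speedup} = \frac{1556\ \text{MB/s}}{268\ \text{MB/s}} \approx 5.81,
\qquad
t_{\text{generic}} = \frac{4096\ \text{B}}{268 \times 10^{6}\ \text{B/s}} \approx 15.28\ \mu\text{s},
\qquad
t_{\text{PMULL}} = \frac{4096\ \text{B}}{1556 \times 10^{6}\ \text{B/s}} \approx 2.63\ \mu\text{s}
```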
>
> Signed-off-by: Demian Shulhan <demyansh@xxxxxxxxx>

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=crc-next

Thanks!

- Eric