Re: [PATCH v3] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
From: Eric Biggers
Date: Sun Mar 29 2026 - 18:19:07 EST
On Sun, Mar 29, 2026 at 10:57:04PM +0100, David Laight wrote:
> Final thought:
> Is that allowing for the cost of kernel_fpu_begin()? - which I think only
> affects the first call.
> And the cost of the data-cache misses for the lookup table reads? - again
> worse for the first call.
I assume you mean kernel_neon_begin(). This is an arm64 patch. (I
encourage you to actually read the code. You seem to send a lot of
speculation-heavy comments without actually reading the code.)
Currently, the benchmark in crc_kunit just measures the throughput in a
loop (as has been discussed before). So no, it doesn't currently
capture the overhead of pulling code and data into cache. For NEON
register use it captures only the amortized overhead.
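To illustrate the difference between a looped throughput measurement and a
first-call measurement, here is a minimal user-space sketch. It is not the
crc_kunit code; measure(), now_ns(), and checksum_standin() are hypothetical
names made up for this example, and the workload is a trivial stand-in rather
than a CRC. It only shows where a one-off cost (cache misses, register
save/restore) would land in each number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds (POSIX clock_gettime). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical workload standing in for the function being benchmarked. */
static uint64_t checksum_standin(const unsigned char *p, size_t len)
{
	uint64_t sum = 0;

	while (len--)
		sum = sum * 31 + *p++;
	return sum;
}

struct timing {
	uint64_t first_call_ns;	/* includes any one-off (cold) costs */
	uint64_t avg_loop_ns;	/* one-off costs are amortized away */
	uint64_t sink;		/* keeps the calls from being optimized out */
};

static struct timing measure(const unsigned char *buf, size_t len,
			     unsigned int iters)
{
	struct timing t;
	volatile uint64_t sink;
	uint64_t start;

	assert(iters > 0);

	/* First call, timed on its own. */
	start = now_ns();
	sink = checksum_standin(buf, len);
	t.first_call_ns = now_ns() - start;

	/* Steady-state loop: this is the only number a throughput loop sees. */
	start = now_ns();
	for (unsigned int i = 0; i < iters; i++)
		sink += checksum_standin(buf, len);
	t.avg_loop_ns = (now_ns() - start) / iters;

	t.sink = sink;
	return t;
}
```

A benchmark that reports only the equivalent of avg_loop_ns says nothing
about first_call_ns, which is the quantity the question above is about.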
Note that using PMULL saves having to pull the lookup table into cache,
while using the table takes a bit less code and avoids kernel-mode NEON.
So both approaches have their advantages and disadvantages.
This patch does fall back to the table for the last 'len & 15' bytes,
which means the table may be needed anyway. That is not the optimal way
to do it, and it's something to address later when this is replaced with
something similar to x86's crc-pclmul-template.S.
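For reference, the bulk/tail split described above can be sketched in
portable C as follows. This is not the code from the patch. The function
names are hypothetical, crc64_blocks16_standin() is a plain bit-at-a-time
loop standing in for the NEON/PMULL path, and the polynomial constant is
assumed to be the reflected CRC64-NVMe polynomial. The functions operate on
the raw CRC state, so any initial or final inversion is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC64-NVMe polynomial (assumed value for this sketch). */
#define CRC64_NVME_POLY_REFL 0x9A6C9329AC4BC9B5ULL

static uint64_t crc64_tab[256];
static int crc64_tab_ready;

/* Build the byte-at-a-time lookup table (2 KiB) on first use. */
static void crc64_tab_init(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint64_t c = i;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ ((c & 1) ? CRC64_NVME_POLY_REFL : 0);
		crc64_tab[i] = c;
	}
	crc64_tab_ready = 1;
}

/* Table-based update: one table read per input byte. */
static uint64_t crc64_table_update(uint64_t crc, const unsigned char *p,
				   size_t len)
{
	if (!crc64_tab_ready)
		crc64_tab_init();
	while (len--)
		crc = (crc >> 8) ^ crc64_tab[(crc ^ *p++) & 0xff];
	return crc;
}

/*
 * Stand-in for the accelerated path. It touches no table and only accepts
 * whole 16-byte blocks, like a PMULL folding loop would.
 */
static uint64_t crc64_blocks16_standin(uint64_t crc, const unsigned char *p,
				       size_t len)
{
	assert((len & 15) == 0);
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC64_NVME_POLY_REFL : 0);
	}
	return crc;
}

/* Whole 16-byte blocks go to the fast path, the remainder to the table. */
static uint64_t crc64_split_update(uint64_t crc, const unsigned char *p,
				   size_t len)
{
	size_t bulk = len & ~(size_t)15;	/* multiple of 16 */
	size_t tail = len & 15;			/* 0..15 leftover bytes */

	if (bulk)
		crc = crc64_blocks16_standin(crc, p, bulk);
	if (tail)
		crc = crc64_table_update(crc, p + bulk, tail);
	return crc;
}
```

With this structure, any length that is not a multiple of 16 still reads the
table, which is why the table's cache footprint does not go away entirely.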
- Eric