Re: [PATCH v3] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation

From: Eric Biggers

Date: Sun Mar 29 2026 - 18:19:07 EST

On Sun, Mar 29, 2026 at 10:57:04PM +0100, David Laight wrote:
> Final thought:
> Is that allowing for the cost of kernel_fpu_begin()? - which I think only
> affects the first call.
> And the cost of the data-cache misses for the lookup table reads? - again
> worse for the first call.

I assume you mean kernel_neon_begin(). This is an arm64 patch. (I
encourage you to actually read the code. You seem to send a lot of
speculation-heavy comments without actually reading the code.)

Currently, the benchmark in crc_kunit just measures the throughput in a
loop (as has been discussed before). So no, it doesn't currently
capture the overhead of pulling code and data into cache. For NEON
register use it captures only the amortized overhead.

Note that using PMULL saves having to pull the table into cache, while
using the table is a bit less code and saves having to use kernel-mode
NEON. So both have their advantages and disadvantages.

This patch does fall back to the table for the last 'len & 15' bytes,
which means the table may be needed anyway. That is not the optimal way
to do it, and it's something to address later when this is replaced with
something similar to x86's crc-pclmul-template.S.
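
To make the bulk/tail split concrete, here is a rough userspace sketch. It
is not the code from the patch. crc64_bulk_update() is a hypothetical
stand-in for the PMULL path (here it just calls the table code so that the
example runs anywhere), and all function names are made up for
illustration. The point is only that any length that is not a multiple of
16 ends up touching the table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC64-NVMe polynomial. */
#define CRC64_NVME_POLY 0x9a6c9329ac4bc9b5ULL

static uint64_t crc64_table[256];

/* Build the 256-entry byte-at-a-time table (2 KiB of data). */
static void crc64_init_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint64_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC64_NVME_POLY : 0);
		crc64_table[i] = crc;
	}
}

/* Table-driven update. No initial or final inversion is applied here. */
static uint64_t crc64_table_update(uint64_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (crc >> 8) ^ crc64_table[(crc ^ *p++) & 0xff];
	return crc;
}

/*
 * Hypothetical stand-in for the vectorized path. A real version would use
 * carryless multiplication and would not read the table at all. It only
 * accepts whole 16-byte blocks.
 */
static uint64_t crc64_bulk_update(uint64_t crc, const uint8_t *p, size_t len)
{
	assert(len % 16 == 0);
	return crc64_table_update(crc, p, len);
}

/* Whole 16-byte blocks go to the bulk path, the remainder to the table. */
static uint64_t crc64_split_update(uint64_t crc, const uint8_t *p, size_t len)
{
	size_t bulk = len & ~(size_t)15;	/* multiple of 16 */
	size_t tail = len & 15;			/* 0 to 15 bytes */

	if (bulk)
		crc = crc64_bulk_update(crc, p, bulk);
	if (tail)
		crc = crc64_table_update(crc, p + bulk, tail);
	return crc;
}
```

With this structure, the table stays cold only when every call has a length
that is a multiple of 16. A folding implementation that also handles the
partial final block would avoid the table entirely.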

- Eric