Re: [PATCH] spi-nor: Verify written data in paranoid mode

From: Csókás Bence
Date: Wed Apr 16 2025 - 10:46:28 EST

Next message: Vadim Fedorenko: "Re: [PATCH v1] ptp: ocp: fix NULL deref in _signal_summary_show"
Previous message: David Wang: "nvme nvme0: Failed to get ANA log after suspend/resume"
In reply to: Richard Weinberger: "Re: [PATCH] spi-nor: Verify written data in paranoid mode"
Next in thread: Richard Weinberger: "Re: [PATCH] spi-nor: Verify written data in paranoid mode"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

On 2025. 04. 16. 15:09, Richard Weinberger wrote:

----- Original Message -----

From: "Csókás Bence" <csokas.bence@xxxxxxxxx>

Add an MTD_SPI_NOR_PARANOID config option that verifies all written data,
so that silent bit errors do not go undetected, at the cost of halving SPI
bandwidth.
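The option described in the patch amounts to a read-back-and-compare after every write: program the data, read it back over the same bus, and report a mismatch instead of returning success. The sketch below is a minimal user-space model of that idea, assuming a RAM array stands in for the flash. `raw_write()`, `raw_read()`, `paranoid_write()` and the `inject_bit_error` fault hook are hypothetical names, not the spi-nor driver's API, and the model ignores real NOR program semantics (bits can only be cleared until the sector is erased).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FLASH_SIZE 4096

/* Hypothetical in-memory stand-in for the NOR array. */
static uint8_t flash[FLASH_SIZE];

/* Hypothetical fault hook: when nonzero, the next write flips one bit
 * on its way to the array, modelling a burst error in the data path. */
static int inject_bit_error;

static void raw_write(size_t ofs, const uint8_t *buf, size_t len)
{
	assert(ofs + len <= FLASH_SIZE);
	memcpy(&flash[ofs], buf, len);
	if (inject_bit_error && len > 0) {
		flash[ofs] ^= 0x01;	/* corrupt the first byte */
		inject_bit_error = 0;
	}
}

static void raw_read(size_t ofs, uint8_t *buf, size_t len)
{
	assert(ofs + len <= FLASH_SIZE);
	memcpy(buf, &flash[ofs], len);
}

/* Paranoid write: program, read back, compare. The extra read doubles
 * the bus traffic, which is where the halved bandwidth comes from. */
static int paranoid_write(size_t ofs, const uint8_t *buf, size_t len)
{
	uint8_t *check;
	int ret = 0;

	raw_write(ofs, buf, len);

	check = malloc(len ? len : 1);
	if (!check)
		return -ENOMEM;
	raw_read(ofs, check, len);
	if (memcmp(check, buf, len) != 0)
		ret = -EIO;	/* report it instead of silently succeeding */
	free(check);
	return ret;
}
```

A clean write returns 0; a write hit by the injected bit flip returns -EIO, and rewriting the same data afterwards succeeds again because the fault was transient.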

What is the use case for this? Why is it specific to SPI-NOR
flashes? Or should it rather be an MTD "feature"? I'm not sure
whether this is the right way to do it, thus I'd love to hear more
about the background story to this.

Well, our case is quite specific, but we wanted to provide a general
solution for upstream. In our case we have a component in the data path
that can cause a burst bit error, on average after about a hundred
megabytes written.

Hmm. So, there is a severe hardware issue you're working around.

We _could_ make it MTD-wide, in our case we only have a NOR Flash
onboard so this is where we added it. If it were in the MTD core, where
would it make sense?

I'm not so sure whether it makes sense at all.
In its current form, there is no recovery. So anything non-trivial
on top of the MTD will just see an -EIO and has to give up.
E.g. a filesystem will remount read-only.

In our case, we use UBIFS on top of UBI, which then chooses another eraseblock to hold the data, and re-tests (with erase+write cycles) the one that returned -EIO. Since the bus error is only transient, it has gone away by then, so UBIFS recovers from this cleanly.

So yes, it is up to the FS/upper layers to handle the error. If they can't recover from it, then yes, they will give up and enter some 'safe mode' (e.g. remount ro). But at least they *do* get notified that something is wrong, and have a chance to react. Before, they simply assumed everything had been written without errors, and the data corruption would only surface *on the next read*.
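The recovery path described above can be modelled roughly as follows: on -EIO the upper layer keeps the failing eraseblock reserved, places the data in a fresh one, and only afterwards re-tests the failed block with a few erase+write cycles. This is a toy sketch of the behaviour described in this thread, not UBI's actual implementation; the eraseblock pool, `verified_write()`, `torture_peb()`, `write_with_recovery()` and the `transient_faults` counter (standing in for the sporadic burst error) are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_PEBS 8

/* Hypothetical pool of physical eraseblocks. */
static bool peb_in_use[NUM_PEBS];
static bool peb_bad[NUM_PEBS];

/* Hypothetical fault source: each pending fault makes one verified
 * write fail with -EIO, like the transient burst error on the bus. */
static int transient_faults;

/* Stand-in for a paranoid (write + read-back) write to one block. */
static int verified_write(int peb)
{
	(void)peb;
	if (transient_faults > 0) {
		transient_faults--;
		return -EIO;
	}
	return 0;
}

static int pick_free_peb(void)
{
	for (int i = 0; i < NUM_PEBS; i++)
		if (!peb_in_use[i] && !peb_bad[i])
			return i;
	return -ENOSPC;
}

/* Re-test a suspicious block with erase+write cycles. A transient
 * error usually does not reappear, so the block goes back to the pool;
 * a block that keeps failing is marked bad. */
static void torture_peb(int peb)
{
	assert(peb >= 0 && peb < NUM_PEBS);
	for (int cycle = 0; cycle < 3; cycle++) {
		if (verified_write(peb) != 0) {
			peb_bad[peb] = true;
			return;
		}
	}
	peb_in_use[peb] = false;
}

/* Write one block of data, moving to another eraseblock on -EIO.
 * Returns the block holding the data, or a negative error code. */
static int write_with_recovery(int max_attempts)
{
	int failed[NUM_PEBS];
	int nfailed = 0;
	int result = -EIO;	/* caller may give up, e.g. remount ro */

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		int peb = pick_free_peb();

		if (peb < 0) {
			result = peb;
			break;
		}
		peb_in_use[peb] = true;	/* stays reserved even on failure */
		if (verified_write(peb) == 0) {
			result = peb;
			break;
		}
		failed[nfailed++] = peb;
	}

	/* Only after the data is placed, re-test the failed blocks. */
	for (int i = 0; i < nfailed; i++)
		torture_peb(failed[i]);
	return result;
}
```

With a single transient fault, the data lands in the second block and the first one passes torture and returns to the pool. With a persistent fault, every attempt fails, the caller gets -EIO, and the tortured blocks end up marked bad.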

Thanks,
//richard

Bence
