Re: [PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
From: Vladimir Murzin
Date: Fri Jun 05 2026 - 05:44:17 EST
On 6/5/26 00:12, Shanker Donthineni wrote:
> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
> observed by a peripheral before an older, non-overlapping Device-nGnR*
> store to the same peripheral. This breaks the program-order guarantee
> that software expects for Device-nGnR* accesses and can leave a
> peripheral in an incorrect state, as a load is observed before an
> earlier store takes effect.
>
> The erratum can occur only when all of the following apply:
>
> - A PE executes a Device-nGnR* store followed by a younger
> Device-nGnR* load.
> - The store is not a store-release.
> - The accesses target the same peripheral and do not overlap in bytes.
> - There is at most one intervening Device-nGnR* store in program
> order, and there are no intervening Device-nGnR* loads.
> - There is no DSB, and no DMB that orders loads, between the store and
> the load.
> - Specific micro-architectural and timing conditions occur.
>
> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
> orders loads) between the store and the load, or make the store a
> store-release. A load-acquire on the load side would not help, because
> acquire semantics do not prevent a load from being observed ahead of an
> older store; only the store side (release or a barrier) closes the
> window.
>
> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
> to stlr* (Store-Release), which removes the "store is not a
> store-release" condition for every device write the kernel issues.
> Because writel() and writel_relaxed() are both built on __raw_writel()
> in asm-generic/io.h, patching the raw variants covers both the
> non-relaxed and relaxed APIs without touching the higher layers. Note
> that writel()'s own barrier sits before the store, so it does not order
> the store against a subsequent readl(); the store-release promotion is
> what provides that ordering.
>
> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
> the plain str* sequence.
>
> Co-developed-by: Vikram Sethi <vsethi@xxxxxxxxxx>
> Signed-off-by: Vikram Sethi <vsethi@xxxxxxxxxx>
> Signed-off-by: Shanker Donthineni <sdonthineni@xxxxxxxxxx>
> ---
> Documentation/arch/arm64/silicon-errata.rst | 2 ++
> arch/arm64/Kconfig | 23 ++++++++++++++++++++
> arch/arm64/include/asm/io.h | 24 ++++++++++++++-------
> arch/arm64/kernel/cpu_errata.c | 8 +++++++
> arch/arm64/tools/cpucaps | 1 +
> 5 files changed, 50 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index 211119ce7adc..899bed3908bb 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -256,6 +256,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | Olympus core | T410-OLY-1027 | NVIDIA_OLYMPUS_1027_ERRATUM |
> ++----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fe60738e5943..a6bac84b05a1 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
>
> If unsure, say Y.
>
> +config NVIDIA_OLYMPUS_1027_ERRATUM
> + bool "NVIDIA Olympus: device store/load ordering erratum"
> + default y
> + help
> + This option adds an alternative code sequence to work around an
> + NVIDIA Olympus core erratum where a Device-nGnR* store can be
> + observed by a peripheral after a younger Device-nGnR* load to the
> + same peripheral. This breaks the program order that drivers rely
> + on for MMIO and can leave a device in an incorrect state.
> +
> + The workaround promotes the raw MMIO store helpers
> + (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
> + required ordering. Because writel() and writel_relaxed() are built
> + on __raw_writel(), both are covered without changes to the higher
> + layers.
> +
> + The fix is applied through the alternatives framework, so enabling
> + this option does not by itself activate the workaround: it is
> + patched in only when an affected CPU is detected, and is a no-op on
> + unaffected CPUs.
> +
> + If unsure, say Y.
> +
> config ARM64_ERRATUM_834220
> bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
> depends on KVM
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 8cbd1e96fd50..b6d7966e9c19 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -25,29 +25,37 @@
> #define __raw_writeb __raw_writeb
> static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
> {
> - volatile u8 __iomem *ptr = addr;
> - asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> + asm volatile(ALTERNATIVE("strb %w0, [%1]",
> + "stlrb %w0, [%1]",
> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> + : : "rZ" (val), "r" (addr));
> }
>
Nitpick:
This change has the side effect of undoing d044d6ba6f02 ("arm64:
io: permit offset addressing"), since the stlr* instructions do not
support offset addressing. Unaffected CPUs would continue to use str*,
but because both alternatives now share the same "r" (addr) operand,
they would lose the benefit of offset addressing too :(
Not sure if this needs to be mentioned in the commit message...
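To illustrate the codegen difference, here is a rough userspace sketch
of the two operand shapes. It is not kernel code: struct demo_regs and
the demo_* helpers are made up for the example, the "memory" clobber is
only there so the read-back is valid outside the kernel, and the
non-arm64 fallback exists purely so it builds anywhere. The instruction
sequences named in the comments are what I would expect a compiler to
emit, not something I have measured for this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
struct demo_regs {
	uint32_t ctrl;		/* offset 0 */
	uint32_t status;	/* offset 4 */
	uint32_t data;		/* offset 8 */
};

/*
 * Pre-patch shape: the store target is a memory operand ("Qo"), so the
 * compiler is free to pick a base + immediate addressing mode.
 * The "memory" clobber is an addition for this userspace sketch only.
 */
static inline void demo_write32_mem(uint32_t val, volatile uint32_t *ptr)
{
#if defined(__aarch64__)
	__asm__ volatile("str %w0, %1" : : "rZ" (val), "Qo" (*ptr) : "memory");
#else
	*ptr = val;	/* portable fallback so the sketch builds elsewhere */
#endif
}

/*
 * Post-patch shape: the address is a plain register operand ("r"), so
 * any offset has to be folded into a register before the store.
 */
static inline void demo_write32_reg(uint32_t val, volatile uint32_t *ptr)
{
#if defined(__aarch64__)
	__asm__ volatile("str %w0, [%1]" : : "rZ" (val), "r" (ptr) : "memory");
#else
	*ptr = val;
#endif
}

/* On arm64 this can become a single store, e.g. str w1, [x0, #8] */
void demo_set_data_mem(struct demo_regs *regs, uint32_t val)
{
	demo_write32_mem(val, &regs->data);
}

/* On arm64 this likely needs e.g. add x0, x0, #8 ; str w1, [x0] */
void demo_set_data_reg(struct demo_regs *regs, uint32_t val)
{
	demo_write32_reg(val, &regs->data);
}

/* Both shapes store the same value; only the generated code differs. */
int demo_selfcheck(void)
{
	struct demo_regs regs = { 0, 0, 0 };

	demo_set_data_mem(&regs, 0x1234);
	assert(regs.data == 0x1234);
	demo_set_data_reg(&regs, 0x5678);
	assert(regs.data == 0x5678);
	return 0;
}
```

Comparing the disassembly of demo_set_data_mem() and
demo_set_data_reg() on an arm64 build should show whether the extra
address computation appears in practice.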
Cheers
Vladimir
> #define __raw_writew __raw_writew
> static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
> {
> - volatile u16 __iomem *ptr = addr;
> - asm volatile("strh %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> + asm volatile(ALTERNATIVE("strh %w0, [%1]",
> + "stlrh %w0, [%1]",
> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> + : : "rZ" (val), "r" (addr));
> }
>
> #define __raw_writel __raw_writel
> static __always_inline void __raw_writel(u32 val, volatile void __iomem *addr)
> {
> - volatile u32 __iomem *ptr = addr;
> - asm volatile("str %w0, %1" : : "rZ" (val), "Qo" (*ptr));
> + asm volatile(ALTERNATIVE("str %w0, [%1]",
> + "stlr %w0, [%1]",
> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> + : : "rZ" (val), "r" (addr));
> }
>
> #define __raw_writeq __raw_writeq
> static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> {
> - volatile u64 __iomem *ptr = addr;
> - asm volatile("str %x0, %1" : : "rZ" (val), "Qo" (*ptr));
> + asm volatile(ALTERNATIVE("str %x0, [%1]",
> + "stlr %x0, [%1]",
> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
> + : : "rZ" (val), "r" (addr));
> }
>
> #define __raw_readb __raw_readb
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 5377e4c2eba2..958d7f16bfeb 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -809,6 +809,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
> ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
> },
> #endif
> +#ifdef CONFIG_NVIDIA_OLYMPUS_1027_ERRATUM
> + {
> + /* NVIDIA Olympus core */
> + .desc = "NVIDIA Olympus device load/store ordering erratum",
> + .capability = ARM64_WORKAROUND_DEVICE_STORE_RELEASE,
> + ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
> + },
> +#endif
> #ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
> {
> /*
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 811c2479e82d..d367257bf770 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -120,6 +120,7 @@ WORKAROUND_CAVIUM_TX2_219_PRFM
> WORKAROUND_CAVIUM_TX2_219_TVM
> WORKAROUND_CLEAN_CACHE
> WORKAROUND_DEVICE_LOAD_ACQUIRE
> +WORKAROUND_DEVICE_STORE_RELEASE
> WORKAROUND_NVIDIA_CARMEL_CNP
> WORKAROUND_PMUV3_IMPDEF_TRAPS
> WORKAROUND_QCOM_FALKOR_E1003
> --
> 2.43.0
>