Re: [PATCH] arm64: suspend: Remove forcing error from suspend finisher

From: Sudeep Holla

Date: Thu Mar 19 2026 - 13:02:00 EST


On Thu, Mar 19, 2026 at 03:14:01PM +0000, Sudeep Holla wrote:
> On Mon, Mar 16, 2026 at 02:18:18PM +0530, Maulik Shah wrote:
> > Successful cpu_suspend() may not always want to return to cpu_resume() to
> > save the work and latency involved.
> >
> > consider a scenario,
> >
> > when single physical CPU (pCPU) is used on different virtual machines (VMs)
> > as virtual CPUs (vCPUs). VM-x's vCPU can request a powerdown state after
> > saving the context by invoking __cpu_suspend_enter() whereas VM-y's vCPU is
> > requesting a shallower than powerdown state. The hypervisor aggregates to a
> > non powerdown state for pCPU. A wakeup event for VM-x's vCPU may want to
> > resume the execution at the same place instead of jumping to cpu_resume()
> > as the HW never reached till powerdown state which would have lost the
> > context.
> >
>
> Though I don't fully understand the intention/use-case for presenting the
> VMs with powerdown states ....
>
> > While the vCPU of VM-x had latency impact of saving the context in suspend
> > entry path but having the return to same place saves the latency to restore
> > the context in resume path.
> >
>
> I understand the exit-latency aspect, though the register set involved is not
> very large unless the driver notifier list is sizable on some platforms. This
> is typically the case in Platform Coordinated mode.
>
> > consider another scenario,
> >
> > Newer CPUs include a feature called “powerdown abandon”. The feature is
> > based on the observation that events like GIC wakeups have a high
> > likelihood of happening while the CPU is in the middle of its powerdown
> > sequence (at wfi). Older CPUs will powerdown and immediately power back
> > up when this happens. The newer CPUs will “give up” mid way through if
> > no context has been lost yet. This is possible as the powerdown operation
> > is lengthy and a large part of it does not lose context [1].
> >
>
> When you say "large part" above, do you mean that none of the CPU context, as
> visible to software, is lost? Otherwise, we would need to discuss that "large
> part" in more detail. From the kernel point of view, this is a simple boolean:
> context is either lost or retained. Anything in between is not valid, as we do
> not support partial context loss.
>
> > As the wakeup arrived after SW powerdown is done but before HW is fully
> > powered down. From SW view this is still a successful entry to suspend
> > and since the HW did not loose the context there is no reason to return at
> > entry address cpu_resume() to restore the context.
> >
>
> Yes, that may be worth considering from an optimization perspective. However,
> if the hardware aborts the transition, then returning success regardless of the
> software state should still be counted as a failure. That would keep the
> cpuidle entry statistics more accurate than returning success. And it is
> a failure as the OS expected to enter that powerdown state but there was
> as H/W abort.
>
> > Remove forcing the failure at kernel if the execution does not resume at
> > cpu_resume() as kernel has no reason to treat such returns as failures
> > when the firmware has already filled in return as success.
> >
>
> This is not possible with the current PSCI spec:
> "Powerdown states do not return on success because restart is through the
> entry point address at wakeup."
>

OK, my bad. Sorry for that.
For some reason, I read "do not return" as "must not return".

The spec allows this:
| The caller must not assume that a powerdown request will return using the
| specified entry point address. The powerdown request might not complete due,
| for example, to pending interrupts. It is also possible that, because of
| coordination with other cores, the actual state entered is shallower
| than the one requested. Because of this it is possible for an
| implementation to downgrade the powerdown state request to a standby
| state. In the case of a downgrade to standby, the implementation
| returns at the instruction following the PSCI call, at the Exception
| level of the caller, instead of returning by the specified entry point
| address. The return code in this case is SUCCESS. In the case of an
| early return due to a pending wakeup event, the implementation can
| return at the next instruction, with a return code of SUCCESS, or
| resume at the specified entry point address

So we need to dig and check if there was any reason for returning "NOT
SUPPORTED" when the call returned success.

--
Regards,
Sudeep