Re: [PATCH v1] soundwire: intel_auxdevice: Fix system suspend/resume handling

From: Pierre-Louis Bossart
Date: Mon May 05 2025 - 06:17:07 EST




On 4/25/25 13:43, Rafael J. Wysocki wrote:
On Fri, Apr 25, 2025 at 8:10 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:

On Fri, Apr 25, 2025 at 7:14 PM Pierre-Louis Bossart
<pierre-louis.bossart@xxxxxxxxx> wrote:

On 4/24/25 20:13, Rafael J. Wysocki wrote:
From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>

The code in intel_suspend() and intel_resume() needs to be properly
synchronized with runtime PM which is not the case currently, so fix
it.

First of all, prevent runtime PM from triggering after intel_suspend()
has started because the changes made by it to the device might be
undone by a runtime resume of the device. For this purpose, add a
pm_runtime_disable() call to intel_suspend().

Allow me to push back on this, because we have to be very careful with a hidden state transition that needs to happen.

If a controller was suspended by pm_runtime, it will enter the clock stop mode.

If the system needs to suspend, the controller has to be forced to exit the clock stop mode and the bus has to restart before we can suspend it, and that's why we had those pm_runtime_resume().

Disabling pm_runtime when entering system suspend would be problematic for Intel hardware; it's a known can of worms.

No, it wouldn't AFAICS.

I was referring to the SoundWire controller. The states are different between pm_runtime suspend (clock is stopped but external wakes are supported) and system suspend (external wakes are not supported).

If the system suspend is entered while the device is already in pm_runtime suspend, then we have to perform a full resume before the system suspend.

I am not going to argue about how to perform this resume, just that it's required. The direct transition from pm_runtime suspend to system suspend is not supported.
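
The constraint described above can be written down as a small state table. This is only a user-space sketch under stated assumptions: the toy_link_state enum and both helpers are hypothetical names invented for illustration, not types or functions from the SoundWire driver, and the assumption that both suspended states return only to the active state is mine.

```c
#include <stdbool.h>

/* Hypothetical link states, for illustration only (not driver types). */
enum toy_link_state {
	TOY_LINK_ACTIVE,
	TOY_LINK_CLOCK_STOP,     /* pm_runtime suspend: clock stopped, external wakes supported */
	TOY_LINK_SYSTEM_SUSPEND, /* system suspend: external wakes not supported */
};

/* True if this toy model allows going from 'from' to 'to' in one step. */
static bool toy_transition_allowed(enum toy_link_state from,
				   enum toy_link_state to)
{
	switch (from) {
	case TOY_LINK_ACTIVE:
		return to == TOY_LINK_CLOCK_STOP ||
		       to == TOY_LINK_SYSTEM_SUSPEND;
	case TOY_LINK_CLOCK_STOP:
	case TOY_LINK_SYSTEM_SUSPEND:
		/* Assumption: the only way out of either state is a full resume. */
		return to == TOY_LINK_ACTIVE;
	}
	return false;
}

/*
 * Number of transitions needed to reach system suspend:
 * 0 if already there, 1 from active, 2 from clock stop
 * (full resume first, then system suspend).
 */
static int toy_steps_to_system_suspend(enum toy_link_state from)
{
	if (from == TOY_LINK_SYSTEM_SUSPEND)
		return 0;
	if (toy_transition_allowed(from, TOY_LINK_SYSTEM_SUSPEND))
		return 1;
	return 2;
}
```

In this model, a link in clock stop needs two steps to reach system suspend, which is the hidden state transition the paragraph above is concerned with.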

It's quite possible that some of the code in intel_suspend() is no longer required because the .prepare will resume the bus properly, but I wanted to make sure this state transition out of clock-stop is known and taken into consideration.

This patch doesn't change the functionality in intel_suspend(), it
just prevents runtime resume running in parallel with it or after it
from messing up the hardware.

I don't see why it would be unsafe to do and please feel free to prove me wrong.

Or just tell me what I'm missing in the reasoning below.

This code:

-	if (pm_runtime_suspended(dev)) {
-		dev_dbg(dev, "pm_runtime status was suspended, forcing active\n");
-
-		/* follow required sequence from runtime_pm.rst */
-		pm_runtime_disable(dev);
-		pm_runtime_set_active(dev);
-		pm_runtime_mark_last_busy(dev);
-		pm_runtime_enable(dev);
-
-		pm_runtime_resume(bus->dev);
-
-		link_flags = md_flags >> (bus->link_id * 8);
-
-		if (!(link_flags & SDW_INTEL_MASTER_DISABLE_PM_RUNTIME_IDLE))
-			pm_runtime_idle(dev);
-	}

that is being removed by my patch (because it is invalid - more about
that later) had never run before commit bca84a7b93fd ("PM: sleep: Use
DPM_FLAG_SMART_SUSPEND conditionally") because setting
DPM_FLAG_SMART_SUSPEND had caused the core to call
pm_runtime_set_active() on the device in the noirq resume phase, so it
had never been seen as runtime-suspended in intel_resume(). After
commit bca84a7b93fd the core doesn't do that any more, so if the
device has been runtime-suspended before intel_suspend() runs,
intel_resume() will see that its status is RPM_SUSPENDED. The code in
question will run and it will crash and burn if
SDW_INTEL_MASTER_DISABLE_PM_RUNTIME_IDLE is set in the link flags.

That code is invalid because the
pm_runtime_set_active() call in it causes the status to change to
RPM_ACTIVE, but it doesn't actually change the state of the device
(that is still physically suspended). The subsequent
pm_runtime_resume() sees that the status is RPM_ACTIVE and it doesn't
do anything. At this point, the device is still physically suspended,
but its runtime PM status is RPM_ACTIVE, so if pm_runtime_idle() runs,
it will trigger an attempt to suspend and that will break because the
device is already suspended.
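
The failure mode described above can be replayed in a toy user-space model. This is a sketch only: struct toy_dev and the toy_* helpers are hypothetical stand-ins that mimic the status bookkeeping described in the paragraph, not the kernel's runtime PM implementation, and the -1 return value is an arbitrary marker for "the suspend attempt hit hardware that was already suspended".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

struct toy_dev {
	enum toy_rpm_status status; /* what the PM core believes */
	bool hw_suspended;          /* what the hardware is actually doing */
};

/* Mimics pm_runtime_set_active(): changes the bookkeeping only. */
static void toy_set_active(struct toy_dev *d)
{
	d->status = TOY_RPM_ACTIVE;
}

/* Mimics pm_runtime_resume(): does nothing if the status is already active. */
static bool toy_resume(struct toy_dev *d)
{
	if (d->status == TOY_RPM_ACTIVE)
		return false; /* resume callback not run */
	d->hw_suspended = false;
	d->status = TOY_RPM_ACTIVE;
	return true;
}

/* Mimics pm_runtime_idle() leading to a suspend attempt. */
static int toy_idle(struct toy_dev *d)
{
	if (d->status != TOY_RPM_ACTIVE)
		return 0;
	if (d->hw_suspended)
		return -1; /* suspending hardware that is already suspended */
	d->hw_suspended = true;
	d->status = TOY_RPM_SUSPENDED;
	return 0;
}

/* Replays the removed sequence on a device that was runtime-suspended. */
static int toy_replay_removed_sequence(void)
{
	struct toy_dev d = { .status = TOY_RPM_SUSPENDED, .hw_suspended = true };

	toy_set_active(&d);   /* status flips, hardware untouched */
	(void)toy_resume(&d); /* sees "active", so it is a no-op */

	/* Bookkeeping and hardware now disagree. */
	assert(d.status == TOY_RPM_ACTIVE && d.hw_suspended);

	return toy_idle(&d);
}
```

In this model toy_replay_removed_sequence() returns -1, because the idle step finds the status active while the hardware is still suspended.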

So this code had never run before and it demonstrably doesn't work, so
I don't see why removing it could be incorrect.

I don't have enough knowledge to counter your arguments :-). I think we misread the documentation in runtime_pm.rst: this sort of sequence is mentioned there, but for system resume, and we applied it to system suspend as well.