Re: [PATCH/RFC] PM: domains: Call pm_runtime_barrier() before dev_pm_domain_{attach*,detach}()
From: Ulf Hansson
Date: Thu Mar 19 2026 - 07:01:23 EST
On Thu, 12 Mar 2026 at 10:54, Geert Uytterhoeven
<geert+renesas@xxxxxxxxx> wrote:
>
> If a device has multiple PM Domains, dev_pm_domain_detach() is called
> multiple times on unbind or probe failure. If the PM Domain is also a
> Clock Domain, and thus calls pm_clk_destroy() from its .detach()
> callback, dev_pm_put_subsys_data() will set dev->power.subsys_data to
> NULL when psd->refcount reaches zero.
>
> Later/in parallel, default_suspend_ok() calls dev_gpd_data():
>
> static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)
> {
> return to_gpd_data(dev->power.subsys_data->domain_data);
> }
>
> which may trigger a NULL pointer dereference.
>
> All dev_pm_domain_{at,de}tach*() functions document that callers must
> ensure proper synchronization of these functions with power management
> callbacks. Unfortunately no callers seem to actually do so. This
> includes dev_pm_domain_attach_list() and dev_pm_domain_detach_list():
> they call dev_pm_domain_{attach*,detach}() internally, which means they
> should take care of this synchronization themselves.
>
> Add synchronization to dev_pm_domain_{at,de}tach_list() by calling
> pm_runtime_barrier() before dev_pm_domain_{attach*,detach}(), and drop
> the now obsolete comments.
My apologies for not being able to respond earlier to your
suggestions/questions. I have started looking into this now, and I
will follow up with more replies and perhaps a patch shortly.
Anyway, the principle is that callers of dev_pm_domain_detach() must
manage the runtime PM enabling/disabling for its device. If runtime PM
was enabled, it must typically be disabled before calling
dev_pm_domain_detach().
What makes this a bit more complicated is that we have two different
scenarious to consider.
1) The legacy case, attachment via dev_pm_domain_attach() for the
single PM domain case. Runtime PM should be enabled/disabled for the
device, from its corresponding driver/bus. I assume this isn't the
problem you are facing, right?
2) Attachment via dev_pm_domain_attach_by_id|name() (which is called
for the *attach_list() case too), for the single/multi PM domain
cases. In these cases, runtime PM is enabled in
genpd_dev_pm_attach_by_id().
For 2), I am inclined to think that the proper action is to call
pm_runtime_disable() in genpd_dev_pm_detach() before it calls
genpd_remove_device(). Although, I need to check more closely how
suitable that would be.
Kind regards
Uffe
>
> Suggested-by: Marek Vasut <marek.vasut@xxxxxxxxxxx>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> ---
> This issue was reported first in "drm/imagination:
> genpd_runtime_suspend() crash"[1] and "Re: [PATCH 2/5] arm64: dts:
> renesas: r8a77960-salvator-x: Enable GPU support"[2].
> Unfortunately this patch does not fix the issue for good, it just
> becomes much harder to trigger (like needing tens of thousands of
> tries).
>
> How to trigger:
>
> 1. Check out drm-next[3]
>
> 2. Enable the gpu node in one of the following DTS files, depending on
> your board (Salvator-X(S), ULCB, or Falcon):
>
> arch/arm64/boot/dts/renesas/r8a77960.dtsi
> arch/arm64/boot/dts/renesas/r8a77961.dtsi
> arch/arm64/boot/dts/renesas/r8a77965.dtsi
> arch/arm64/boot/dts/renesas/r8a779a0.dtsi
>
> These nodes are not yet enabled in any board DTS because of this
> crash.
>
> 3. Build and boot a kernel using renesas_defconfig[4]
>
> 4. The PowerVR driver will fail to probe (since [5], which is IMHO a
> regression):
>
> powervr fd000000.gpu: [drm] *ERROR* Unknown GPU! Set 'exp_hw_support' to bypass this check.
>
> 5. Try to bind the driver again:
>
> $ for i in $(seq 1000000); do echo $i; echo fd000000.gpu > /sys/bus/platform/drivers/powervr/bind; done
>
> Eventually, the kernel will crash:
>
> [...]
> powervr fd000000.gpu: [drm] *ERROR* Unknown GPU! Set 'exp_hw_support' to bypass this check.
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000040
> Mem abort info:
> ESR = 0x0000000096000004
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x04: level 0 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000049993000
> [0000000000000040] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] SMP
> CPU: 1 UID: 0 PID: 12 Comm: kworker/u8:0 Not tainted 7.0.0-rc2-arm64-renesas-00540-g5f0a63f81a02-dirty #3502 PREEMPT
> Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
> Workqueue: pm pm_runtime_work
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : genpd_runtime_suspend+0x134/0x28c
> lr : genpd_runtime_suspend+0x124/0x28c
> sp : ffff80008174bc50
> x29: ffff80008174bc50 x28: 0000000000000000 x27: 0000000000000000
> x26: 0000003ca1f7104b x25: ffff0000090ba580 x24: ffff00000e7d92a0
> x23: ffff0000081612f8 x22: 0000000000000001 x21: ffff000008161000
> x20: 0000000000000000 x19: ffff00000b6ef400 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: ffff000008065600
> x14: 0000000000000058 x13: ffff0000080254e0 x12: 0000000000000000
> x11: ffff000008065608 x10: 00000000001343d0 x9 : ffff0000080656c0
> x8 : ffff000008161800 x7 : 000001f3fffffc18 x6 : 0000000000000000
> x5 : ffff000008161c10 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> Call trace:
> genpd_runtime_suspend+0x134/0x28c (P)
> __rpm_callback+0x44/0x1cc
> rpm_callback+0x6c/0x78
> rpm_suspend+0x108/0x564
> pm_runtime_work+0xb8/0xbc
> process_one_work+0x144/0x280
> worker_thread+0x180/0x2f8
> kthread+0x114/0x120
> ret_from_fork+0x10/0x20
> Code: d503201f f940fe60 52800002 f9410e61 (f9402003)
> ---[ end trace 0000000000000000 ]---
>
> The issue is easier to trigger, and may prevent the kernel from booting
> at all, by adding extra debug prints like:
>
> diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> index 52ea84e548ff6d27..2fe666c2170194ab 100644
> --- a/drivers/pmdomain/core.c
> +++ b/drivers/pmdomain/core.c
> @@ -256,12 +256,14 @@ struct device *dev_to_genpd_dev(struct device *dev)
> static int genpd_stop_dev(const struct generic_pm_domain *genpd,
> struct device *dev)
> {
> +pr_info("==== %s/%s: stop\n", genpd->name, dev_name(dev));
> return GENPD_DEV_CALLBACK(genpd, int, stop, dev);
> }
>
> static int genpd_start_dev(const struct generic_pm_domain *genpd,
> struct device *dev)
> {
> +pr_info("==== %s/%s: start\n", genpd->name, dev_name(dev));
> return GENPD_DEV_CALLBACK(genpd, int, start, dev);
> }
>
> Thanks for your comments and suggestions!
>
> [1] https://lore.kernel.org/CAMuHMdWapT40hV3c+CSBqFOW05aWcV1a6v_NiJYgoYi0i9_PDQ@xxxxxxxxxxxxxx
> [2] https://lore.kernel.org/CAMuHMdWyKeQq31GEK+-y4BoaZFcCxJNac63S7NoocMj1cYKniw@xxxxxxxxxxxxxx/
> [3] commit 5f0a63f81a027bec ("Merge tag 'drm-misc-next-2026-03-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next")
> [4] https://web.git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel.git/tree/arch/arm64/configs/renesas_defconfig?h=topic/renesas-defconfig
> [5] commit 1c21f240fbc1e47b ("drm/imagination: Warn or error on unsupported hardware") in v7.0-rc1
> ---
> drivers/base/power/common.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c
> index 9bef9248a70529bf..af690ce38ac3a086 100644
> --- a/drivers/base/power/common.c
> +++ b/drivers/base/power/common.c
> @@ -12,6 +12,7 @@
> #include <linux/acpi.h>
> #include <linux/pm_domain.h>
> #include <linux/pm_opp.h>
> +#include <linux/pm_runtime.h>
>
> #include "power.h"
>
> @@ -183,9 +184,6 @@ EXPORT_SYMBOL_GPL(dev_pm_domain_attach_by_name);
> * may also provide an empty list, in case the attach should be done for all of
> * the available PM domains.
> *
> - * Callers must ensure proper synchronization of this function with power
> - * management callbacks.
> - *
> * Returns the number of attached PM domains or a negative error code in case of
> * a failure. Note that, to detach the list of PM domains, the driver shall call
> * dev_pm_domain_detach_list(), typically during the remove phase.
> @@ -240,6 +238,7 @@ int dev_pm_domain_attach_list(struct device *dev,
> link_flags |= DL_FLAG_RPM_ACTIVE;
>
> for (i = 0; i < num_pds; i++) {
> + pm_runtime_barrier(dev);
> if (by_id)
> pd_dev = dev_pm_domain_attach_by_id(dev, i);
> else
> @@ -284,12 +283,14 @@ int dev_pm_domain_attach_list(struct device *dev,
>
> err_link:
> dev_pm_opp_clear_config(pds->opp_tokens[i]);
> + pm_runtime_barrier(pd_dev);
> dev_pm_domain_detach(pd_dev, true);
> err_attach:
> while (--i >= 0) {
> dev_pm_opp_clear_config(pds->opp_tokens[i]);
> if (pds->pd_links[i])
> device_link_del(pds->pd_links[i]);
> + pm_runtime_barrier(pds->pd_devs[i]);
> dev_pm_domain_detach(pds->pd_devs[i], true);
> }
> kfree(pds->pd_devs);
> @@ -370,9 +371,6 @@ EXPORT_SYMBOL_GPL(dev_pm_domain_detach);
> *
> * This function reverse the actions from dev_pm_domain_attach_list().
> * Typically it should be invoked during the remove phase from drivers.
> - *
> - * Callers must ensure proper synchronization of this function with power
> - * management callbacks.
> */
> void dev_pm_domain_detach_list(struct dev_pm_domain_list *list)
> {
> @@ -385,6 +383,7 @@ void dev_pm_domain_detach_list(struct dev_pm_domain_list *list)
> dev_pm_opp_clear_config(list->opp_tokens[i]);
> if (list->pd_links[i])
> device_link_del(list->pd_links[i]);
> + pm_runtime_barrier(list->pd_devs[i]);
> dev_pm_domain_detach(list->pd_devs[i], true);
> }
>
> --
> 2.43.0
>