Re: [PATCH] thermal: core: fix use-after-free due to init/cancel delayed_work race

From: Rafael J. Wysocki

Date: Wed Mar 25 2026 - 15:31:51 EST


On Wed, Mar 25, 2026 at 8:22 PM Mauricio Faria de Oliveira
<mfo@xxxxxxxxxx> wrote:
>
> On 2026-03-25 13:24, Rafael J. Wysocki wrote:
> > On Wed, Mar 25, 2026 at 4:13 PM Mauricio Faria de Oliveira
> > <mfo@xxxxxxxxxx> wrote:
> >>
> >> On 2026-03-25 11:28, Mauricio Faria de Oliveira wrote:
> >> > On 2026-03-25 11:17, Mauricio Faria de Oliveira wrote:
> >> >> Thanks for looking into this.
> >> >>
> >> >> On 2026-03-25 09:47, Rafael J. Wysocki wrote:
> >> >>> I can see the one between thermal_zone_device_unregister() and
> >> >>> thermal_zone_device_resume(), but that can be addressed by adding a
> >> >>> TZ_STATE_FLAG_EXIT check to the latter AFAICS.
> >> >>
> >> >
> >> > Please disregard this paragraph; I incorrectly read/wrote _resume()
> >> > as thermal_zone_pm_complete() discussed above. The rest should be
> >> > right. I'll review this and get back shortly.
> >> >
> >> >> In the example described above and detailed below, apparently that
> >> >> is not sufficient, if I'm not missing anything. See, if _resume()
> >> >> is reached with thermal_list_lock held, thermal_zone_device_exit()
> >> >> is waiting for thermal_list_lock before setting TZ_STATE_FLAG_EXIT,
> >> >> thus a check for it in _resume() would still find it clear.
> >>
> >> Ok, similarly:
> >>
> >> Say, thermal_pm_notify() -> thermal_pm_notify_complete() ->
> >> thermal_zone_pm_complete() run before
> >> thermal_zone_device_unregister() is called;
> >> thermal_zone_device_resume() starts, and by now
> >> thermal_zone_device_unregister() is called.
> >>
> >> If thermal_zone_device_resume() wins the race over thermal_zone_exit()
> >> for guard(thermal_zone)(tz) (tz->lock), it sees TZ_STATE_FLAG_EXIT clear;
> >> note its callees (e.g., thermal_zone_device_init()) run with tz->lock
> >> held, so they see it clear as well.
> >>
> >> So, thermal_zone_device_init() calls INIT_DELAYED_WORK(), everything
> >> returns, tz->lock is released and the thermal_zone_device_unregister()
> >> -> thermal_zone_exit() path can continue to run.
> >>
> >> Only now thermal_zone_exit() sets TZ_STATE_FLAG_EXIT (too late) and
> >> returns. cancel_delayed_work_sync() does not wait for
> >> thermal_zone_device_resume() due to INIT_DELAYED_WORK() in
> >> thermal_zone_device_init(), and then tz is freed with kfree(tz).
> >>
> >> Then, thermal_zone_device_resume() accesses tz and hits use-after-free.
> >>
> >> Hope this clarifies. Please let me know your thoughts. Thanks!
> >
> > Thanks for the analysis, it sounds accurate.
> >
> > I'd say that thermal_zone_device_unregister() needs to flush the
> > workqueue before calling cancel_delayed_work_sync() to get rid of the
> > stuff that may be running out of it that hasn't seen the changes made
> > by thermal_zone_exit().
>
> IIUIC, cancel_delayed_work_sync() has that effect: it waits for
> (specific) work that might be running and hasn't seen changes by
> thermal_zone_exit().

Sure, but you argued yourself that this didn't work if the work item
in question had been reinitialized in the meantime.

Flushing of the workqueue would take care of that.
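
A minimal user-space sketch of that interleaving, as a toy model rather
than kernel code. All toy_* names are hypothetical, and the assumption
that a per-item cancel can find a running instance only through state
stored in the item itself is a simplification of the behavior described
in the analysis quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_work {
	int pool_id;	/* 0 means "no pool recorded in the item" */
};

struct toy_wq {
	int id;					/* nonzero queue identifier */
	const struct toy_work *in_flight;	/* item whose function is running */
};

/* Models (re)initialization: the item forgets where it last ran. */
static void toy_init_work(struct toy_work *w)
{
	w->pool_id = 0;
}

/* Models a worker starting to execute the item's function. */
static void toy_start(struct toy_wq *wq, struct toy_work *w)
{
	assert(wq->id != 0);
	w->pool_id = wq->id;
	wq->in_flight = w;
}

/*
 * Models a per-item cancel-and-wait: it can only find a running instance
 * through the state recorded in the item. Returns true if it waited.
 */
static bool toy_cancel_sync(struct toy_wq *wq, const struct toy_work *w)
{
	if (w->pool_id != wq->id || wq->in_flight != w)
		return false;
	wq->in_flight = NULL;	/* waited for the function to return */
	return true;
}

/* Models a queue-wide flush: waits for whatever is in flight. */
static bool toy_flush(struct toy_wq *wq)
{
	if (!wq->in_flight)
		return false;
	wq->in_flight = NULL;
	return true;
}

/*
 * Replays the interleaving: the work function starts, its item gets
 * reinitialized while it runs, and then the unregister path proceeds.
 * Returns true if the function is still in flight at the point of freeing.
 */
static bool toy_still_running_at_free(bool flush_first)
{
	struct toy_wq wq = { .id = 1, .in_flight = NULL };
	struct toy_work w;

	toy_init_work(&w);
	toy_start(&wq, &w);	/* resume work begins */
	toy_init_work(&w);	/* reinitialized while running */

	if (flush_first)
		toy_flush(&wq);
	toy_cancel_sync(&wq, &w);

	return wq.in_flight != NULL;
}
```

In this model, toy_still_running_at_free(false) returns true (the
per-item cancel finds nothing to wait for), while
toy_still_running_at_free(true) returns false because the flush does not
depend on the item's state.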

> > This should take care of all of the existing races because if anything
> > is running out of the workqueue when thermal_zone_device_unregister()
> > runs, it will be waited for after calling thermal_zone_exit() and any
> > leftover stuff will be caught by cancel_delayed_work_sync().
>
> Likewise, the wait-for part is an effect of cancel_delayed_work_sync(),
> and AFAIK, there is no leftover after cancel_delayed_work_sync(), as
> it waits for the running work function to finish.
>
> And no further work is queued in the 2 code paths that can queue work:
>
> 1) thermal_zone_device_check(): even if it misses the tz->state check,
> mod_delayed_work() does not requeue the current work item if it is
> canceled/waited for by cancel_delayed_work_sync() (tested locally).
>
> 2) thermal_zone_pm_complete(): this function will no longer be reached
> because tz is no longer in thermal_tz_list.
>
> > Of course, it's better to switch over to using a dedicated workqueue
> > in the thermal core for that.
>
> Considering the points above, AFAICT, it should be sufficient to call
> cancel_delayed_work_sync() for the 2 code paths in unregister()
> (which thus requires distinct work items for each code path).

And I don't want to add another work item to the thermal zone
structure just for the handling of suspend/resume.