Re: [PATCH] thermal: core: fix use-after-free due to init/cancel delayed_work race

From: Rafael J. Wysocki

Date: Wed Mar 25 2026 - 08:56:01 EST


On Wed, Mar 25, 2026 at 1:10 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Wed, Mar 25, 2026 at 12:51 AM Mauricio Faria de Oliveira
> <mfo@xxxxxxxxxx> wrote:
> >
> > If INIT_DELAYED_WORK() is called for a currently running work item,
> > cancel_delayed_work_sync() is unable to cancel/wait for it anymore,
> > as the work item's data bits required for that are cleared.
> >
> > In the resume path, INIT_DELAYED_WORK() is called twice:
> > 1) to replace the work function: thermal_zone_device_check/resume()
> > 2) to restore it.
> >
> > Both cases might race with the unregister path and bypass the call to
> > cancel_delayed_work_sync(),
>
> So this is the problem, isn't it?
>
> > after which struct thermal_zone_device *tz
> > is freed, and the non-canceled/non-waited for work hits use-after-free.
>
> Which basically means that a TZ_STATE_FLAG_EXIT check is missing in
> both thermal_zone_pm_complete() and thermal_zone_device_resume().

Actually, thermal_zone_pm_complete() runs under thermal_list_lock and
thermal_zone_device_unregister() removes the zone from
thermal_tz_list, also under thermal_list_lock, before calling
cancel_delayed_work_sync().

So either thermal_zone_device_unregister() removes the zone from the
list before thermal_zone_pm_complete() can run, in which case the
latter won't run for the given zone at all because that zone is no
longer in thermal_tz_list, or cancel_delayed_work_sync() will see
the work item queued up by thermal_zone_pm_complete().
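
To spell out the ordering argument, here is a toy userspace model (not
kernel code, untested against the real thing).  All of the names in it
are made up for illustration: "on_list" stands in for membership in
thermal_tz_list and "work_queued" for the zone's delayed work being
pending.  Since both the list walk and the list removal are serialized
by thermal_list_lock, only two orderings are possible, and the model
just walks through both of them.  It deliberately does not cover a work
item that is already running when it gets reinitialized.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a thermal zone; not the kernel structure. */
struct zone_model {
	bool on_list;		/* zone is present in the (modeled) zone list */
	bool work_queued;	/* the zone's delayed work is pending */
};

/* Models the list walk for one zone, as if under the list lock. */
static bool model_pm_complete(struct zone_model *z)
{
	if (!z->on_list)
		return false;	/* the walk never reaches a removed zone */

	z->work_queued = true;
	return true;
}

/* Models the list removal done by unregistration, as if under the list lock. */
static void model_unregister_remove(struct zone_model *z)
{
	z->on_list = false;
}

/* Models the cancel step; returns true if pending work was canceled. */
static bool model_unregister_cancel(struct zone_model *z)
{
	bool was_queued = z->work_queued;

	z->work_queued = false;
	return was_queued;
}

/* Returns true if work is still queued once unregistration has finished. */
static bool model_leaks_work(bool pm_complete_first)
{
	struct zone_model z = { .on_list = true, .work_queued = false };

	if (pm_complete_first) {
		model_pm_complete(&z);		/* queues the work */
		model_unregister_remove(&z);
	} else {
		model_unregister_remove(&z);
		model_pm_complete(&z);		/* skips the zone */
	}

	/* The cancel step always comes after the list removal. */
	model_unregister_cancel(&z);

	return z.work_queued;
}
```

In this model, model_leaks_work() returns false for both orderings,
which is the point made above.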

So where's the race between thermal_zone_pm_complete() and
thermal_zone_device_unregister()?

I can see the one between thermal_zone_device_unregister() and
thermal_zone_device_resume(), but that can be addressed by adding a
TZ_STATE_FLAG_EXIT check to the latter AFAICS.
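
Roughly, the check I have in mind would have the shape below.  This is
a userspace sketch of the idea only, not a patch: the structure, the
function name and the flag values are assumptions made for the sake of
illustration, and "work_reinitialized" merely stands in for the
INIT_DELAYED_WORK() call on the resume path.  The function is taken to
run under the zone lock, which is also assumed to be held when the EXIT
flag gets set.

```c
#include <stdbool.h>

/* Assumed flag values, for illustration only. */
#define TZ_STATE_FLAG_RESUMING	(1u << 1)
#define TZ_STATE_FLAG_EXIT	(1u << 3)

/* Hypothetical stand-in for a thermal zone; not the kernel structure. */
struct tz_model {
	unsigned int state;
	bool work_reinitialized;	/* models INIT_DELAYED_WORK() having run */
};

/* Models the body of the resume work function, as if under the zone lock. */
static void model_zone_resume(struct tz_model *tz)
{
	/* Unregistration has started, so leave the work item alone. */
	if (tz->state & TZ_STATE_FLAG_EXIT)
		return;

	tz->state &= ~TZ_STATE_FLAG_RESUMING;
	tz->work_reinitialized = true;
}
```

With the flag set, the modeled resume path returns before touching the
work item, so the subsequent cancel step still operates on intact work
item data.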