Re: [RFC PATCH v2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock
From: SeongJae Park
Date: Thu Mar 26 2026 - 23:53:01 EST
On Thu, 26 Mar 2026 17:49:51 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:
> When the kdamond_fn() main loop is finished, the function cancels all
> remaining damon_call() requests and unsets damon_ctx->kdamond so that
> API callers can see that the context is terminated. damon_call() adds
> the caller's request to the queue first. After that, it checks whether
> the kdamond of the damon_ctx is still running (damon_ctx->kdamond is
> set). Only if the kdamond is running does damon_call() start waiting for
> the kdamond's handling of the newly added request.
>
> However, damon_call() request registration and the damon_ctx->kdamond
> unset are protected by different mutexes. Hence, damon_call() could
> race with the damon_ctx->kdamond unset, resulting in deadlocks.
>
> For example, suppose the kdamond has successfully finished cancelling
> the damon_call() requests. Right after that, damon_call() is called for
> the context. It registers the new request and sees that the context is
> still running, because damon_ctx->kdamond has not yet been unset. Hence
> the damon_call() caller starts waiting for the request to be handled.
> However, the kdamond is already in its termination steps, so it never
> handles the new request. As a result, the damon_call() caller thread
> waits forever.
>
> Fix this by introducing another damon_ctx field, namely
> call_controls_obsolete. It is protected by the
> damon_ctx->call_controls_lock, which protects damon_call() registration.
> Initialize (unset) it in kdamond_init_ctx()
In this version, I moved the initialization into kdamond_fn(), before the
damon_started completion, but I forgot to update the above sentence. I will
fix it in the next version.
Thanks,
SJ
[...]
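For illustration only, the locking rule described in the quoted changelog can be modelled in user space with pthreads. This is a minimal sketch under stated assumptions, not the kernel patch: struct ctx, struct call_control, ctx_init(), ctx_call(), ctx_cancel_calls() and the -ECANCELED return value are hypothetical stand-ins. Only the idea that call_controls_obsolete is read and written under call_controls_lock, the same lock that protects damon_call() registration, comes from the changelog. ctx_init() models the unset that, per the reply above, is done in kdamond_fn() before the damon_started completion.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a damon_call() request. */
struct call_control {
	struct call_control *next;
	bool canceled;
};

/* Hypothetical, simplified stand-in for struct damon_ctx. */
struct ctx {
	pthread_mutex_t call_controls_lock;   /* protects the two fields below */
	struct call_control *call_controls;   /* pending requests */
	bool call_controls_obsolete;          /* set once requests stop being served */
};

/* Models the unset done at kdamond start, before callers are released. */
static void ctx_init(struct ctx *c)
{
	pthread_mutex_init(&c->call_controls_lock, NULL);
	c->call_controls = NULL;
	c->call_controls_obsolete = false;
}

/*
 * Models damon_call() registration: the obsolete check and the enqueue are
 * done in one critical section, so they cannot interleave with the drain in
 * ctx_cancel_calls(). The error code is an assumption for this sketch.
 */
static int ctx_call(struct ctx *c, struct call_control *control)
{
	int err = 0;

	pthread_mutex_lock(&c->call_controls_lock);
	if (c->call_controls_obsolete) {
		err = -ECANCELED;         /* never queued, so never waited on */
	} else {
		control->next = c->call_controls;
		c->call_controls = control;
	}
	pthread_mutex_unlock(&c->call_controls_lock);
	return err;                       /* caller waits only if err == 0 */
}

/* Models the kdamond exit path: mark obsolete and cancel what is queued. */
static void ctx_cancel_calls(struct ctx *c)
{
	pthread_mutex_lock(&c->call_controls_lock);
	c->call_controls_obsolete = true;
	while (c->call_controls) {
		struct call_control *control = c->call_controls;

		c->call_controls = control->next;
		control->canceled = true;
	}
	pthread_mutex_unlock(&c->call_controls_lock);
}
```

Because the flag is set in the same critical section that drains the queue, a caller either enqueues before the drain (and its request is canceled) or observes the flag and returns without waiting; the window between cancelling and unsetting damon_ctx->kdamond no longer matters.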