Re: (sashiko review) [RFC PATCH v4 01/10] mm/damon/core: introduce damon_ctx->paused
From: SeongJae Park
Date: Sun Mar 22 2026 - 13:40:31 EST
On Sun, 22 Mar 2026 10:06:59 -0700 SeongJae Park <sj@xxxxxxxxxx> wrote:
> Forwarding Sashiko review for doing discussions via mails.
>
> # review url: https://sashiko.dev/#/patchset/20260322155728.81434-2-sj@xxxxxxxxxx
> # start of sashiko.dev inline review
> commit 2a3be2c8b6c327ef72163cd54e6121ef72b9c932
> Author: SeongJae Park <sj@xxxxxxxxxx>
>
> mm/damon/core: introduce damon_ctx->paused
>
> DAMON supports only start and stop of the execution. When it is
> stopped, its internal data that it self-trained goes away. It will be
> useful if the execution can be paused and resumed with the previous
> self-trained data.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..0ab2cfa848e6 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
>
> [ ... ]
>
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> > kdamond_call(ctx, false);
> > if (ctx->maybe_corrupted)
> > break;
> > + while (ctx->pause) {
> > + damos_walk_cancel(ctx);
>
> Is it safe to repeatedly call damos_walk_cancel() here?
>
> In damos_walk(), the walk_control_lock is dropped after setting
> ctx->walk_control to check damon_is_running(), which opens a race window.
>
> Concurrently, kdamond_fn() calls damos_walk_cancel(), which reads
> ctx->walk_control, drops the lock, and completes the old request.
>
> If a new caller invokes damos_walk() and successfully registers a new
> walk_control, could damos_walk_cancel() resume, reacquire the lock,
> and unconditionally set ctx->walk_control = NULL, dropping the new
> request?
>
> This might leave the new caller permanently deadlocked in
> wait_for_completion().
Nice catch. Orthogonal to this patch, though.
The deadlock could happen in below sequence.
CPU0 │CPU1
──────────────────────────────┼────────────────────────
damos_walk() │
│register request │
│wait completion │damos_walk_cancel()
│ │ │complete the request
▼wakeup,return │ │
damos_walk() │ │
│register new request │ │
│ │ │remove the new request
│wait completion │ ▼return
▼ nobody completes it. │
Nonetheless, kdamond_fn() is calling damos_walk() already in several places
including this loop. This issue hence exists regardless of this patch. I will
work on fixing this as a separate hotfix. Below fix may work.
'''
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2321,7 +2321,9 @@ static void damos_walk_cancel(struct damon_ctx *ctx)
control->canceled = true;
complete(&control->completion);
mutex_lock(&ctx->walk_control_lock);
- ctx->walk_control = NULL;
+ /* A new damos_walk() caller could added a new request meanwhile */
+ if (ctx->walk_control == control)
+ ctx->walk_control = NULL;
mutex_unlock(&ctx->walk_control_lock);
}
'''
>
> > + kdamond_usleep(ctx->attrs.sample_interval);
>
> Could this result in a busy loop if sample_interval is 0?
>
> The DAMON API allows attrs.sample_interval to be 0 (for example, when
> tuning is disabled via goal.aggrs == 0).
>
> If sample_interval is 0, this invokes usleep_range_idle(0, 1), which
> sleeps for at most 1 microsecond using an hrtimer.
>
> Since the normally expensive monitoring operations are bypassed while
> paused, would this cause the thread to loop almost instantly and create
> an interrupt storm?
This is an intended behavior for users who really impatient. So this should be
fine.
Thanks,
SJ
[...]