Re: Regression on linux-next (next-20260324)
From: Peter Zijlstra
Date: Fri Mar 27 2026 - 12:44:04 EST
On Fri, Mar 27, 2026 at 05:31:00PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 27, 2026 at 07:09:26PM +0530, Borah, Chaitanya Kumar wrote:
> > Hello Matthew,
> >
> > Hope you are doing well. I am Chaitanya from the linux graphics team in
> > Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs[1] on
> > linux-next repository.
> >
> > Since the version next-20260324 [2], we are seeing the following regression
> >
> > `````````````````````````````````````````````````````````````````````````````````
> > <5>[ 157.361977] [IGT] Inactivity timeout exceeded. Killing the current
> > test with SIGQUIT.
> > <6>[ 157.362097] sysrq: HELP : loglevel(0-9) reboot(b) crash(c)
> > show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f)
> > kill-all-tasks(i) thaw-filesystems(j) sak(k)
> > show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n)
> > poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s)
> > show-task-states(t) unmount(u) force-fb(v) show-blocked-tasks(w)
> > dump-ftrace-buffer(z) replay-kernel-logs(R)
> > <6>[ 157.399543] sysrq: Show State
> > <6>[ 157.403061] task:systemd state:S stack:0 pid:1 tgid:1
> > ppid:0 task_flags:0x400100 flags:0x00080000
> > <6>[ 157.403067] Call Trace:
> > <6>[ 157.403069] <TASK>
> > <6>[ 157.403072] __schedule+0x5d7/0x1ef0
> > <6>[ 157.403078] ? lock_acquire+0xc4/0x300
> > <6>[ 157.403084] ? schedule+0x10e/0x180
> > <6>[ 157.403087] ? lock_release+0xcd/0x2b0
> > <6>[ 157.403092] schedule+0x3a/0x180
> > <6>[ 157.403094] schedule_hrtimeout_range_clock+0x112/0x120
> > <6>[ 157.403097] ? do_epoll_wait+0x3e4/0x5b0
> > <6>[ 157.403102] ? lock_release+0xcd/0x2b0
> > <6>[ 157.403104] ? _raw_spin_unlock_irq+0x27/0x70
> > <6>[ 157.403106] ? do_epoll_wait+0x3e4/0x5b0
> > <6>[ 157.403110] schedule_hrtimeout_range+0x13/0x30
> > `````````````````````````````````````````````````````````````````````````````````
> > Details log can be found in [3].
> >
> > After bisecting the tree, the following patch [4] seems to be the first
> > "bad" commit
> >
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > commit 25500ba7e77ce9d3d9b5a1929d41a2ee2e23f6fe
> > Author: Matthew Wilcox (Oracle) willy@xxxxxxxxxxxxx
> > Date: Thu Mar 5 19:55:43 2026 +0000
> >
> > locking/mutex: Remove the list_head from struct mutex
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> >
> > We could not revert the patch because of merge conflict but resetting to the
> > parent of the commit seems to fix the issue.
> >
> > Could you please check why the patch causes this regression and provide a
> > fix if necessary?
>
> Does this help?

More tidy version of the same...
---
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index b1834ab7e782..bb8b410779d4 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -42,7 +42,7 @@ __ww_waiter_last(struct mutex *lock)
 	struct mutex_waiter *w = lock->first_waiter;
 	if (w)
-		w = list_prev_entry(w, list);
+		w = __ww_waiter_prev(lock, w);
 	return w;
 }