Re: [PATCH v4 02/13] locking/ww_mutex: Remove wakeups from under mutex::wait_lock

From: John Stultz
Date: Fri Jun 23 2023 - 20:42:22 EST


On Fri, Jun 23, 2023 at 5:21 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
>
> Hi John,
>
> On 01/06/2023 07:58, John Stultz wrote:
> > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >
> > In preparation to nest mutex::wait_lock under rq::lock we need to remove
> > wakeups from under it.
>
> [...]
>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > Signed-off-by: Connor O'Brien <connoro@xxxxxxxxxx>
> > Signed-off-by: John Stultz <jstultz@xxxxxxxxxx>
> > ---
> > v2:
> > * Move wake_q_init() as suggested by Waiman Long
> > ---
> > include/linux/ww_mutex.h | 3 +++
> > kernel/locking/mutex.c | 8 ++++++++
> > kernel/locking/ww_mutex.h | 10 ++++++++--
> > 3 files changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> > index bb763085479a..9335b2202017 100644
> > --- a/include/linux/ww_mutex.h
> > +++ b/include/linux/ww_mutex.h
> > @@ -19,6 +19,7 @@
> >
> > #include <linux/mutex.h>
> > #include <linux/rtmutex.h>
> > +#include <linux/sched/wake_q.h>
> >
> > #if defined(CONFIG_DEBUG_MUTEXES) || \
> > (defined(CONFIG_PREEMPT_RT) && defined(CONFIG_DEBUG_RT_MUTEXES))
> > @@ -58,6 +59,7 @@ struct ww_acquire_ctx {
> > unsigned int acquired;
> > unsigned short wounded;
> > unsigned short is_wait_die;
> > + struct wake_q_head wake_q;
>
> you told me that there is already an issue in this patch even w/o PE
> when running `insmod /lib/modules/test-ww_mutex.ko`.
>
> The issue is related to Connor's version (1):
>
> https://lkml.kernel.org/r/20221003214501.2050087-2-connoro@xxxxxxxxxx
>
> struct ww_acquire_ctx {
>
> struct wake_q_head wake_q;
>
>
> __mutex_lock_common()
>
> if (ww_ctx)
> ww_ctx_wake(ww_ctx)
>
> wake_up_q(&ww_ctx->wake_q);
> wake_q_init(&ww_ctx->wake_q);
>
>
> Juri's version (2):
>
> https://lkml.kernel.org/r/20181009092434.26221-3-juri.lelli@xxxxxxxxxx
>
> __mutex_lock_common()
>
> DEFINE_WAKE_Q(wake_q) <-- !!!
>
> __ww_mutex_check_waiters(..., wake_q)
>
> __ww_mutex_die(..., wake_q)
>
> wake_q_add(wake_q, waiter->task)
>
> wake_up_q(&wake_q)
>
>
> `insmod /lib/modules/test-ww_mutex.ko` runs fine with (2) but not with
> (1) (both w/o the remaining PE patches).
>
> So to test the PE issues we talked about already which come with `[PATCH
> v4 09/13] sched: Add proxy execution` and CONFIG_PROXY_EXEC=y we need to
> fix (1) or go back to (2).
>
> I haven't found any clues why (2) was changed to (1) so far.

Right. I don't have context for why, but moving the wake_q to the
ww_ctx does seem to break the wake_q assumptions, and results in lost
wakeups.

In my current tree, I've switched back to Juri's older version of the
patch, but adding one fix from Connor's, and with that the patch
doesn't run into this trouble.

That said, I still am catching and debugging problems with later
patches in the series, which has required breaking up the core proxy
change into much finer grained changes so I can debug what's going
wrong. My v5 patch set will reflect this.

thanks
-john