Re: [PATCH v2] locking/rtmutex: Limit # of lock stealing for non-RT waiters

From: Sebastian Andrzej Siewior
Date: Fri Jun 24 2022 - 02:42:32 EST


On 2022-06-23 10:41:17 [-0400], Waiman Long wrote:
>
> On 6/23/22 09:32, Sebastian Andrzej Siewior wrote:
> > Do you have more insight on how this was tested/ created? Based on that,
> > systemd and a random kworker waited on a lock for more than 10 minutes.
>
> The hang happens when our QE team runs their kernel tier 1 test which, I
> think, lasts several hours. The hang happens in some runs but not all of
> them. So it is kind of opportunistic. Mike should be able to provide a
> better idea about frequency and so on.

So are we talking about 64+ CPUs here, or more than that?

> > I added a trace-printk each time a non-RT waiter got the lock stolen,
> > kicked a kernel build and a package upgrade and took a look at the stats
> > an hour later:
> > - sh got its lock stolen 3416 times. I didn't log the pid so I can't
> > look back and check how long it waited since the first time.
> > - the median number of stolen locks is 173.
> Maybe we should also allow more lock stealing per waiter than the 10 that
> I used in the patch. I am open to suggestions as to what is a good value
> to use.

I have no idea either. I just looked at a run to see what the numbers
actually are. I have no numbers in terms of performance. So what most
likely happens is that on an unlock operation the waiter gets a wake-up,
but before it gets a chance to acquire the lock, the lock is already
taken and it goes back to sleep again. While this looks painful, it might
be better performance-wise because the other task was able to acquire the
lock without waiting. But then it is not fair, and starvation like this
can happen.
One thing that I'm curious about is which lock it is (one or two global
hot spots, or many). And how to benchmark this…
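
To make the wake-up/steal pattern concrete, here is a toy userspace model
of it. This is not the rtmutex code: every name in it is made up for
illustration, and it assumes the worst case where an equal-priority
contender shows up on every single unlock. The values 10 and 3416 used
with it below are simply the limit from the patch and the count I saw for
sh.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT the kernel's rtmutex implementation. */
struct toy_waiter {
	unsigned int stolen;	/* times the lock was stolen from this waiter */
};

/*
 * May a newly arriving contender take the lock ahead of the already
 * woken top waiter? A limit of 0 models unlimited stealing.
 */
static bool toy_may_steal(const struct toy_waiter *w, unsigned int limit)
{
	return limit == 0 || w->stolen < limit;
}

/*
 * Replay 'unlocks' unlock operations. On each one the top waiter is
 * woken, but a contender arrives first and tries to steal the lock.
 * Returns the number of the unlock on which the waiter finally gets the
 * lock, or 0 if it is still waiting after all of them.
 */
static unsigned int toy_unlocks_until_acquired(unsigned int unlocks,
					       unsigned int limit)
{
	struct toy_waiter w = { .stolen = 0 };
	unsigned int i;

	assert(unlocks > 0);

	for (i = 1; i <= unlocks; i++) {
		if (!toy_may_steal(&w, limit))
			return i;	/* steal refused, waiter acquires */
		w.stolen++;		/* stolen, waiter sleeps again */
	}
	return 0;
}
```

With no limit, toy_unlocks_until_acquired(3416, 0) reports that the
waiter never gets the lock. With a limit of 10 it gets the lock on the
11th unlock. The model says nothing about the throughput cost of refusing
a steal, which is the part that would need a real benchmark.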

> > > Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
> > > Reported-by: Mike Stowell <mstowell@xxxxxxxxxx>
> > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>
> Thanks for your time looking at the patch.

No problem.

> Cheers,
> Longman

Sebastian