Re: [PATCH v2] locking/rtmutex: Limit # of lock stealing for non-RT waiters

From: Sebastian Andrzej Siewior
Date: Thu Jun 23 2022 - 09:33:09 EST


On 2022-06-21 15:36:41 [-0400], Waiman Long wrote:
> Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
> stealing") allows an unlimited number of lock steals for non-RT
> tasks. That can lead to lock starvation of non-RT top waiter tasks if
> there is a constant incoming stream of non-RT lockers. This can cause
> task lockups on a PREEMPT_RT kernel. For example,
>
> [ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.
> [ 1872.984225] INFO: task kworker/6:4:63401 blocked for more than 622 seconds.
>
> Avoid this problem and ensure forward progress by limiting the
> number of times that a lock can be stolen from each waiter. This patch
> sets a threshold of 10. That number is arbitrary and can be changed
> if needed.
>
> With that change, the task lockups previously observed when running
> stressful workloads on a PREEMPT_RT kernel disappeared.
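
For readers following along, the limit described above can be modelled in
user space roughly as follows. This is only a sketch under assumptions:
struct waiter, try_steal(), count_steals() and STEAL_LIMIT are hypothetical
names for illustration, not the kernel's struct rt_mutex_waiter or the code
in the patch. The priority checks are a simplified version of the
equal-priority stealing rule, and the value 10 is taken from the patch
description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name; the patch description uses 10 and calls it arbitrary. */
#define STEAL_LIMIT 10

/* Simplified stand-in for a lock waiter, not the kernel's data structure. */
struct waiter {
	int prio;            /* lower value means higher priority */
	bool is_rt;          /* true for an RT task */
	unsigned int stolen; /* times the lock was stolen from this waiter */
};

/*
 * Decide whether 'locker' may take the lock ahead of the queued top waiter.
 * An equal-priority steal is charged to 'top' and refused once the limit
 * has been reached.
 */
static bool try_steal(const struct waiter *locker, struct waiter *top)
{
	assert(locker && top);

	/* A strictly higher-priority locker always goes first. */
	if (locker->prio < top->prio)
		return true;

	/* RT lockers do not steal at equal priority; lower priority never does. */
	if (locker->is_rt || locker->prio != top->prio)
		return false;

	/* Equal-priority non-RT locker: allow only while the budget lasts. */
	if (top->stolen >= STEAL_LIMIT)
		return false;

	top->stolen++;
	return true;
}

/* Model a constant stream of equal-priority non-RT lockers. */
static unsigned int count_steals(unsigned int attempts)
{
	struct waiter top = { .prio = 120, .is_rt = false, .stolen = 0 };
	struct waiter locker = { .prio = 120, .is_rt = false, .stolen = 0 };
	unsigned int granted = 0;

	for (unsigned int i = 0; i < attempts; i++)
		if (try_steal(&locker, &top))
			granted++;

	return granted;
}
```

In this model, count_steals(1000) grants exactly STEAL_LIMIT steals; every
later attempt is refused, so the top waiter is next in line for the lock.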

Do you have more insight into how this was tested/created? Based on the
log above, systemd and a random kworker waited on a lock for more than
10 minutes.

I added a trace_printk() for each time a non-RT waiter got the lock stolen,
kicked off a kernel build and a package upgrade, and took a look at the
stats an hour later:
- sh got its lock stolen 3416 times. I didn't log the pid so I can't
  look back and check how long it waited since the first time.
- the median number of stolen locks is 173.

> Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
> Reported-by: Mike Stowell <mstowell@xxxxxxxxxx>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>

Sebastian