Re: [RFC PATCH] getrusage: Use trylock when getting sighand lock.

From: Oleg Nesterov
Date: Wed Jan 17 2024 - 15:46:54 EST


Heh ;)

getrusage() should not use ->siglock at all.

On my TODO list. I'll try to make a patch this week.

Oleg.

On 01/17, Dylan Hatch wrote:
>
> Processes with many threads risk causing a hard lockup if too many
> threads call getrusage() at once. A calling thread using RUSAGE_SELF
> spins on the sighand lock with IRQs disabled, and the critical section
> of getrusage() scales linearly with the size of the process. All CPUs
> may end up spinning on the sighand lock for a long time because another
> thread holds the lock and is busy iterating over 250k+ threads.
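>
> For context, the contention can be reproduced from userspace with a
> single process that starts a large number of threads, all calling
> getrusage(RUSAGE_SELF) in a loop. The program below is only an
> illustrative reproducer and is not part of this patch:
>
> #include <pthread.h>
> #include <stdlib.h>
> #include <sys/resource.h>
>
> /* Each thread hammers getrusage(RUSAGE_SELF) forever. */
> static void *spin(void *arg)
> {
> 	struct rusage ru;
>
> 	for (;;)
> 		getrusage(RUSAGE_SELF, &ru);
> 	return NULL;
> }
>
> int main(int argc, char **argv)
> {
> 	int i, nthreads = argc > 1 ? atoi(argv[1]) : 100000;
> 	pthread_t tid;
>
> 	/*
> 	 * With enough threads, whichever thread holds the sighand lock
> 	 * does a long linear scan while every other caller spins on the
> 	 * lock with IRQs disabled.
> 	 */
> 	for (i = 0; i < nthreads; i++)
> 		pthread_create(&tid, NULL, spin, NULL);
> 	spin(NULL); /* main thread joins in; never returns */
> 	return 0;
> }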
>
> To mitigate this, periodically re-enable interrupts while waiting for
> the sighand lock. This approach gives up the FIFO fairness of a normal
> spinlock, but the unfairness is largely confined to threads within the
> same process.
>
> -- Alternatives Considered --
>
> In an earlier version of the above approach, we paired the interrupt
> re-enabling with a cond_resched() call to prevent soft lockups as well.
> That version turned out not to be workable on its own since getrusage()
> is called from a non-preemptible context in kernel/exit.c, but it could
> be adapted by providing an alternate version of getrusage() that is
> only called from preemptible context.
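>
> A sketch of that idea (illustrative only; the name is made up and this
> is not the actual earlier patch) might look like:
>
> static struct sighand_struct *
> lock_task_sighand_resched(struct task_struct *tsk, unsigned long *flags)
> {
> 	struct sighand_struct *sighand;
>
> 	/* Must only be called from preemptible context. */
> 	for (;;) {
> 		rcu_read_lock();
> 		sighand = rcu_dereference(tsk->sighand);
> 		if (unlikely(!sighand)) {
> 			/* The task is exiting, nothing to lock. */
> 			rcu_read_unlock();
> 			return NULL;
> 		}
> 		if (spin_trylock_irqsave(&sighand->siglock, *flags)) {
> 			/* Same re-check as in __lock_task_sighand(). */
> 			if (likely(sighand == rcu_access_pointer(tsk->sighand))) {
> 				rcu_read_unlock();
> 				return sighand;
> 			}
> 			spin_unlock_irqrestore(&sighand->siglock, *flags);
> 		}
> 		rcu_read_unlock();
> 		/* IRQs are enabled again here, so we can safely yield. */
> 		cond_resched();
> 	}
> }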
>
> Another alternative would be to have getrusage() itself release the lock
> and enable interrupts periodically while iterating over large numbers of
> threads. However, this would be difficult to implement correctly, and
> the correctness/consistency of the data reported by getrusage() would be
> questionable.
>
> One final alternative might be to add a per-process mutex for callers of
> getrusage() to take before acquiring the sighand lock, or to serve as a
> complete replacement for the sighand lock. We haven't fully explored
> what the implications of this might be.
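>
> As a sketch only (the mutex below is a hypothetical new field in
> struct signal_struct, it does not exist today), the getrusage() side
> might look like:
>
> 	/*
> 	 * Serialize getrusage() callers so that at most one thread per
> 	 * process spins on ->siglock with IRQs disabled at a time.
> 	 * Like the cond_resched() idea, this fragment assumes a
> 	 * preemptible caller.
> 	 */
> 	mutex_lock(&p->signal->getrusage_mutex);
> 	if (!lock_task_sighand(p, &flags)) {
> 		mutex_unlock(&p->signal->getrusage_mutex);
> 		return;
> 	}
>
> 	/* ... existing accounting under ->siglock ... */
>
> 	unlock_task_sighand(p, &flags);
> 	mutex_unlock(&p->signal->getrusage_mutex);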
>
> Signed-off-by: Dylan Hatch <dylanbhatch@xxxxxxxxxx>
> ---
> include/linux/sched/signal.h | 13 +++++++++++
> kernel/signal.c | 43 ++++++++++++++++++++++++++++++++++++
> kernel/sys.c | 8 ++++++-
> 3 files changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3499c1a8b9295..7852f16139965 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -747,6 +747,19 @@ static inline struct sighand_struct *lock_task_sighand(struct task_struct *task,
> return ret;
> }
>
> +extern struct sighand_struct *__lock_task_sighand_safe(struct task_struct *task,
> + unsigned long *flags);
> +
> +static inline struct sighand_struct *lock_task_sighand_safe(struct task_struct *task,
> + unsigned long *flags)
> +{
> + struct sighand_struct *ret;
> +
> + ret = __lock_task_sighand_safe(task, flags);
> + (void)__cond_lock(&task->sighand->siglock, ret);
> + return ret;
> +}
> +
> static inline void unlock_task_sighand(struct task_struct *task,
> unsigned long *flags)
> {
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 47a7602dfe8df..6d60c73b7ab91 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1397,6 +1397,49 @@ int zap_other_threads(struct task_struct *p)
> return count;
> }
>
> +struct sighand_struct *__lock_task_sighand_safe(struct task_struct *tsk,
> + unsigned long *flags)
> +{
> + struct sighand_struct *sighand;
> + int n;
> + bool lock = false;
> +
> +again:
> + rcu_read_lock();
> + local_irq_save(*flags);
> + for (n = 0; n < 500; n++) {
> + sighand = rcu_dereference(tsk->sighand);
> + if (unlikely(sighand == NULL))
> + break;
> +
> + /*
> + * The downside of this approach is we lose the fairness of
> + * FIFO waiting because the acquisition order between multiple
> + * waiting tasks is effectively random.
> + */
> + lock = spin_trylock(&sighand->siglock);
> + if (!lock) {
> + cpu_relax();
> + continue;
> + }
> +
> + /* __lock_task_sighand has context explaining this check. */
> + if (likely(sighand == rcu_access_pointer(tsk->sighand)))
> + break;
> + spin_unlock(&sighand->siglock);
> + lock = false;
> + }
> + rcu_read_unlock();
> +
> + /* Let any pending IRQs run, then retry unless the task has exited. */
> + if (!lock) {
> + local_irq_restore(*flags);
> + if (sighand)
> + goto again;
> + }
> + return sighand;
> +}
> +
> struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
> unsigned long *flags)
> {
> diff --git a/kernel/sys.c b/kernel/sys.c
> index e219fcfa112d8..1b1254a3c506b 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1798,7 +1798,13 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
> goto out;
> }
>
> - if (!lock_task_sighand(p, &flags))
> + /*
> + * We use lock_task_sighand_safe here instead of lock_task_sighand
> + * because one process with many threads all calling getrusage may
> + * otherwise cause an NMI watchdog timeout by disabling IRQs for too
> + * long.
> + */
> + if (!lock_task_sighand_safe(p, &flags))
> return;
>
> switch (who) {
> --
> 2.43.0.381.gb435a96ce8-goog
>