[RFC PATCH] getrusage: Use trylock when getting sighand lock.

From: Dylan Hatch
Date: Wed Jan 17 2024 - 14:26:30 EST


Processes with many threads run the risk of causing a hard lockup if
too many threads are calling getrusage() at once. This is because a
thread calling getrusage(RUSAGE_SELF) spins on the sighand lock with
IRQs disabled, and the critical section of getrusage() scales linearly
with the size of the process. All CPUs may end up spinning on the
sighand lock for a long time because another thread holds the lock and
is busy iterating over 250k+ threads.

In order to mitigate this, take the sighand lock with a trylock loop
and periodically re-enable interrupts while waiting for it. This
approach gives up the FIFO fairness of a normal spinlock, but the loss
of fairness is mostly contained to threads within the same process.

-- Alternatives Considered --

In an earlier version of the above approach, we also added a
cond_resched() call in the window where interrupts are re-enabled, in
order to prevent soft lockups. This turned out not to be workable on
its own, since getrusage() is called from a non-preemptible context in
kernel/exit.c, but it could possibly be adapted by providing an
alternate version of getrusage() that is only called from a
preemptible context.
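
For illustration, the retry path in that earlier variant looked
roughly like the sketch below (a sketch only, not code from this
patch):

	/* Earlier variant (sketch): also yield the CPU while IRQs are on. */
	if (!lock && sighand) {
		local_irq_restore(*flags);
		cond_resched();	/* not legal from a non-preemptible caller */
		goto again;
	}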

Another alternative would be to have getrusage() itself release the
lock and re-enable interrupts periodically while iterating over large
numbers of threads. However, this would be difficult to implement
correctly, and the correctness/consistency of the data reported by
getrusage() would be questionable.

One final alternative might be to add a per-process mutex for callers
of getrusage() to acquire before taking the sighand lock, or to be
used as a total replacement of the sighand lock. We haven't fully
explored the implications of this.
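
For illustration only, the first of those options might look roughly
like the sketch below; the per-process getrusage_lock mutex is
hypothetical and is not part of this patch:

	/* Hypothetical: serialize getrusage() callers before taking siglock. */
	mutex_lock(&p->signal->getrusage_lock);	/* hypothetical new field */
	if (lock_task_sighand(p, &flags)) {
		/* ... accumulate thread times as getrusage() does today ... */
		unlock_task_sighand(p, &flags);
	}
	mutex_unlock(&p->signal->getrusage_lock);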

Signed-off-by: Dylan Hatch <dylanbhatch@xxxxxxxxxx>
---
 include/linux/sched/signal.h | 13 +++++++++++++
 kernel/signal.c              | 43 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sys.c                 |  8 ++++++-
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3499c1a8b9295..7852f16139965 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -747,6 +747,19 @@ static inline struct sighand_struct *lock_task_sighand(struct task_struct *task,
 	return ret;
 }
 
+extern struct sighand_struct *__lock_task_sighand_safe(struct task_struct *task,
+							unsigned long *flags);
+
+static inline struct sighand_struct *lock_task_sighand_safe(struct task_struct *task,
+							     unsigned long *flags)
+{
+	struct sighand_struct *ret;
+
+	ret = __lock_task_sighand_safe(task, flags);
+	(void)__cond_lock(&task->sighand->siglock, ret);
+	return ret;
+}
+
 static inline void unlock_task_sighand(struct task_struct *task,
 					unsigned long *flags)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index 47a7602dfe8df..6d60c73b7ab91 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1397,6 +1397,49 @@ int zap_other_threads(struct task_struct *p)
 	return count;
 }
 
+struct sighand_struct *__lock_task_sighand_safe(struct task_struct *tsk,
+						unsigned long *flags)
+{
+	struct sighand_struct *sighand;
+	int n;
+	bool lock = false;
+
+again:
+	rcu_read_lock();
+	local_irq_save(*flags);
+	for (n = 0; n < 500; n++) {
+		sighand = rcu_dereference(tsk->sighand);
+		if (unlikely(sighand == NULL))
+			break;
+
+		/*
+		 * The downside of this approach is that we lose the fairness
+		 * of FIFO waiting, because the acquisition order between
+		 * multiple waiting tasks is effectively random.
+		 */
+		lock = spin_trylock(&sighand->siglock);
+		if (!lock) {
+			cpu_relax();
+			continue;
+		}
+
+		/* __lock_task_sighand has context explaining this check. */
+		if (likely(sighand == rcu_access_pointer(tsk->sighand)))
+			break;
+		spin_unlock(&sighand->siglock);
+		lock = false;
+	}
+	rcu_read_unlock();
+
+	if (!lock) {
+		local_irq_restore(*flags);	/* let pending IRQs run */
+		if (sighand)			/* NULL: the task is exiting */
+			goto again;
+	}
+
+	return sighand;
+}
+
 struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
 					   unsigned long *flags)
 {
diff --git a/kernel/sys.c b/kernel/sys.c
index e219fcfa112d8..1b1254a3c506b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1798,7 +1798,13 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
 		goto out;
 	}
 
-	if (!lock_task_sighand(p, &flags))
+	/*
+	 * We use lock_task_sighand_safe here instead of lock_task_sighand
+	 * because one process with many threads all calling getrusage may
+	 * otherwise cause an NMI watchdog timeout by disabling IRQs for
+	 * too long.
+	 */
+	if (!lock_task_sighand_safe(p, &flags))
 		return;
 
 	switch (who) {
--
2.43.0.381.gb435a96ce8-goog