[PATCH 3/3] softirq: don't yield if only expedited handlers are pending

From: Jakub Kicinski
Date: Thu Dec 22 2022 - 17:13:07 EST


In networking we try to keep Tx packet queues small, so we limit
how many bytes a socket may packetize and queue up. Tx completions
(from NAPI) notify the sockets when packets have left the system
(NIC Tx completion) and the socket schedules a tasklet to queue
the next batch of frames.

This leads to a situation where we go thru the softirq loop twice.
First round we have pending = NET (from the NIC IRQ/NAPI), and
the second iteration has pending = TASKLET (the socket tasklet).

On two web workloads I looked at this condition accounts for 10%
and 23% of all ksoftirqd wake ups respectively. We run NAPI
which wakes some process up, we hit need_resched() and wake up
ksoftirqd just to run the TSQ (TCP small queues) tasklet.

Tweak the need_resched() condition to be ignored if all pending
softIRQs are "non-deferred". The tasklet would run relatively
soon, anyway, but once ksoftirqd is woken we're risking stalls.

I did not see any negative impact on the latency in an RR test
on a loaded machine with this change applied.

Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
---
kernel/softirq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ad200d386ec1..4ac59ffb0d55 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -601,7 +601,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)

if (time_is_before_eq_jiffies(end) || !--max_restart)
limit = SOFTIRQ_OVERLOAD_TIME;
- else if (need_resched())
+ else if (need_resched() && pending & ~SOFTIRQ_NOW_MASK)
limit = SOFTIRQ_DEFER_TIME;
else
goto restart;
--
2.38.1