[PATCH] softirq: let the userside tune the SOFTIRQ_NOW_MASK with sysctl

From: Jason Xing
Date: Sun Apr 09 2023 - 22:31:37 EST


From: Jason Xing <kernelxing@xxxxxxxxxxx>

Currently we have two exceptions which could avoid ksoftirqd when
invoking softirqs: HI_SOFTIRQ and TASKLET_SOFTIRQ. They were introduced
in the commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
which says if we don't mask them, it will cause excessive latencies in
some cases.

It also mentioned that we may take time softirq into consideration:
"We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd."

The same reason goes here. In production workload, we found that some
sensitive applications are complaining about the high latency of
tx/rx path in networking, because some packets have to be delayed in
ksoftirqd kthread that can be blocked in the runqueue for some while
(say, 10-70 ms) especially in guestOS. So marking tx/rx softirq
synchronous, for instance, NET_RX_SOFTIRQ, solves such issue.

We tested and observed the high latency above 50ms of the rx path in
the real workload:
without masking: over 100 times hitting the limit per hour
with masking: less than 10 times for a whole day

As we all know the default config is not able to satisify everyone's
requirements. After applied this patch exporting the softirq mask to
the userside, we can serve different cases by tuning with sysctl.

Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
---
kernel/softirq.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8a6913c067d..aa6e52ca2c55 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -65,6 +65,8 @@ const char * const softirq_to_name[NR_SOFTIRQS] = {
"TASKLET", "SCHED", "HRTIMER", "RCU"
};

+unsigned int sysctl_softirq_mask = 1 << HI_SOFTIRQ | 1 << TASKLET_SOFTIRQ;
+
/*
* we cannot loop indefinitely here to avoid userspace starvation,
* but we also don't want to introduce a worst case 1/HZ latency
@@ -80,17 +82,23 @@ static void wakeup_softirqd(void)
wake_up_process(tsk);
}

+static bool softirq_now_mask(unsigned long pending)
+{
+ if (pending & sysctl_softirq_mask)
+ return false;
+ return true;
+}
+
/*
* If ksoftirqd is scheduled, we do not want to process pending softirqs
* right now. Let ksoftirqd handle this at its own rate, to get fairness,
* unless we're doing some of the synchronous softirqs.
*/
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
static bool ksoftirqd_running(unsigned long pending)
{
struct task_struct *tsk = __this_cpu_read(ksoftirqd);

- if (pending & SOFTIRQ_NOW_MASK)
+ if (softirq_now_mask(pending))
return false;
return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
}
@@ -903,6 +911,22 @@ void tasklet_unlock_wait(struct tasklet_struct *t)
EXPORT_SYMBOL_GPL(tasklet_unlock_wait);
#endif

+static struct ctl_table softirq_sysctls[] = {
+ {
+ .procname = "softirq_mask",
+ .data = &sysctl_softirq_mask,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {}
+};
+
+static void __init softirq_mask_sysctl_init(void)
+{
+ register_sysctl_init("kernel", softirq_sysctls);
+}
+
void __init softirq_init(void)
{
int cpu;
@@ -916,6 +940,7 @@ void __init softirq_init(void)

open_softirq(TASKLET_SOFTIRQ, tasklet_action);
open_softirq(HI_SOFTIRQ, tasklet_hi_action);
+ softirq_mask_sysctl_init();
}

static int ksoftirqd_should_run(unsigned int cpu)
--
2.37.3