Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq

From: srinivas pandruvada
Date: Mon Dec 19 2022 - 17:55:04 EST


On Mon, 2022-12-19 at 12:33 +0100, Peter Zijlstra wrote:
> On Fri, Dec 16, 2022 at 11:07:48PM +0100, Frederic Weisbecker wrote:
> > On Fri, Dec 16, 2022 at 12:19:34PM +0100, Peter Zijlstra wrote:
> > > On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
> > > > +               /* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > > > +               if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > > > +                       __set_current_state(TASK_UNINTERRUPTIBLE);
> > > > +                       schedule_timeout(1);
> > > > +               }
> > >
> > > That's absolutely disgusting :-/
> >
> > I know, and I hate checking task_is_running(__this_cpu_read(ksoftirqd))
> > everywhere in idle. And in fact it doesn't work because some cpuidle
> > drivers also do need_resched() checks.
>
> quite a few indeed.
>
> > I guess that either we assume that the idle injection is more important
> > than serving softirqs and we shutdown the warnings accordingly, or we
> > arrange for idle injection to have a lower prio than ksoftirqd.
>
> ksoftirq is typically a CFS task while idle injection is required to be
> a FIFO (typically FIFO-1) task -- so that would require lifting
> ksoftirqd to FIFO and that has other problems.
>
> I'm guessing the problem case is where idle injection comes in while
> ksoftirq is running (and preempted), because in that case you cannot
> run the softirq stuff in-place.
>
> The 'right' thing to do would be to PI boost ksoftirqd in that case, I
> suppose. Perhaps something like so; it would boost ksoftirqd when it's
> running, and otherwise run the ksoftirqd thing in-situ.
>
> I've not really thought hard about this though, so perhaps I completely
> wrecked things.
I had to fix some compile issues in the patch, but I was able to test it.
It fixes the case where a softirq is pending before play_idle(). For
example:

<idle>-0 [004] 254.440116: softirq_raise: vec=9 [action=RCU]
<idle>-0 [004] 254.440116: softirq_exit: vec=9 [action=RCU]
<idle>-0 [004] 254.440116: sched_waking: comm=ksoftirqd/4 pid=41 prio=120 target_cpu=004
<idle>-0 [004] 254.440117: sched_wakeup: ksoftirqd/4:41 [120]<CANT FIND FIELD success> CPU:004
<idle>-0 [004] 254.440119: sched_switch: swapper/4:0 [120] R ==> kidle_inj/4:1180 [49]
kidle_inj/4-1180 [004] 254.440120: sched_kthread_work_execute_start: work struct 0xffffd1dcbfd25230: function clamp_idle_injection_func
kidle_inj/4-1180 [004] 254.440121: play_idle_enter: state=24000000 cpu_id=4
kidle_inj/4-1180 [004] 254.440122: bputs: __run_ksoftirqd: running from play_idle

"calliing __do_softirq() here"

kidle_inj/4-1180 [004] 254.440122: softirq_entry: vec=9 [action=RCU]
kidle_inj/4-1180 [004] 254.440125: softirq_raise: vec=9 [action=RCU]
kidle_inj/4-1180 [004] 254.440126: softirq_exit: vec=9 [action=RCU]

But after ksoftirqd_run_end(), which re-enables local interrupts, more
softirqs may become pending. Here the RCU softirq activity continues:


kidle_inj/4-1180 [004] 254.440141: softirq_exit: vec=9 [action=RCU]
kidle_inj/4-1180 [004] 254.440141: softirq_entry: vec=9 [action=RCU]
kidle_inj/4-1180 [004] 254.440143: softirq_raise: vec=9 [action=RCU]
kidle_inj/4-1180 [004] 254.440143: softirq_exit: vec=9 [action=RCU]
kidle_inj/4-1180 [004] 254.440144: bputs: can_stop_idle_tick.isra.0: soft irq pending

The bputs line above is another instance of the warning.


This log is from a ping test.

Adding an additional __run_ksoftirqd() call inside the do_idle()
while (!need_resched()) loop doesn't help.
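
For reference, this is roughly what that attempt looked like (a sketch
only; this placement inside do_idle() is my experiment, not part of the
posted patch):

        while (!need_resched()) {
                ...
                /*
                 * Experiment: drain pending softirqs before entering
                 * cpuidle. It doesn't help, because __do_softirq() can
                 * leave (or re-raise) pending softirqs behind.
                 */
                if (local_softirq_pending())
                        __run_ksoftirqd(smp_processor_id());
                ...
        }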

Adding a check to can_stop_idle_tick() before it reports the warning does
help. But since __do_softirq() can't guarantee that no softirqs are left
pending, there is still a chance of hitting another warning.
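
Something like this in can_stop_idle_tick() is what I mean (a rough
sketch; using PF_IDLE on a non-idle task to detect the injection thread
is an assumption on my side):

        if (unlikely(local_softirq_pending())) {
                /*
                 * Sketch: skip the warning when running from an
                 * idle-injection kthread (PF_IDLE set on a non-idle
                 * task), since __do_softirq() there cannot guarantee
                 * an empty pending mask.
                 */
                if ((current->flags & PF_IDLE) && !is_idle_task(current))
                        return false;
                ...
        }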

Thanks,
Srinivas

>
>
> ---
>  include/linux/interrupt.h   |  3 +++
>  kernel/sched/build_policy.c |  1 +
>  kernel/sched/idle.c         |  4 ++++
>  kernel/softirq.c            | 20 +++++++++++++++++---
>  4 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a92bce40b04b..a2db811d6947 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -606,6 +606,9 @@ extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  
>  DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
> +DECLARE_PER_CPU(struct rt_mutex, ksoftirq_lock);
> +
> +extern bool __run_ksoftirqd(unsigned int cpu);
>  
>  static inline struct task_struct *this_cpu_ksoftirqd(void)
>  {
> diff --git a/kernel/sched/build_policy.c b/kernel/sched/build_policy.c
> index d9dc9ab3773f..731c595e551c 100644
> --- a/kernel/sched/build_policy.c
> +++ b/kernel/sched/build_policy.c
> @@ -28,6 +28,7 @@
>  #include <linux/suspend.h>
>  #include <linux/tsacct_kern.h>
>  #include <linux/vtime.h>
> +#include <linux/interrupt.h>
>  
>  #include <uapi/linux/sched/types.h>
>  
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index f26ab2675f7d..093bfdfca2f1 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -370,6 +370,10 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
>         WARN_ON_ONCE(!duration_ns);
>         WARN_ON_ONCE(current->mm);
>  
> +       rt_mutex_lock(this_cpu_ptr(&ksoftirq_lock));
> +       __run_ksoftirqd(smp_processor_id());
> +       rt_mutex_unlock(this_cpu_ptr(&ksoftirq_lock));
> +
>         rcu_sleep_check();
>         preempt_disable();
>         current->flags |= PF_IDLE;
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..a2cb600383a4 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -59,6 +59,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
>  static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
>  
>  DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
> +DEFINE_PER_CPU(struct rt_mutex, ksoftirq_lock);
>  
>  const char * const softirq_to_name[NR_SOFTIRQS] = {
>         "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
> @@ -912,6 +913,7 @@ void __init softirq_init(void)
>                         &per_cpu(tasklet_vec, cpu).head;
>                 per_cpu(tasklet_hi_vec, cpu).tail =
>                         &per_cpu(tasklet_hi_vec, cpu).head;
> +               rt_mutex_init(&per_cpu(ksoftirq_lock, cpu));
>         }
>  
>         open_softirq(TASKLET_SOFTIRQ, tasklet_action);
> @@ -923,7 +925,7 @@ static int ksoftirqd_should_run(unsigned int cpu)
>         return local_softirq_pending();
>  }
>  
> -static void run_ksoftirqd(unsigned int cpu)
> +bool __run_ksoftirqd(unsigned int cpu)
>  {
>         ksoftirqd_run_begin();
>         if (local_softirq_pending()) {
> @@ -933,10 +935,22 @@ static void run_ksoftirqd(unsigned int cpu)
>                  */
>                 __do_softirq();
>                 ksoftirqd_run_end();
> -               cond_resched();
> -               return;
> +               return true;
>         }
>         ksoftirqd_run_end();
> +       return false;
> +}
> +
> +static void run_ksoftirqd(unsigned int cpu)
> +{
> +       bool ran;
> +
> +       rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> +       ran = __run_ksoftirqd(cpu);
> +       rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> +
> +       if (ran)
> +               cond_resched();
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU