Re: [PATCH] sched: fix clear NOHZ_BALANCE_KICK

From: Frederic Weisbecker
Date: Tue Jun 04 2013 - 05:56:46 EST


On Tue, Jun 04, 2013 at 10:21:06AM +0200, Vincent Guittot wrote:
> On 4 June 2013 00:48, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> > On Thu, May 30, 2013 at 05:23:05PM +0200, Vincent Guittot wrote:
> >> I have faced a sequence where the Idle Load Balance was sometimes not
> >> triggered for a while on my platform.
> >>
> >> CPU 0 and CPU 1 are running tasks and CPU 2 is idle
> >>
> >> CPU 1 kicks the Idle Load Balance
> >> CPU 1 selects CPU 2 as the new Idle Load Balancer
> >> CPU 1 sets NOHZ_BALANCE_KICK for CPU 2
> >> CPU 1 sends a reschedule IPI to CPU 2
> >> While CPU 2 wakes up, CPU 0 or CPU 1 migrates a waking task A to CPU 2
> >> CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
> >> Task A quickly goes back to sleep (before a tick occurs on CPU 2)
> >> CPU 2 goes back to idle with NOHZ_BALANCE_KICK set
> >>
> >> Whenever CPU 2 is subsequently selected for the ILB, no reschedule IPI is
> >> sent to CPU 2, which is idle, because NOHZ_BALANCE_KICK is already set,
> >> so no Idle Load Balance is performed.
> >>
> >> We must wait for the sched softirq to be raised on CPU 2 by
> >> another part of the kernel to clear NOHZ_BALANCE_KICK and come back to
> >> a normal situation.
> >>
> >> The proposed solution clears NOHZ_BALANCE_KICK in scheduler_ipi() if
> >> we can't raise the sched softirq for the Idle Load Balance.
> >>
> >> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >> ---
> >> kernel/sched/core.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 58453b8..51fc715 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -1420,7 +1420,8 @@ void scheduler_ipi(void)
> >>  	if (unlikely(got_nohz_idle_kick() && !need_resched())) {
> >>  		this_rq()->idle_balance = 1;
> >>  		raise_softirq_irqoff(SCHED_SOFTIRQ);
> >> -	}
> >> +	} else
> >> +		clear_bit(NOHZ_BALANCE_KICK, nohz_flags(smp_processor_id()));
> >
> > But then do we reach this if the IPI happens while running the non-idle task in
> > CPU 2? The first got_nohz_idle_kick() test would drop us out early from scheduler_ipi()
> > due to the idle_cpu() test. So the flag doesn't get cleared in this case.
>
> The 1st point is that only an idle cpu can be selected for idle load
> balance. But this doesn't prevent the cpu from waking up while it is
> being kicked for idle load balance.

Yep.

> I had added the clear_bit for the 1st got_nohz_idle_kick in the draft
> version of this patch, but the test of the emptiness of the wake_list,
> the call to smp_send_reschedule in the various ways to wake up the idle
> cpu, and the results of the tests have convinced me (maybe wrongly)
> that it was not necessary.

Hmm, suppose the CPU is idle and gets selected as the ILB, but then schedules
a non-idle task, receives the IPI in this non-idle context, and finally
goes back to idle for a long time. It can then stay idle, without ever being
notified, with the NOHZ_BALANCE_KICK flag still set.

But I can be missing something that clears the flag somewhere in that scenario.
In any case it's not obvious.
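
To make the two cases concrete, here is a rough userspace model of the flag
handling. It is only a sketch, not kernel code: struct cpu_model and every
helper name below are made up for illustration, "idle" stands in for
idle_cpu(), "kick" for NOHZ_BALANCE_KICK, and the model assumes that the
softirq clears the flag once the balance actually runs.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state involved; not a kernel struct. */
struct cpu_model {
	bool idle;            /* stands in for idle_cpu() */
	bool wake_list_empty; /* stands in for llist_empty(&rq->wake_list) */
	bool need_resched;
	bool kick;            /* stands in for NOHZ_BALANCE_KICK */
	int  balances;        /* idle load balances that actually ran */
};

static bool got_kick(const struct cpu_model *c)
{
	return c->idle && c->kick;
}

/* Model of the kicker: returns true only if an IPI would be sent. */
static bool kick_ilb(struct cpu_model *c)
{
	if (c->kick)
		return false;   /* flag already set: no IPI */
	c->kick = true;
	return true;
}

/* Model of the IPI handler, with or without the proposed else branch. */
static void ipi_model(struct cpu_model *c, bool patched)
{
	if (c->wake_list_empty && !got_kick(c))
		return;         /* early exit: the else branch is never reached */

	if (!c->wake_list_empty) {
		/* Pending wakeups are queued: the CPU now has a task to run. */
		c->wake_list_empty = true;
		c->idle = false;
		c->need_resched = true;
	}

	if (got_kick(c) && !c->need_resched) {
		c->balances++;
		c->kick = false; /* assumption: the softirq clears the flag */
	} else if (patched) {
		c->kick = false; /* the proposed clear_bit() */
	}
}

static void go_idle(struct cpu_model *c)
{
	c->idle = true;
	c->need_resched = false;
}

/* Changelog scenario: task A is queued before the IPI is handled. */
static bool stuck_after_wakeup_race(bool patched)
{
	struct cpu_model c = { .idle = true, .wake_list_empty = true };

	kick_ilb(&c);
	c.wake_list_empty = false;
	ipi_model(&c, patched);
	go_idle(&c);
	return !kick_ilb(&c);   /* true: the next kick sends no IPI */
}

/* Scenario above: the IPI lands while a non-idle task is already running. */
static bool stuck_after_ipi_in_task(bool patched)
{
	struct cpu_model c = { .idle = true, .wake_list_empty = true };

	kick_ilb(&c);
	c.idle = false;
	ipi_model(&c, patched);
	go_idle(&c);
	return !kick_ilb(&c);
}
```

In this model the else branch fixes the first scenario, but the second one
leaves the flag set with or without it, because the handler returns before
reaching the new code. Whether the real kernel has another path that clears
the flag there is exactly what I'm unsure about.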