Re: [PATCH] sched: fix clear NOHZ_BALANCE_KICK

From: Peter Zijlstra
Date: Tue Jun 04 2013 - 07:15:25 EST


On Tue, Jun 04, 2013 at 12:26:22PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 04, 2013 at 11:36:11AM +0200, Peter Zijlstra wrote:
> >
> > The best I can seem to come up with is something like the below; but I think
> > its ghastly. Surely we can do something saner with that bit.
> >
> > Having to clear it at 3 different places is just wrong.
>
> We could clear the flag early in scheduler_ipi() and set some
> specific value in rq->idle_balance that tells we want nohz idle
> balancing from the softirq, something like this untested:

Yeah, I suppose something like that is a little better.. a few nits
though:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 58453b8..330136b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -630,15 +630,14 @@ void wake_up_nohz_cpu(int cpu)
> wake_up_idle_cpu(cpu);
> }
>
> -static inline bool got_nohz_idle_kick(void)
> +static inline bool got_nohz_idle_kick(int cpu)
> {
> - int cpu = smp_processor_id();
> - return idle_cpu(cpu) && test_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
> + return test_and_clear_bit(NOHZ_BALANCE_KICK, nohz_flags(cpu));
> }
>
> #else /* CONFIG_NO_HZ_COMMON */
>
> -static inline bool got_nohz_idle_kick(void)
> +static inline bool got_nohz_idle_kick(int cpu)
> {
> return false;
> }
> @@ -1393,8 +1392,12 @@ static void sched_ttwu_pending(void)
>
> void scheduler_ipi(void)
> {
> - if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick()
> - && !tick_nohz_full_cpu(smp_processor_id()))
> + int cpu = smp_processor_id();
> + bool idle_kick = got_nohz_idle_kick(cpu);

This puts an unconditional atomic instruction in the IPI path.
if (test) clear();
is lots cheaper, esp. since most IPIs won't have this flag set.

> +
> + if (!(idle_kick && idle_cpu(cpu))
> + && llist_empty(&this_rq()->wake_list)
> + && !tick_nohz_full_cpu(cpu)

What's with this weird operator first split style?

> return;
>
> /*

> +enum idle_balance_type {
> + IDLE_BALANCE = 1,
> + IDLE_NOHZ_BALANCE = 2,
> +};

You might want to update the rq->idle_balance assignment in
scheduler_tick() to make sure it uses the right value (it does now, but
there's nothing stopping people from changing the values).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/