Re: [PATCH v2 2/2] sched/fair: Trigger nohz.next_balance updates when a CPU goes NOHZ-idle

From: Dietmar Eggemann
Date: Mon Jul 19 2021 - 12:06:07 EST


On 19/07/2021 12:31, Valentin Schneider wrote:

[...]

> @@ -10351,6 +10352,9 @@ static void nohz_balancer_kick(struct rq *rq)
> unlock:
> rcu_read_unlock();
> out:
> + if (READ_ONCE(nohz.needs_update))
> + flags |= NOHZ_NEXT_KICK;
> +

Since NOHZ_NEXT_KICK is part of NOHZ_KICK_MASK, some conditions above
will already set it in flags. Is this intended?
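
To illustrate the overlap, here is a minimal user-space sketch. The bit
positions and the helper kick_flags() are made up for illustration and are
not the kernel's definitions; the only thing taken from the patch is that
NOHZ_NEXT_KICK is contained in NOHZ_KICK_MASK:

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define NOHZ_BALANCE_KICK	(1U << 0)
#define NOHZ_STATS_KICK		(1U << 1)
#define NOHZ_NEXT_KICK		(1U << 2)
#define NOHZ_KICK_MASK	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK | NOHZ_NEXT_KICK)

/*
 * Simplified stand-in for the tail of nohz_balancer_kick():
 * 'need_full_kick' models any earlier condition that assigns
 * NOHZ_KICK_MASK to flags, 'needs_update' models nohz.needs_update.
 */
static unsigned int kick_flags(bool need_full_kick, bool needs_update)
{
	unsigned int flags = 0;

	if (need_full_kick)
		flags = NOHZ_KICK_MASK;		/* already contains NOHZ_NEXT_KICK */

	if (needs_update)
		flags |= NOHZ_NEXT_KICK;	/* no-op if the mask was set above */

	return flags;
}
```

In this sketch the new 'out:' check only changes the result when none of
the earlier conditions fired.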

> if (flags)
> kick_ilb(flags);
> }
> @@ -10447,12 +10451,13 @@ void nohz_balance_enter_idle(int cpu)
> /*
> * Ensures that if nohz_idle_balance() fails to observe our
> * @idle_cpus_mask store, it must observe the @has_blocked
> - * store.
> + * and @needs_update stores.
> */
> smp_mb__after_atomic();
>
> set_cpu_sd_state_idle(cpu);
>
> + WRITE_ONCE(nohz.needs_update, 1);
> out:
> /*
> * Each time a cpu enter idle, we assume that it has blocked load and
> @@ -10501,13 +10506,17 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,

The function header comment would need an update to cover the new 'update
nohz.next_balance' functionality. So far it only talks about 'update of
blocked load' and 'complete load balance'.

> /*
> * We assume there will be no idle load after this update and clear
> * the has_blocked flag. If a cpu enters idle in the mean time, it will
> - * set the has_blocked flag and trig another update of idle load.
> + * set the has_blocked flag and trigger another update of idle load.
> * Because a cpu that becomes idle, is added to idle_cpus_mask before
> * setting the flag, we are sure to not clear the state and not
> * check the load of an idle cpu.
> + *
> + * Same applies to idle_cpus_mask vs needs_update.
> */
> if (flags & NOHZ_STATS_KICK)
> WRITE_ONCE(nohz.has_blocked, 0);
> + if (flags & NOHZ_NEXT_KICK)
> + WRITE_ONCE(nohz.needs_update, 0);
>
> /*
> * Ensures that if we miss the CPU, we must see the has_blocked
> @@ -10531,6 +10540,8 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
> if (need_resched()) {
> if (flags & NOHZ_STATS_KICK)
> has_blocked_load = true;

This looks inconsistent now: 'has_blocked_load = true' for one flag vs
'WRITE_ONCE(nohz.needs_update, 1)' for the other.
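
A rough user-space model of the two styles side by side. It assumes that
the 'abort:' label, which is not visible in this hunk, folds the local
has_blocked_load back into nohz.has_blocked; struct nohz_model and
abort_path() are hypothetical names used only for this sketch:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two nohz fields discussed here. */
struct nohz_model {
	int has_blocked;
	int needs_update;
};

/* Models the need_resched() bail-out in _nohz_idle_balance(). */
static void abort_path(struct nohz_model *nohz, bool stats_kick, bool next_kick)
{
	bool has_blocked_load = false;

	if (stats_kick)
		has_blocked_load = true;	/* deferred through a local */
	if (next_kick)
		nohz->needs_update = 1;		/* written back immediately */

	/* abort: (assumed) the local is folded into the shared flag here */
	if (has_blocked_load)
		nohz->has_blocked = 1;
}
```

Both flags end up set again in this model, but via two different
mechanisms.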

> + if (flags & NOHZ_NEXT_KICK)
> + WRITE_ONCE(nohz.needs_update, 1);
> goto abort;
> }
>
>