Re: weird loadavg on idle machine post 5.7

From: Peter Zijlstra
Date: Tue Jul 07 2020 - 04:17:29 EST


On Tue, Jul 07, 2020 at 12:56:04AM +0100, Valentin Schneider wrote:

> > @@ -2605,8 +2596,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > *
> > * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> > * __schedule(). See the comment for smp_mb__after_spinlock().
> > + *
> > + * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
> > + * schedule()'s deactivate_task() has 'happened' and p will no longer
> > + * care about it's own p->state. See the comment in __schedule().
> > */
> > - smp_rmb();
> > + smp_acquire__after_ctrl_dep();
>
> Apologies for asking again, but I'm foolishly hopeful I'll someday be able
> to grok those things without half a dozen tabs open with documentation and
> Paul McKenney papers.
>
> Do I get it right that the 'acquire' part hints this is equivalent to
> issuing a load-acquire on whatever was needed to figure out whether or not
> to take the branch (in this case, p->on_rq, amongst other things); IOW
> ensures any memory access appearing later in program order has to happen
> after the load?
>
> That at least explains to me the load->{load,store} wording in
> smp_acquire__after_ctrl_dep().

Yes.

So the thing is that hardware MUST NOT speculate stores, or rather, if
it does, it must take extreme measures to ensure they do not become
visible in any way, shape or form, since speculative stores lead to
instant out-of-thin-air (OOTA) problems.

Therefore we can say that branches order stores, and if the branch
condition depends on a load, we get load->store ordering. IOW, the load
must complete before we can resolve the branch, which in turn enables
the store to become visible/happen.
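
As a minimal sketch (made-up shared variables 'flag' and 'data', nothing
from the patch itself), the store below cannot become visible before the
load of 'flag' has resolved the branch:

	if (READ_ONCE(flag)) {		/* load */
		WRITE_ONCE(data, 1);	/* store; ordered after the load of
					 * 'flag' by the control dependency
					 * alone: load->store */
	}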

If we then add an smp_rmb() to the branch to order load->load, we end up
with a load->{load,store} ordering, which is equivalent to a
load-acquire.
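
Concretely (same made-up variables, plus 'a' and 'b'), this is what
smp_acquire__after_ctrl_dep() -- which is defined as smp_rmb() -- gives
us when placed right after the branch; the load of 'flag' now behaves
like a load-acquire:

	if (READ_ONCE(flag)) {
		smp_acquire__after_ctrl_dep();	/* smp_rmb(): adds load->load */
		r1 = READ_ONCE(a);	/* cannot happen before the 'flag' load */
		WRITE_ONCE(b, 1);	/* cannot happen before the 'flag' load */
	}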

The reason to do it like that is that a load-acquire would otherwise
require an smp_mb(), since for many platforms that's the only barrier
that has load->store ordering.

The down-side of doing it like this, as Paul will be quick to point out,
is that the C standard doesn't recognise control dependencies and thus
the compiler would be within its rights to 'optimize' our conditional
away.

We're relying on the compilers not having done this in the past and
there being sufficient compiler people interested in compiling Linux to
keep this from happening.
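
The classic failure mode (a made-up example in the spirit of
Documentation/memory-barriers.txt, not something known to happen to this
particular code) is when both arms of the conditional end up doing the
same store:

	if (READ_ONCE(flag))
		WRITE_ONCE(data, 1);
	else
		WRITE_ONCE(data, 1);

	/*
	 * A compiler is allowed to turn this into:
	 *
	 *	tmp = READ_ONCE(flag);
	 *	WRITE_ONCE(data, 1);
	 *
	 * hoisting the store above the branch and destroying the
	 * load->store ordering the control dependency was meant to give.
	 */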


Anyway, this patch is basically:

	__schedule():			try_to_wake_up():

	LOAD p->state			LOAD-ACQUIRE p->on_rq == 0
	MB
	STORE p->on_rq, 0		STORE p->state, TASK_WAKING

which ensures the TASK_WAKING store happens after the p->state load.
It's just a wee bit complicated because it gets that additional ordering
without actually adding any barriers.
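
If it helps, the same picture as a simplified, hand-waved C sketch; the
code shape is illustrative only, not the actual scheduler source:

	/* CPU0: __schedule(), with prev == p going to sleep */
	prev_state = READ_ONCE(p->state);	/* LOAD p->state		*/
						/* MB: rq->lock +
						 *     smp_mb__after_spinlock()	*/
	p->on_rq = 0;				/* STORE p->on_rq, 0
						 * (deactivate_task())		*/

	/* CPU1: try_to_wake_up() */
	if (!READ_ONCE(p->on_rq)) {		/* LOAD p->on_rq == 0 ...	*/
		smp_acquire__after_ctrl_dep();	/* ... upgraded to an ACQUIRE	*/
		p->state = TASK_WAKING;		/* STORE p->state, TASK_WAKING	*/
	}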

Anyway, let me now endeavour to write a coherent Changelog for this mess
:-(