Re: [PATCH 2/3] sched: Unloop sched avg decaying

From: Frederic Weisbecker
Date: Thu Jun 30 2016 - 08:52:34 EST


On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 385c947..0c0578a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
> >  void sched_avg_update(struct rq *rq)
> >  {
> >  	s64 period = sched_avg_period();
> > +	s64 delta;
> > +	u64 rem;
> > +	int pending;
> >
> > +	delta = (s64)(rq_clock(rq) - rq->age_stamp);
> > +	if (delta <= period)
> > +		return;
> > +
> > +	pending = div64_u64_rem(delta, period, &rem);
> > +	rq->age_stamp += delta - rem;
> > +	rq->rt_avg >>= pending;
> >  }
>
> Blergh, and now do the profile on a machine that doesn't have a major
> transistor count dedicated to divisions.

Indeed, I was afraid of something like that. That said, I'm not sure which
is worse: iterating, or a division implemented outside native asm.

Would it be worth testing?
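
Something like the following quick-and-dirty userspace sketch could give
a first idea (purely hypothetical: PERIOD, decay_loop() and decay_div()
are made-up stand-ins for sched_avg_period() and the two variants of
sched_avg_update(), and the plain '/' stands in for div64_u64_rem()):

/* bench.c: rough iteration-vs-division comparison, userspace only. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define PERIOD		1000000ULL	/* stand-in for sched_avg_period() */
#define ITERATIONS	1000000

/* Current code: one halving per elapsed period. */
static uint64_t decay_loop(uint64_t delta, uint64_t avg)
{
	while ((int64_t)delta > (int64_t)PERIOD) {
		delta -= PERIOD;
		avg /= 2;
	}
	return avg;
}

/* Proposed code: one division, then a single shift. */
static uint64_t decay_div(uint64_t delta, uint64_t avg)
{
	uint64_t pending;

	if ((int64_t)delta <= (int64_t)PERIOD)
		return avg;
	pending = delta / PERIOD;	/* div64_u64_rem() in-kernel */
	return pending < 64 ? avg >> pending : 0;
}

static uint64_t bench_ns(uint64_t (*fn)(uint64_t, uint64_t), uint64_t periods)
{
	struct timespec t0, t1;
	volatile uint64_t sink = 0;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERATIONS; i++)
		sink += fn(periods * PERIOD + 123, 1ULL << 20);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;

	return (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000ULL +
	       (uint64_t)(t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
	uint64_t periods;

	/* The loop hurts more the further the clock lags behind. */
	for (periods = 1; periods <= 64; periods *= 4)
		printf("%2llu periods behind: loop %llu ns, div %llu ns\n",
		       (unsigned long long)periods,
		       (unsigned long long)bench_ns(decay_loop, periods),
		       (unsigned long long)bench_ns(decay_div, periods));
	return 0;
}

Of course this would still need running on a machine with a slow divider
to answer Peter's point; the indirect call and the volatile sink are only
there to keep the compiler from optimizing the work away.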

>
> Why not add the division to the nohz exit path only?

I think that would be worse, because we may exit nohz much more often
than we cross a sched_avg_period() boundary.

So the only safe optimization I can do for now is:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4632d31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -626,8 +626,9 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	int i;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+	for (i = 0; (s64)(rq_clock(rq) - rq->age_stamp) > period; i++) {
 		/*
 		 * Inline assembly required to prevent the compiler
 		 * optimising this loop into a divmod call.
@@ -635,8 +636,8 @@ void sched_avg_update(struct rq *rq)
 		 */
 		asm("" : "+rm" (rq->age_stamp));
 		rq->age_stamp += period;
-		rq->rt_avg /= 2;
 	}
+	rq->rt_avg >>= i;
 }
 
 #endif /* CONFIG_SMP */
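
For what it's worth, hoisting the halving out of the loop as a single
shift is equivalent to halving on every iteration. A trivial userspace
check (again hypothetical, with PERIOD standing in for
sched_avg_period()):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PERIOD 1000000ULL	/* stand-in for sched_avg_period() */

int main(void)
{
	uint64_t now = 7 * PERIOD + 42;		/* clock is 7 periods ahead */
	uint64_t stamp_a = 0, avg_a = 1ULL << 20;	/* old: halve in loop */
	uint64_t stamp_b = 0, avg_b = 1ULL << 20;	/* new: shift once */
	int i;

	while ((int64_t)(now - stamp_a) > (int64_t)PERIOD) {
		stamp_a += PERIOD;
		avg_a /= 2;
	}

	for (i = 0; (int64_t)(now - stamp_b) > (int64_t)PERIOD; i++)
		stamp_b += PERIOD;
	avg_b >>= i;

	assert(stamp_a == stamp_b && avg_a == avg_b);
	printf("both variants: stamp=%llu avg=%llu after %d periods\n",
	       (unsigned long long)stamp_a, (unsigned long long)avg_a, i);
	return 0;
}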