Re: [RFC PATCH 4/4] sched: Upload nohz full CPU load on task enqueue/dequeue

From: Frederic Weisbecker
Date: Wed Jan 20 2016 - 12:21:16 EST


On Wed, Jan 20, 2016 at 05:56:57PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 20, 2016 at 03:54:19PM +0100, Frederic Weisbecker wrote:
>
> > > You can simply do:
> > >
> > > for_each_nohzfull_cpu(cpu) {
> > > 	struct rq *rq = rq_of(cpu);
> > >
> > > 	raw_spin_lock(&rq->lock);
> > > 	update_cpu_load_active(rq);
> > > 	raw_spin_unlock(&rq->lock);
> > > }
> >
> > But from where should we do that?
>
> house keeper thingy

You mean a periodic call to the above from the housekeepers?

I didn't think about doing that because you nacked that approach with
scheduler_tick(). This isn't much different.

It means the housekeeper is entirely dedicated to full dynticks CPUs.

>
> > Maybe we can do it before we call source/target_load(), on the
> > selected targets needed by the caller? The problem is that if we do
> > that right after a task got enqueued on the nohz runqueue, we may
> > accidentally account it as the whole dynticks frame (I mean, if we get
> > rid of that enqueue/dequeue accounting).
>
> Yes so? What if the current tick happens right after a task gets
> enqueued? Then we account the whole tick as !idle, even though we might
> have been idle for 99% of the time.

Idle is correctly taken care of there because tick_nohz_idle_exit() makes
sure that the whole dynticks load recorded is 0.

>
> Not a problem, this is sampling.

Sampling imprecision is fine indeed, but accounting long samples of
singletask time as multitask is erratic rather than merely imprecise. That's
the issue with pure on-demand updates. If we do the remote update
periodically instead, that wouldn't be a problem anymore as it would just
be a matter of precision.
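
To put rough numbers on that difference, here is a toy userspace sketch (not
kernel code; both function names are made up for illustration). It assumes a
CPU that ran a single task for frame_ticks ticks and gets a second task
enqueued just before its load is sampled, and returns the worst-case number
of ticks wrongly accounted as multitask.

```c
/* Toy model with hypothetical names. All times are in ticks. */

/*
 * Pure on-demand update: nothing refreshed the load during the frame, so a
 * sample taken right after the enqueue covers the whole frame.
 */
static unsigned long misaccounted_on_demand(unsigned long frame_ticks)
{
	return frame_ticks;
}

/*
 * Periodic remote update: the previous sample is at most period_ticks old,
 * so at most that much single-task time can be accounted as multitask.
 */
static unsigned long misaccounted_periodic(unsigned long frame_ticks,
					   unsigned long period_ticks)
{
	return frame_ticks < period_ticks ? frame_ticks : period_ticks;
}
```

With the periodic variant the error is bounded by the update period however
long the tickless frame lasted, which is what makes it a precision issue
rather than an erratic one.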

>
> Doing it locally or remotely doesn't matter.
>
> > > Also, since when can we have enqueues/dequeues while NOHZ_FULL ? I
> > > thought that was the 1 task 100% cpu case, there are no
> > > enqueues/dequeues there.
> >
> > That's the most optimized case but we can definitely have small moments
> > with more than one task running. For example if we have a workqueue,
> > or such short and quick tasks.
>
> The moment you have nr_running>1 the tick comes back on.

Sure, but the current nohz frame exit accounting is wrong as it accounts the
newly woken task as the whole tickless load. We need to record the singletask
load on nohz frame entry at least so we can retrieve and account it on nohz exit.

Unless, again, we do that periodic remote update from the housekeeper. Then we
don't care locally at all anymore.
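
The "record the singletask load on nohz frame entry" idea above can be
sketched as follows. This is a minimal userspace model, not the scheduler's
implementation: struct toy_rq, the toy_*() helpers and the halving decay are
all assumptions made for illustration.

```c
/* Toy userspace model; none of these names exist in the kernel. */
struct toy_rq {
	unsigned long cpu_load;   /* decayed load average */
	unsigned long entry_load; /* snapshot taken on nohz frame entry */
};

/* Fold one tick's worth of load into the average: new = (old + sample) / 2. */
static void toy_tick(struct toy_rq *rq, unsigned long sample)
{
	rq->cpu_load = (rq->cpu_load + sample) / 2;
}

/* Nohz frame entry: remember the load the CPU is about to run tickless with. */
static void toy_nohz_enter(struct toy_rq *rq, unsigned long load_now)
{
	rq->entry_load = load_now;
}

/* Nohz frame exit: replay the missed ticks with the snapshot, not the exit load. */
static void toy_nohz_exit(struct toy_rq *rq, unsigned long missed_ticks)
{
	while (missed_ticks--)
		toy_tick(rq, rq->entry_load);
}

/* The accounting criticized above: the load seen at exit covers the whole frame. */
static void toy_nohz_exit_naive(struct toy_rq *rq, unsigned long load_at_exit,
				unsigned long missed_ticks)
{
	while (missed_ticks--)
		toy_tick(rq, load_at_exit);
}
```

Assuming a weight of 1024 per task: starting from an average of 1024 with a
snapshot of 1024 and ten missed ticks, toy_nohz_exit() leaves the average at
1024, while toy_nohz_exit_naive() fed the two-task load of 2048 pulls it up
to nearly 2048 even though the second task was only just woken.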