Re: [PATCH 1/4] sched: Don't account tickless CPU load on tick

From: Frederic Weisbecker
Date: Wed Jan 20 2016 - 12:36:39 EST


On Wed, Jan 20, 2016 at 09:42:16AM +0100, Thomas Gleixner wrote:
> On Tue, 19 Jan 2016, Peter Zijlstra wrote:
>
> > On Wed, Jan 13, 2016 at 05:01:28PM +0100, Frederic Weisbecker wrote:
> > > The cpu load update on tick doesn't care about dynticks and as such is
> > > buggy when occurring on nohz ticks (including idle ticks) as it resets
> > > the jiffies snapshot that was recorded on nohz entry. We eventually
> > > ignore the potentially long tickless load that happened before the
> > > tick.
> >
> > I don't get it, how can we call scheduler_tick() while
> > tick_nohz_tick_stopped() ?
>
> tick->nohz_stopped is merely indicating that we switched from periodic mode to
> tickless mode. That's probably a misnomer, but it still has that meaning.
>
> You really need to look at it from the history of that code which was designed
> for tickless idle. The nohz full stuff was bolted on it.
>
> So if we stop the tick in idle - or for that matter in full nohz - we look
> ahead when the next tick is required. That can be:
>
> - a timer wheel timer expiring
>
> - other stuff which prevents the cpu from going "tickless" like rcu,
> irqwork
>
> So let's assume rcu and irqwork are silent, but we have a timer expiring 100ms
> from now, then we program the tick timer to 100ms from now. When it fires it
> invokes the normal tick_sched() timer machinery:
>
> - timekeeping update
> - update_process_times
> - profile_tick
>
> I have no idea why that is a problem. If update_process_times() is invoked
> then it will account the elapsed time to the idle task in case of tickless
> idle. In case of nohz full it should simply account the time to the task which
> was busy on the cpu in user space.
>
> The above changelog is just crap and doesn't make any sense at all. And the
> patch is fixing symptoms not the root cause.

So the other way to fix this is to properly account the tickless load and avoid
accounting the load of some newly awoken task. We could record the
weighted_cpuload() on nohz entry (or 0 in the case of idle) and then account
that on idle exit.

I think it's a better solution.
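
To make the idea concrete, here is a rough userspace sketch, not the actual
patch. Everything in it is an assumption made for illustration: the struct,
the field names, the helper names, and the single "average with the previous
value" decay, which stands in for the real cpu_load[] machinery. It only shows
the ordering: snapshot the load on nohz entry, then charge that snapshot (not
the load of whatever just woke up) for every missed tick on exit.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; names are illustrative, not the kernel's. */
struct cpu_load_state {
	unsigned long load;          /* decayed load average */
	unsigned long last_jiffies;  /* jiffies snapshot taken at the last update */
	unsigned long tickless_load; /* load recorded when the tick was stopped */
};

/*
 * Fold nticks identical samples into the average, one tick at a time.
 * Simplified decay (assumption): load = (load + sample) / 2 per tick.
 */
static void account_ticks(struct cpu_load_state *st, unsigned long sample,
			  unsigned long nticks)
{
	while (nticks--)
		st->load = (st->load + sample) / 2;
}

/*
 * On nohz entry: remember the load that will be running during the
 * tickless period. An idle CPU contributes 0 whatever weighted_load says.
 */
static void nohz_enter(struct cpu_load_state *st, unsigned long now,
		       unsigned long weighted_load, bool idle)
{
	st->tickless_load = idle ? 0 : weighted_load;
	st->last_jiffies = now;
}

/*
 * On nohz exit: account the recorded load for every missed tick, so a
 * freshly woken task's load is not charged to the tickless period.
 */
static void nohz_exit(struct cpu_load_state *st, unsigned long now)
{
	unsigned long missed = now - st->last_jiffies;

	account_ticks(st, st->tickless_load, missed);
	st->last_jiffies = now;
}
```

With this, a CPU that goes idle with load 64 and wakes up 3 ticks later ends
up decayed as if it had seen three zero-load ticks, and a nohz full CPU that
ran the same busy task throughout keeps its load unchanged. The per-tick loop
is only there for readability; a real implementation would want a bounded
way to decay over a long tickless period.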