Re: vmstat: On demand vmstat workers V4

From: Thomas Gleixner
Date: Sat May 10 2014 - 08:20:44 EST


On Fri, 9 May 2014, Paul E. McKenney wrote:

> On Sat, May 10, 2014 at 12:57:15AM +0200, Thomas Gleixner wrote:
> > On Fri, 9 May 2014, Christoph Lameter wrote:
> > > On Fri, 9 May 2014, Thomas Gleixner wrote:
> > > > I understand why you want to get this done by a housekeeper, I just
> > > > did not understand why we need this whole move it around business is
> > > > required.
> > >
> > > This came about because of another objection against having it simply
> > > fixed to a processor. After all that processor may be disabled etc etc.
> >
> > I really regret that I did not pay more attention (though my cycle
> > constraints simply do not allow it).
>
> As far as I can see, the NO_HZ_FULL timekeeping CPU is always zero. If it
> can change in NO_HZ_FULL kernels, RCU will do some very strange things!

Good. I seriously hope it stays that way.

> One possible issue here is that Christoph's patch is unconditional.
> It takes effect for both NO_HZ_FULL and !NO_HZ_FULL. If I recall
> correctly, the timekeeping CPU -can- change in !NO_HZ_FULL kernels,
> which might be what Christoph was trying to take into account.

Ok. Sorry, I was just in a lousy mood after wasting half a day in
reviewing even lousier patches related to that NO_HZ* muck.

So, right with NO_HZ_IDLE the time keeper can move around and
housekeeping stuff might want to move around as well.

But it's not necessary a good idea to bundle that with the timekeeper,
as under certain conditions the timekeeper duty can move around fast
and left unassigned again when the system is fully idle.

And we really do not want a gazillion of sites which implement a
metric ton of different ways to connect some random housekeeping jobs
with the timekeeper.

So the proper solution to this is to have either a thread or a
dedicated housekeeping worker, which is placed by the scheduler
depending on the system configuration and workload.

That way it can be kept at cpu0 for the nohz=off and the nohz_full
case. In the nohz_idle case we can have different placement
algorithms. On a big/little ARM machine you probably want to keep it
on the first cpu of one or the other cluster. And there might be other
constraints on servers.

So we are way better of with a generic facility, where the various
housekeeping jobs can be queued.

Does that make sense?

Thanks,

tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/