Future of NOHZ full/isolation development (was Re: [NOHZ] Remove scheduler_tick_max_deferment)

From: Frederic Weisbecker
Date: Tue Nov 11 2014 - 12:15:41 EST


On Mon, Nov 10, 2014 at 12:26:51PM -0600, Christoph Lameter wrote:
> >
> > Would it make sense for unlimited max deferment to be available as
> > a boot parameter? That would allow people who want tick-free execution
> > more than accurate stats to get that easily, while keeping stats accurate
> > for everyone else.
>
> Subject: Make the maximum tick deferral for CONFIG_NO_HZ configurable
>
> Add a way to configure this interval at boot and via
> /proc/sys/vm/max_defer_tick
>
> Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>

Sorry, but that doesn't solve the problem. All it does is allow the user
to tune bugs.

Kevin Hilman proposed something similar using debugfs and I declined it as
well. Integrating a hack like this is a good way to make sure that nobody
will ever fix the real underlying issue.

BTW, this is a good opportunity for me to generalize from this case to the
overall state of full dynticks development. I got a lot of help from people
to improve the kernel's isolation and full dynticks: Paul has spent a lot of
time improving RCU, you improved vmstat, full dynticks got ported to other
archs, people like Viresh fixed some timer internals, Gilad fixed IPIs,
Peterz reviewed a lot, etc...

But now we have reached a step where mostly core issues remain, and they
require some investment in infrastructure changes, some extensions, or a bit
of rethinking. We know we have reached that step when people who want the
features are stuck sending workarounds.
No big rewrites are needed really, just a bunch of pretty self-contained
issues. And by self-contained I mean that each of these individual problems
can be worked out separately, as they are largely unrelated to each other.
Here is a summarized list:

* Unbound workqueues affinity (to housekeeper)
* Unbound timers affinity (to housekeeper)
* 1 Hz residual scheduler tick offloading to housekeeper
* Fix some scheduler accounting that doesn't even work with 1 Hz: cpu load
accounting, rt_scale, load balancing, etc...
* Lighten the syscall path and get rid of cputime accounting + RCU hooks
for people who want isolation + fast syscalls and faults.
* Work on non-affinable workqueues
* Work on non-affinable timers
* ...

If I'm going to work alone on all that, this is going to take several years,
honestly.

But we know what to do and how. So all we need is (at least) one more
full-time core developer to get these things done in a reasonable amount of
time.

Thanks.