Re: dynamic sched timeslices

From: Kurt Garloff
Date: Tue Mar 16 2004 - 10:16:40 EST


Hi Con,

On Wed, Mar 17, 2004 at 12:13:37AM +1100, Con Kolivas wrote:
> 2.4 O(1) effects do not directly apply with 2.6
>
> Dropping Hz will save you performance for sure on 2.6.
>
> Changing the timeslices in 2.6 will be disappointing, though. Although the
> apparent timeslice of nice 0 tasks is 102ms, interactive tasks round robin at
> 10ms. If you drop the timeslice to 10ms you will not improve the interactive
> feel but you will speed up expiration instead which will almost certainly
> worsen interactive feel.

With an easy workload (say, one clear CPU hog and one interactive
job), things are simple: preempting the not-yet-expired CPU hog is
enough. That worked with 2.4 O(1) (if tweaked a bit to estimate
interactiveness better, see the other patch) and it works with 2.6.

Things start to get difficult if you have something like a calculation
program with a non-multithreaded GUI. It will look like a CPU hog, yet
you'd still like it to be responsive. Now add a second CPU hog.

The kernel cannot fix this problem, but it can limit the damage by
not using overly long timeslices.
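The damage limit is easy to quantify with a back-of-the-envelope model
(my own sketch, not anything from the kernel): in a round robin where
each of the other runnable tasks may hold the CPU for a full slice
before ours runs again, the worst-case wait scales linearly with the
slice length.

```c
/* Hypothetical worst-case latency model: with nr_running tasks
 * round-robining at the same priority, a task may have to wait
 * for every other task to burn a full timeslice first. */
static int worst_case_wait_ms(int nr_running, int timeslice_ms)
{
    return (nr_running - 1) * timeslice_ms;
}
```

With three runnable tasks, a 100 ms slice bounds the wait at 200 ms,
while a 10 ms slice bounds it at 20 ms — which is why shortening the
slice helps when the estimator misclassifies a task.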

There are other scenarios where preemption will not solve all
problems.
Think of two interactive processes, one playing audio, the other being
your shell. The audio player may occasionally take the CPU for
extended periods of time to decode the next N ogg frames. You still
want the shell to react promptly, but it can't ... Thus you want the
timeslice not to be too long.

So on desktop-type machines you'll keep the timeslices fairly short,
so that you don't have to rely completely on the interactiveness
estimator.
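If the timeslice becomes a tunable, the obvious shape is a clamped
setter — the following is a hypothetical sketch (the bounds 10 ms and
200 ms are taken from the trade-offs discussed in this thread, not
from any actual kernel interface):

```c
/* Hypothetical tunable handler: clamp an administrator-chosen
 * base timeslice into a sane range, avoiding both cache
 * thrashing (too short) and poor desktop latency (too long). */
#define TOY_MIN_SLICE_MS  10   /* below this, cache refills dominate */
#define TOY_MAX_SLICE_MS 200   /* beyond this, throughput gains vanish */

static int clamp_timeslice_ms(int requested_ms)
{
    if (requested_ms < TOY_MIN_SLICE_MS)
        return TOY_MIN_SLICE_MS;
    if (requested_ms > TOY_MAX_SLICE_MS)
        return TOY_MAX_SLICE_MS;
    return requested_ms;
}
```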

> If you drop timeslices below 10ms you will get
> significant cache trashing and drop in performance (which your 2.4 results
> confirm).

No doubt. Don't overdo it. It's a tradeoff. If you impact throughput too
much, you'll not enjoy the short latency ;-)

> Increasing timeslices does benefit pure number crunching workloads. The
> benchmarking I've done using cache intensive workloads (which are the most
> likely to benefit) show you are chasing diminishing returns, though. You can
> mathematically model them based on the fact that keeping a task bound to a
> cpu instead of shifting it to another cpu on SMP saves about 2ms processing
> time on P4. Suffice to say the benefit is only worth it if you do nothing but
> cpu intensive things, and becomes virtually insignificant beyond 200ms. On
> other architecture with longer cache decays you will benefit more;
> arch/i386/mach-voyager seems the longest at 20ms.
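Con's diminishing-returns point can be made concrete with a simple
overhead model (my sketch, using his ~2 ms P4 cache-refill figure as
the cost paid once per slice; integer permille to stay in whole
numbers):

```c
/* Hypothetical model: if every timeslice ends with a cache
 * refill costing refill_ms, the fraction of CPU time lost
 * per slice (in permille, integer math) shrinks as the
 * slice grows — quickly at first, then barely at all. */
static int refill_overhead_permille(int timeslice_ms, int refill_ms)
{
    return 1000 * refill_ms / (timeslice_ms + refill_ms);
}
```

With a 2 ms refill, a 10 ms slice loses roughly 16.6% of its time, a
100 ms slice under 2%, and a 200 ms slice under 1% — so stretching the
slice beyond 200 ms buys almost nothing, matching Con's measurement.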

That's why I think we should offer the tunables.

Regards,
--
Kurt Garloff <kurt@xxxxxxxxxx> [Koeln, DE]
Physics:Plasma modeling <garloff@xxxxxxxxxxxxxxxxxxx> [TU Eindhoven, NL]
Linux: SUSE Labs (Head) <garloff@xxxxxxx> [SUSE Nuernberg, DE]

Attachment: pgp00000.pgp
Description: PGP signature