Re: On optimising the scheduler for large run queues

From: Marco Colombo (marco@esi.it)
Date: Mon Jan 31 2000 - 09:22:50 EST

Next message: Jeff Garzik: "Re: 2.3.41 and TNT2 fbcon"
Previous message: David Woodhouse: "Re: Updated 2.3.x job list (its getting shorter)"
In reply to: Jamie Lokier: "On optimising the scheduler for large run queues"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 29 Jan 2000, Jamie Lokier wrote:

> I will summarise.
>
> Heavyweight kernel developers (no that doesn't mean >40 years old :-)
> believe that, for all well designed real applications, scheduler
> overhead is dominated by cache overhead. Therefore optimising cache
> overhead takes priority.
>
> I agree. Though I personally do not have figures to support it, I have
> the impression that others do.

I can't call myself a "kernel developer", and I don't have figures.
But I believe it anyway [someone told me he had measured it, and also gave
good explanations].

[...]
> Multi-threaded I/O can be reduced to select() I/O.
>
> This isn't true for Java at the application level. You might call it a
> flaw in Java itself, and it probably is for now[1]. But there isn't a
> better alternative yet. Rewriting Java "business objects" in C is not
> plausible. User space threading is often a better idea in Java. If
That's a Java problem, not a Linux one.

> they worked perfectly run queues would not be an issue, but user space
> threads introduce their own overheads.[2] Maybe dynamic compilers
> should try dynamically optimising thread switches outside the kernel.
>
> Real applications that are fully optimised for performance do not have
> large run queues.
>
> Apparently this is true.

They have many "good" properties. Good cache behaviour is one of the
most important. Avoiding unnecessary (kernel) schedules is another.
Both are difficult to achieve.
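
To make the "avoid unnecessary schedules" property concrete, here is a
minimal Python sketch of the select() style quoted above: one thread
watches several connections, so only one task is ever runnable on their
behalf. It is an illustration only, not code from any application in this
thread; the names echo_upper_once and demo are hypothetical, and local
socket pairs stand in for real network clients.

```python
import select
import socket


def echo_upper_once(pairs):
    """Answer one message per connection from a single thread via select().

    `pairs` is a list of (client, server) sockets; only the server ends
    are watched. Returns how many times select() woke the thread up.
    """
    pending = {server for _client, server in pairs}
    wakeups = 0
    while pending:
        # Block until at least one watched socket has data (1 s safety timeout).
        readable, _, _ = select.select(list(pending), [], [], 1.0)
        if not readable:
            break  # timed out: nothing left to read
        wakeups += 1
        for sock in readable:
            data = sock.recv(1024)
            sock.sendall(data.upper())
            pending.discard(sock)
    return wakeups


def demo(n=8):
    """Queue n requests, serve them all in one thread, return the results."""
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"req{i}".encode())
    wakeups = echo_upper_once(pairs)
    replies = [client.recv(1024).decode() for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    return wakeups, replies
```

With a thread per connection, the same eight requests could make up to
eight tasks runnable at once; here the run queue never grows on their
account, whatever the number of connections.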

> Therefore optimising the scheduler for large run queues, at a cost for
> small run queues (in maintenance, footprint and overhead) is
> counterproductive for the most critical cases.
>
> Here is a logical reasoning error.[3] By the kernel heavyweights.
>
> The overhead of a scheduler changes is *only* relevant to high switching
> rate applications. And that doesn't include purely select() based
> single-threaded monolithic servers.
>
> No-one has shown any real, well designed, well tuned, short run queue
> applications that have a high switching rate!
>
> They certainly haven't used such examples in their arguments! They've
> used other examples. That's why it's a reasoning error: Those examples
> aren't relevant!

Good point. It's true for me, at least.

> Now, I expect there are examples of real, well designed, well tuned
> applications that switch very often.
>
> Until someone demonstrates that *those* applications have small run
> queues, and only those, then we have to consider the large run queue
> patches seriously. Remember the other applications, including

I'm sorry, Jamie, but this is *your* reasoning error. You should
demonstrate that real, well designed, well tuned applications
that switch very often (provided they exist: they lack one of the
"good" properties, a low switching rate, so they may not be "so well
designed and well tuned") also have long RQs.

In other words, I won't call an application that both has a high
switch rate and causes a long RQ "well designed, well tuned".
I can hardly think of such an application that has good cache
behaviour at the same time (that's my impression).

So I think that an application that has both a high switch rate and a
long RQ is NOT "well designed, well tuned", and you should optimize it.
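
The reason both conditions have to hold can be shown with a toy model.
This is a sketch under a stated assumption, namely that the scheduler
examines every runnable task on each switch; it is not kernel code, and
pick_next, scan_cost and the "weight" field are hypothetical names used
only for illustration.

```python
import random


def pick_next(run_queue):
    """Linear scan: look at every runnable task, return (best, tasks_examined)."""
    best = None
    examined = 0
    for task in run_queue:
        examined += 1
        if best is None or task["weight"] > best["weight"]:
            best = task
    return best, examined


def scan_cost(rq_length, switches, seed=0):
    """Total tasks examined over `switches` scheduling decisions."""
    rng = random.Random(seed)
    # Assumption: arbitrary weights; only the amount of scanning matters here.
    rq = [{"pid": pid, "weight": rng.randint(1, 40)} for pid in range(rq_length)]
    total = 0
    for _ in range(switches):
        _best, examined = pick_next(rq)
        total += examined
    return total
```

In this model the scanning work is the product of RQ length and number
of switches: scan_cost(20, 10) and scan_cost(2, 100) both come to 200,
while scan_cost(20, 1000) comes to 20000. A long RQ with few switches, or
many switches on a short RQ, costs little; only the combination is
expensive.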

[...]

> Dear experienced kernel developers, please give examples of real world
> application, properly written for performance (as you like), that would
> be adversely affected by the proposed scheduler changes.

That's a good point.

But:
you failed to show any real world application, properly written for
performance (as we ALL like), that would take advantage of the proposed
scheduler change.

If you show me a badly written one, I can do the same. B-)

Unless someone more experienced than me has something to say about it,
the point now is:

- we can't show any good application that will take advantage of the patch;
- we can't show any good application that will be adversely affected by it.

So, right now, IMHO there are no reasons to consider the patch...

[ this leaves out systems with >16 CPUs... the patch should be considered
for them, if numbers show some real improvement for RQs of around 20 ]

.TM.

--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it



This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:28 EST