Re: Interesting analysis of linux kernel threading by IBM

From: Marco Colombo (marco@esi.it)
Date: Mon Jan 24 2000 - 16:01:14 EST


[Davide Libenzi]
> Sure I'll do bechmarks even if we've confused scheduler changes for application
> design.
>
> Just now I'm testing ( benchmarking ) the new version of the scheduler that use
> clusterization under high ( tunable ) loads while use the linear scan for low
> workloads.
> AFAIS it has the performance boost of the cluster-scheduler at high workloads
> as long has the same times of the linear-scheduler under low ones.

I think they really asked you to re-design your application, not the
scheduler...
if your application does it The Wrong Way (TM), you have to rewrite it to
see real performance improvements. You MAY get some improvements changing the
scheduler, but that's not the point.

BTW, there's something i must be missing:
you keep saying that every thread in your "pipe" does some highly CPU
expensive job. Really I don't understand how the schedule can help in this
case.
Consider one cpu. No matter how many runnable process there are in the
system, given a time-slice of 200ms, it runs 5 processes/second. So 5
scheduler invocation/second. On a 2-way system, that's 10 invocations per
second. And I doubt, even with a very long run-queue, the scheduler takes
any noticeable fraction of a 1/10 second to run.

And if one of your threads runs for just 1ms and then reads new data
(invalidating the cache), that's NOT a CPU intensive computation, since
it takes only 1/200 of the available time...

> OK to stop this thread.

Fell free to answer me privately, then...

this message is public because i'm really missing something about the
whole thread. The only case i think the scheduler impacts on performance
is when it's invoked *a lot* of times. This means that processes (or
threads, whatever) are NOT CPU-bound (they are not preempted). If not
being CPU-bound means being I/O-bound (most of the times), we have
short run queues (processes are waiting for I/O, not for the CPU).
In some rare cases a process can be neither CPU nor I/O bound. There are
times when a process releases the CPU but does not sleep for I/O. If
it keeps doing that, and if there are many of such a process running at
the same time, the scheduler gets invoked a lot of times.

If i understand well, they're saying that if you run into such a condition,
on a SMP system you'll get a lot of cache misses. Your processes
are, in a way, "RAM-bound" (that's mine, of course). And RAM, compared to
cache, nowadays is as much slower as I/O is compared to RAM. So, RAM
becomes a bottleneck.

Suppose that under that kind of load, the CPU spends 20% of the time
in the scheduler and the rest trying to serve your threads (but with the
performances of an *uncached* RAM). You may reduce that scheduler time to
even 5% or 10%, getting some kind of improvement (+15% ? I'm not doing
the calculation B-)) on your application overall. But you will always
suffer of the (-90% ?) penalty of not makeing correct use of the cache.

That's way Larry McVoy asked you to re-implement your application
(in a cache friendly way), and THEN do comparative benchmarking
(Larry: please correct me if I'm wrong).

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:13 EST