Re: [PATCH] scheduler patch, faster still

yodaiken@chelm.cs.nmt.edu
Sun, 27 Sep 1998 20:00:35 -0600


On Mon, Sep 28, 1998 at 11:04:19AM +1000, Richard Gooch wrote:

> > Richard points out a strange result shown by his benchmark. He
> > _guesses_ that this result implies that (A) there is a performance
> > problem for real applications and (B) this problem may be solved by
> > a second queue. But what is the evidence for either? Note that a
> > queue of length 10 is trivial, a queue of length 100 is most likely
> > trivial on a fast processor. How about a benchmark with an
> > instrumented kernel showing time spent searching the queue before we
> > optimize time searching the queue? That is, you should be able to
> > show that for Application X, wall clock time of T involves T_s time
> > searching the queue where T_s is a substantial fraction of T.
>
> A: I've explained previously that we have latency sensitive RT tasks
> sitting in a tight read-block/do-something-simple/write/read-again
> loop.

What's the application? How generic is the problem? Does increasing
HZ solve the problem? Why not? (At HZ=100 a timer tick is 10ms; a
larger HZ shrinks the worst-case wait before a newly runnable task
gets a chance to preempt.)

>
> B: The evidence for a second run queue solving the problem is very
> strong. The numbers I've posted show beyond any doubt that an
> increased run queue length has a measurable effect on context switch
> latency. I've posted figures showing the per-process cost of extra
> entries on the run queue.
> My published results show that a queue length of 10 is in fact not
> trivial. A queue length of 100 is not trivial even on a large
> processor. A queue length of 3 (extra) is already significant on a
> Pentium 100: the thread switch time is doubled. 10 extra processes
> (possible with one of our applications) is horrendous.

I must have missed these results. What I saw were microbenchmark results
with no kernel profiling and no explanation. Why should searching a
queue of 10 entries on a 586 take any significant time? Where is the
time spent?

Do you have a result with an instrumented schedule()? If not, why not?
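
For the x86 case, here's the measurement pattern I mean, shown in user
space for brevity (the inner loop is just a stand-in for the run-queue
scan; the real version would bracket the scan inside schedule()
itself). rdtsc is Pentium and up, 32-bit x86:

#include <stdio.h>

/* Read the cycle counter; the "=A" constraint combines edx:eax into
 * a 64-bit value on 32-bit x86. */
static inline unsigned long long rdtsc(void)
{
        unsigned long long t;
        __asm__ __volatile__("rdtsc" : "=A" (t));
        return t;
}

int main(void)
{
        unsigned long long t0, t1, worst = 0;
        volatile int sink = 0;
        int i, j;

        for (i = 0; i < 100000; i++) {
                t0 = rdtsc();
                for (j = 0; j < 10; j++)  /* stand-in for a 10-entry scan */
                        sink += j;
                t1 = rdtsc();
                if (t1 - t0 > worst)
                        worst = t1 - t0;
        }
        printf("worst case: %llu cycles\n", worst);
        return 0;
}

Put the same brackets around the scan in schedule() and you know where
the time goes instead of guessing.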

>
> The third issue you raised above (T_s is significant compared to T) is
> covered by my response to (A).

No. Doubling thread switch time in a microbenchmark is not evidence
that T_s is a significant fraction of T for a real application.

> > As it is, we have no solid evidence that there is anything wrong
> > with the scheduler. In fact, we have DaveM's experience (not to
> > mention Cort's experience with PPC Linux) to show that hammering out
> > a few microseconds on schedule time seems to have NO effect on
> > system performance.
>
> I'm not talking about overall system performance. I'm talking about
> the effect of increased latency on latency-sensitive RT
> applications. This is a different world.

I'm very skeptical of loose use of the term RT. Are these tasks of yours
really RT? Hard? Soft? Statistical?

In any case, compare worst-case non-scheduler-induced latency with
scheduler-induced latency. For example, process N enters sys_fork just
as process R becomes runnable, and then there is a series of
uninteresting/non-critical interrupts. How long until need_resched is
checked? If the longest path between _calls_ to schedule() is
significantly longer than the longest measured time for schedule() to
complete, then you are really barking up the wrong tree.

Your previous comment leads me to believe that these "RT" processes
make use of semaphores. Now add the worst-case semaphore-induced delay
to the result obtained above, and I would be really shocked to find
that scheduler-induced latency is even visible.

>
> Also, a separate RT run queue would make the scheduler code simpler
> and more robust.

How? It's quite simple as it is.
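
For concreteness, the separate-queue idea as I understand it is
something like this -- plain C with made-up types, not actual kernel
code, and goodness() reduced to a stored number:

#include <stddef.h>

struct task {
        struct task *next_run;
        int goodness;           /* stand-in for the real goodness() */
};

static struct task *rt_queue;   /* FIFO of runnable RT tasks */
static struct task *run_queue;  /* ordinary runnable tasks */

static struct task *pick_next(void)
{
        struct task *p, *best = NULL;  /* NULL would mean: run idle */
        int c = -1000;

        if (rt_queue)           /* a runnable RT task always wins: O(1) */
                return rt_queue;

        for (p = run_queue; p; p = p->next_run)
                if (p->goodness > c) {
                        c = p->goodness;
                        best = p;
                }
        return best;
}

You save the scan only when an RT task happens to be runnable; the
ordinary case still walks the whole queue, which is why I want numbers
for the scan before anyone calls this a simplification.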
>
> > It follows that those who really want to contribute to Linux
> > performance might consult Alan's list of 2.2 showstoppers or find
> > some other actual problem. I have a couple of scheduler related
> > problems for RTLinux that I would love someone to solve. For
> > example, it should be easy to write a SMP scheduler that would evict
> > all non-rt tasks from a particular processor after at most N timer
> > interrupts where N is the number of Linux processes assigned to that
> > CPU. Or maybe someone could rewrite our RT-fifo module for better
> > performance. Or how about a solid big physarea? ..
>
> Go back to an early post of mine where I suggest a separate RT run
> queue as a possible 2.3 project. I didn't say "let's do this for 2.2".
>
> I would rather spend my efforts improving standard Linux than
> RT-Linux. No offence intended.

None taken. To appreciate RTLinux, you need to understand why it's
good not to put things into the standard kernel.

---------------------------------
Victor Yodaiken
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro NM 87801
Homepage http://www.cs.nmt.edu/~yodaiken
PowerPC Linux page http://linuxppc.cs.nmt.edu
Real-Time Page http://rtlinux.org
