Re: Auto-Adaptive scheduler and semaphore patch ( 2.2.14 ) ...

From: David Lang (dlang@diginsite.com)
Date: Tue Jan 25 2000 - 15:15:50 EST


Davide, your patch did a couple of things, one of which was to improve the
cache hit rate by re-ordering things. Would it be possible to separate
that portion of the patch out so it does not skew the results?

For those who are saying that this doesn't match any real-world
situations: if this really does give a benefit around 4-5 processes in the
runqueue, then it may be a valid selection for a 4- or 8-way SMP box. I
would not want to use it on a UP box or a 2-way SMP box, but if it becomes
a compile-time option for large servers it may be reasonable.

David Lang

 On Tue, 25 Jan 2000, Davide Libenzi wrote:

> Date: Tue, 25 Jan 2000 16:18:29 +0100
> From: Davide Libenzi <davidel@maticad.it>
> To: Ed Tomlinson <ejt@sympatico.ca>, Larry McVoy <lm@bitmover.com>
> Cc: linux-kernel@vger.rutgers.edu
> Subject: Re: Auto-Adaptive scheduler and semaphore patch ( 2.2.14 ) ...
>
> Tuesday, January 25, 2000 5:00 AM
> Larry McVoy <lm@bitmover.com> wrote :
> > :
> > : From my point of view this is a winner. What do other people's numbers
> > : look like?
>
> > Umm, looking at your vmstat numbers, which say you are doing 88,000
> > context switches/sec, or in other words, if the context switch itself
> > took 0 time, each process is running all of 11.4 usecs before switching.
> > Given that you are running loads of processes, it's not unreasonable to
> > think that the context switch itself is taking 100% of that 11.4 usecs.
>
> Larry, we're testing switching times; what else should the benchmark be
> doing, writing to the serial port?
>
> The _very_ interesting points in Ed's benchmarks are:
>
> 1) We get a 10% increase with only 4-5 tasks in the RQ.
> 2) The number of processes in the RQ decreases with the patch installed.
> This _means_ that the semaphore release patch gives waiting tasks a faster
> exit path ( it avoids flushing all the waiting threads into the RQ, so the
> RQ stays shorter; a rough sketch of the idea is below ).
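>
> A minimal sketch of the wake-one idea ( illustrative only: the helper
> names here are made up, they are not the 2.2 API nor the patch code ):
>
>     /* wake-all release: every sleeper is flushed into the RQ and all but
>        one of them go straight back to sleep after losing the race */
>     void sem_release_wake_all(struct semaphore *sem)
>     {
>             atomic_inc(&sem->count);
>             wake_up(&sem->wait);          /* all waiters become runnable */
>     }
>
>     /* wake-one release: only the head waiter is made runnable, so the RQ
>        grows by at most one task per release */
>     void sem_release_wake_one(struct semaphore *sem)
>     {
>             atomic_inc(&sem->count);
>             wake_up_one(&sem->wait);      /* hypothetical helper */
>     }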
>
>
> > How many times does this need to be said? This is an irrelevant
> > data point. I don't care if you see a 100% improvement in this case.
> > It's a stupid test case. Under no circumstances would I trade off even
> > a 1% improvement in this case for so much as one more cache miss in
> > the scheduler. It's just silly.
> > This game is like talking about how much smoother you can make your
> > engine run at 110,000 RPM, while merrily ignoring that (a) you never
> > run your engine at 110,000 RPM, and (b) the supposed improvement gives
> > you worse gas mileage at 4,000 RPM, where you spend all your time.
>
> Larry, you asked for numbers and these are numbers, aren't they?
> As I said in the first announcement of the new scheduler, it uses the
> linear version ( the current one ) for low workloads and switches to the
> cluster version under high workloads.
> The entry point and the delay at which the new scheduler switches on/off
> can be made tunable via procfs.
> In this way one can tune the scheduler behaviour to adapt it to the
> applications ( and the load ) running on the system.
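>
> Just to make the tunables concrete, something along these lines is what I
> have in mind ( illustrative only: the sysctl names and default values are
> made up, not what the patch actually exports ):
>
>     /* /proc/sys/kernel/sched_cluster_on  : RQ length above which the
>        cluster scheduler kicks in */
>     /* /proc/sys/kernel/sched_cluster_off : RQ length below which we go
>        back to the plain linear scan */
>     int sched_cluster_on  = 8;
>     int sched_cluster_off = 4;
>
>     static inline int want_cluster_sched(int nr_running, int cluster_active)
>     {
>             /* hysteresis: two thresholds, so we don't flip back and forth
>                around a single RQ length */
>             if (cluster_active)
>                     return nr_running > sched_cluster_off;
>             return nr_running > sched_cluster_on;
>     }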
>
> I also want to spend a word on the importance of cache misses in the
> scheduler when there are only 1-2 tasks in its RQ.
> We can state that the number of processes in the RQ is proportional to
> the number of processes running on the system, with a factor K that
> depends on the applications' behaviour.
> A system with 20 processes in the RQ will probably have 10 times the
> processes of a system with 2 processes in the RQ ( given a constant K for
> the two systems ).
> Those 10x processes, unless they are all doing "for (;;);", are probably
> requesting some form of interaction with the devices plugged into the
> system.
> The way a device notifies the system of an I/O completion is an IRQ,
> which fires an interrupt, which in turn ends up firing schedule().
> Think about how many interrupts a SCSI card ( which can handle more than
> one request at a time ), or a network card ( a system can have more than
> one ), can fire in a second when hit by a lot of requests.
> So 10 times more processes will generate ( even if not 10x ) many more
> interrupts ( read: schedule() calls ) than the lightly loaded system.
> So giving the scheduler the ability to resolve the next-process election
> quickly results in better system behaviour and lower interrupt latency.
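>
> To make the chain explicit, this is the kind of path I mean ( rough
> 2.2-style pseudo-code for an imaginary driver, not real code ):
>
>     struct mydev {
>             struct wait_queue *wait;      /* tasks sleeping on this device */
>             int done;
>     };
>
>     /* device interrupt handler: the I/O completion wakes the task that was
>        sleeping on it, which puts that task back into the RQ; schedule()
>        then runs shortly afterwards to elect the next process to run */
>     static void mydev_interrupt(int irq, void *dev_id, struct pt_regs *regs)
>     {
>             struct mydev *dev = dev_id;
>
>             dev->done = 1;                /* hypothetical completion flag */
>             wake_up(&dev->wait);          /* sleeper goes back into the RQ */
>     }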
>
> So definitely, one cache miss on a system with an RQ of 1 or 2 is _not_ a
> big deal.
> And definitely, a longer schedule() exit path on a heavily loaded system
> _is_ a big deal.
>
> If you've looked at the patch, you've seen how it performs and how it
> works; so if you want to talk about the code, the patch is there, and if
> you want to talk about the numbers, the numbers are here.
>
>
> I also have in mind another change that can, IMVHO, speed up the cluster
> scheduler even more.
> Right now the counter member of the task_struct is charged ( for normal
> priority ) with a value of 20.
> When a task has exhausted its counter, it waits until all the processes
> in the RQ have exhausted theirs, and then a recharge loop is performed.
> This value is kept low to avoid processes sitting at 0 goodness for too
> long in an inactive state.
> The result is a relatively high recharge-loop rate, with all RQ tasks
> restarting at the same level ( for equal priority ) and falling down
> along this same short path.
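>
> For reference, the recharge loop in the 2.2 schedule() is basically the
> following ( quoted from memory and simplified, so treat it as a sketch
> rather than the exact 2.2.14 code ):
>
>     /* every runnable task has counter == 0: recharge everybody at once */
>     if (!c) {
>             struct task_struct *p;
>
>             read_lock(&tasklist_lock);
>             for_each_task(p)
>                     p->counter = (p->counter >> 1) + p->priority;
>             read_unlock(&tasklist_lock);
>     }
>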
> This kind of behaviour limits the performance the cluster scheduler can
> achieve, because it concentrates all the tasks in a short interval of
> goodness ( similar tasks can fall down the same way and therefore end up
> sharing the same slot ).
> The cluster scheduler has cluster ( slot ) 0 populated by the exhausted
> processes that, in this implementation, are waiting for a recharge.
> My idea is to lengthen the counter charge value ( _not_ the time slice )
> and implement a single-process recharge ( in a FIFO way ) for tasks in
> cluster 0.
> This can lead to a more fine-grained distribution across the clusters as
> well as better ( or equal ) task responsiveness.
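>
> A rough sketch of what I mean ( the structures and helpers here are
> hypothetical, this is not the patch code ):
>
>     /* exhausted tasks sit in cluster ( slot ) 0 as a FIFO; instead of one
>        big recharge loop over every task, recharge them one at a time,
>        oldest first, and re-file each one into the slot for its goodness */
>     static void recharge_one_task(struct cluster *cluster0)
>     {
>             struct task_struct *p = cluster_fifo_pop(cluster0); /* hypothetical */
>
>             if (!p)
>                     return;
>             p->counter = (p->counter >> 1) + p->priority;  /* same formula */
>             cluster_insert(p);      /* back into the slot it now belongs to */
>     }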
>
> What do you think about it?
>
>
>
> With respect,
> Davide.
>
> --
> All this stuff is IMVHO
>
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


