Re: a different approach to scheduling issues

Jeremy Fitzhardinge (jsgf@sirius.com)
Sat, 03 Oct 1998 11:34:57 -0700 (PDT)


On 02-Oct-98 Rik van Riel wrote:
> There's only one catch with this. Processes are put
> on the runqueue so often that the recalculation of
> running processes is almost negligable. Every letter
> I typed goes through X, xterm, then to pine and after
> that it goes back up again. This will have 3 processes
> put on and removed from the runqueue, 2 of 'em 2 times.

Sure, but put it this way: at present, the scheduler traverses the
entire run-queue every context switch to find the next best process.
Adding to the run-queue is a constant time operation.

What I'm proposing makes the context switch a constant time operation
of trivial complexity. On the other hand, adding to the run queue takes
some effort to keep it sorted. My argument is that this is more efficient
in the long run because:

- things enter and leave the run queue less often than context switches
(not a lot less often with mostly interactive loads, but less when
there's CPU bound activity)

- because something is going onto the run-queue, it has been sleeping for
a while. Schedulers tend to jack up the priority of sleeping processes
to improve interactive response. Since the new process on the run queue
has been sleeping and the others haven't (by definition), it will tend
to have a higher priority than the encumbants, meaning that it will go
near the head of the run-queue. The end result is that, in practice,
the performance will be very similar to the current constant-time scheme.

The only downside with this is that without making the run-queue locking
considerably more complex, the time to add a process to a run-queue
becomes less predictable (and hence interrupt latency), which is bad
for the RT folk. But you could use udelay() to make it always take the
same time... (I'm only mostly joking).

Note that none of this *requires* that the scheduler algorithm
proper be run. This is simply the mechanism that makes sure the
scheduler's allocation of CPU proportion is obeyed. If you have a simple
scheduler like the current Linux one then you could do it every context
switch/run-queue entry, but it is not a requirement (especially if you
have mostly CPU-bound processes).

>> The latter process can be done less frequently, and calculated based
>> on the behaviour of the process since the last calculation (and
>> other factors). By "less frequently", I mean somewhere in the order
>> of .1-10 seconds, depending on what the nature of the machine's load
>> is.
>
> Determining the load and basing a decision on that data
> will be about as expensive as the not-recalculating when
> the process wakes up again in the same jiffie. It's one
> comparison and a jump in the best case. In the worst case
> it is followed by 2 assignments, and a comparison.

I'm not talking about the current scheduler. I'm not proposing that
we change the way it calculates process priorities: it works well,
and does what it's supposed to do well. I'm talking about more complex
schedulers which try to solve more complex problems. What I'm talking
about is a way of extending Linux to incorporate them without breaking
the nice properties of the way it works now.

> Besides, your "calculating the nature of the machine's load"
> might take up so long that we'd increase the worst-case
> latency to unacceptable leves.

No, you can do that sort of number-crunching out of the critical path.
Of course you don't want to spend too much time in the scheduler when you
could be doing real work, but it can certainly be worth spending more
time in the scheduler if it gives you better system-wide performance -
after all you can always speed up the CPU if you need to compensate for
a complex scheduling algorithm, but a faster CPU won't help you if the
scheduler allows one runaway process to steal the whole machine.

> It's not just about total overhead, it's also about spreading
> out all stuff in order to avoid a bad worst-case.

Yep, we agree there.

J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/