Re: elevator-starvation-4 (2.2.14 && 2.3.42)

From: Bruce Thompson (bruce@otherother.com)
Date: Thu Feb 10 2000 - 09:27:50 EST


Hi,
     Maybe I'm missing something massive, but I'd always been under
the impression that an elevator algorithm made starvation impossible?
I guess it's time for me to dig into the sources, but here's what I'd
always thought of as a basic elevator algorithm:
     - Track current request location (for simplicity, let's say the
current cylinder)
     - Keep two sorted request queues: current and next
     - When a request comes in, if it's cylinder is strictly greater
than the current cylinder then insert it into the current request
queue, otherwise insert it into the next request queue.
     - When the current request has been satisfied, take the next
request off the current queue and set the current location to the
request's location. Handle the request.
     - When the current request queue is empty, swap the current queue
for the next queue.
     - Lather, Rinse, Repeat.

This cannot lead to starvation since the "next" queue will eventually
become the current queue. It's not optimised either, but the most
obvious optimisation is to make it by-directional so that when the
current request location reaches one extreme it reverses direction
and pulls requests from the current queue in reverse order. The
bi-directional elevator optomises for requests in the middle of the
platter, but it still cannot starve requests near the edges.

That's the basic algorithm I learned in an OS course from my
undergrad degree nearly 15 years ago, have I missed something major?

        Cheers,
        Bruce.

Original Message:

From: Linus Torvalds <torvalds@transmeta.com>
Date: Wed, 9 Feb 2000 12:19:06 -0800 (PST)
Subject: Re: elevator-starvation-4 (2.2.14 && 2.3.42) [was Re: 2.3.42
elevator latency] (fwd)

On Wed, 9 Feb 2000, Andrea Arcangeli wrote:
>
> Linus I suggest you to give a try to the below patch. Do you remeber that
> linux always stalled during heavy write I/O? If not try a `cp /dev/zero .`
> and see that all accesses to the fs where you are writing to stalls
> sometime until you kill `cp` on the other terminal.

I'd MUCH rather have something like:
  - each IO queue has a sequence number
  - each incoming request increments the sequence number, and sets req->seq
    to be the new sequence number allocated.
  - re-do the request queue to be a regular "struct list_head" thing, so
    that we can go both forwards and backwards.
  - start adding requests from the BACK instead of the front like we do
    now. That's usually the right thing to do anyway, so it makes us use
    less CPU to find the right position. It also makes the next rule
    trivial to implement:
  - refuse to move a new request forward past a request that has a sequence
    number that is too much in the past. Here "too much" depends on what
    kinds of requests we're talking about.

I don't like the "writebomb" logic - rather than have a separate writebomb
thing, it should be much easier to make the "too much in the past" check
do this particular logic. So the logic may be something like

  - writes may occur earlier than reads, but we will do that ONLY if
        - the read is really recent (ie the distance between the "current
           sequence number" and the "read request sequence number" is
          short)
        - the write is closer to the proper elevator sequence than the
          read was.

Reads work the same way, except the "distance" requirement can be much
less strict - let's say that writes can pass reads only if the read is
within the last 10 requests handled, while reads can pass other reads as
long as there have been less than 100 other reads in between (made-up
numbers, you get the idea).

Passing old writes is even easier, so there the distance could be
something like "it's ok to pass an old write as long as the old writes
sequence number is within 1000 of the current one". This is also where we
could easily have "generation of write" logic for sorting between two
writes - to force a partial ordering on the queue level.

So I think the sequence numbers should be able to handle =both= the
latency issue and the write bomb issue. With some simple rules like the
above, you KNOW that you'll never starve a readfrom writes, in fact you'll
be guaranteed to do the read with no more than X (in above example 10)
writes coming between it and execution.

Comments? It doesn't seem to be too hard to do, and I'd hate to apply your
current patch that does something similar but has other things I disagree
with.

                Linus

Please read the FAQ at http://www.tux.org/lkml/

-- 
--
------------------------------------------------------------------------
Bruce Thompson                  | "Pinky! Are you pondering what I'm
Palm Computing, Inc.            |  pondering?"
                                 | "Uh, I think so Brain, but where will
                                 |  we find a duck and a hose at this
bruce@otherother.com            |  hour?"
                    My opinions are strictly my own.

PGP Fingerprint: 8F48 7FEF EE22 14FB 1F2E 3BF2 0D40 9628 53E8 72EB

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:17 EST