Re: elevator-starvation-4 (2.2.14 && 2.3.42) [was Re: 2.3.42 elevator latency] (fwd)

From: Andrea Arcangeli (andrea@suse.de)
Date: Wed Feb 09 2000 - 19:45:34 EST


On Wed, 9 Feb 2000, Linus Torvalds wrote:

>I'd MUCH rather have something like:
> - each IO queue has a sequence number

I am just doing this today.

And using sequence number is been the first thing I tried to do when I
noticed the problem. On the reiserfs mailing list there is a long thread
with a detailed description of all my early ideas btw.

Unfortunately 2.2.x sucks and I can't put a sequence number per-request
queue (at first I put it into the blk_dev structure but then I noticed it
was not correct indeed since it has to be per queue).

As yestarday I told to Stephen it was in my TODO list and I am doing this
conversion now for 2.3.x where I can do that even without breaking the
lowlevel drivers! ;)

> - re-do the request queue to be a regular "struct list_head" thing, so
> that we can go both forwards and backwards.

I started killing the next pointer in my early research until I noticed I
was breaking lowlevel drivers (ide scsi at least) and so I started
pressing C^_ and I given up. I certainly can't do that in 2.2.x and I
wanted to do that for 2.3.x but I guessed it was too late but since you
even suggest me to break the drivers that's fine ;).

> - start adding requests from the BACK instead of the front like we do
> now. That's usually the right thing to do anyway, so it makes us use
> less CPU to find the right position. It also makes the next rule
> trivial to implement:

That's one of the reasons I wanted a double linked list. It would make the
current code more efficient anyway since in 2.2.x we are just scanning the
whole list to queue at the end of the list (insert in O(n) (where at least
n is 127) instead of O(1) thus it's not a news there should be a BACK
pointer, it's not because of my needs that should be fixed).

> - refuse to move a new request forward past a request that has a sequence
> number that is too much in the past. Here "too much" depends on what
> kinds of requests we're talking about.

Exactly like now. Just without scanning the whole list and without
changing the all the requests at each inserction.

>I don't like the "writebomb" logic - rather than have a separate writebomb
>thing, it should be much easier to make the "too much in the past" check
>do this particular logic. So the logic may be something like

The writebomb logic is that you can have a request large 128 segments (and
even more than 128 bh if the physical addresses of the memory of two
contigous req is contigous). Yes it's a bit slow HD since it writes one
req per sec:

        bomb under processing wait 128 seconds -> bomb -> bomb -> NULL

Now I want to run an urgent request ASAP:

        bomb under processing wait 128 seconds -> urgent req -> bomb -> bomb -> NULL

Now the urgent req will have to wait 128 seconds even while it has more
priority of the previosu 127 requests that are been merged in the bomb
under processing by the IDE layer while it should wait only 1 second.

My early patches was simply forbidding merging if a read was pending (this
harmed a bit the rewrite performance infact). The new code instead allow
each queue to enforce a maximal coalescing limit of write-segments if
there are read pendings.

With the early code the urgent request would wait really only 1 sec. With
the new code (-4 revision) it waits 4 seconds (way better than 128
seconds) and the rewrite performance should be better now.

Without the writebomb logic read latency is always bad.

> - writes may occur earlier than reads, but we will do that ONLY if

The writebomb logic has anything to do with ordering. It's only to avoid
having the lowlevel layer stalled for a too long time in a single not
breakable request under processing.

Writing the uncoalescing code looks a pain to me.

>So I think the sequence numbers should be able to handle =both= the
>latency issue and the write bomb issue. With some simple rules like the
>above, you KNOW that you'll never starve a readfrom writes, in fact you'll
>be guaranteed to do the read with no more than X (in above example 10)
>writes coming between it and execution.

That's how my code works just now indeed. The write bomb logic is another
thing.

>Comments? It doesn't seem to be too hard to do, and I'd hate to apply your
>current patch that does something similar but has other things I disagree
>with.

As far I can tell you disagree on the implementation that is the way best
I could do in 2.2.x. Actually I developed the code for 2.2.x since the
write hang is been reported to me as a bug in 2.2.x thus my primary object
was to fix the production kernel since the bug is a showstopper. Now that
I did the way best for 2.2.x I'll try to do the way best for 2.3.x
starting from the working point I reach in 2.2.x. Also I am completly
satisfyed by driving the I/O layer in the way I am just doing now, thus
I'll only change the complexity of the implementation now to allow it to
do the calculation faster. Fixing up all the drivers won't be a few hours
work (how it's instead replacing the dirtyfing with a sequence number) so
stay tuned and you'll get the -5 revision in a few days instead.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:16 EST