Re: An elevator algorithm (patch)

From: Peter Osterlund (peter.osterlund@mailbox.swipnet.se)
Date: Sun Sep 17 2000 - 13:57:00 EST


Andrea Arcangeli <andrea@suse.de> writes:

> The patch is buggy for non headactive devices like SCSI and also for
> IDE while the queue is plugged.

Thanks for looking at the patch. Yes, sorry, I misunderstood the linux
linked list implementation. This is easy to fix though. See new patch
at the end.

> While the queue is plugged or with things like SCSI your logic
> change won't work because in such case if your request is lower the
> lowest in the queue, you can put it at the head of the queue and you
> have no way to know where your "tmp1" was placed so you can't make
> any assumption (that's why the current code makes sense).

I still don't think the current code makes sense. Consider what will
happen if you insert request 100, 200, 101, 201, 102, 102, ... in an
empty plugged queue. (Assume max allowed latency is 2)

queue: 100 101 102 200 103 201 104 202 105 203 204 205 ...
latency: 2 2 2 0 2 0 2 0 2 0 1 2 ...

So you end up with one seek between each request. The same thing
happens no matter how large the latency is. It just takes longer
before the behaviour starts. With the new logic you will instead get:

queue: 100 101 102 200 201 202 203 204 103 104 105 205 ...
latency: 2 2 2 0 1 2 2 2 0 1 2 2 ...

So in this case there will be 5 (2*latency+1) requests between each
seek.

The new patch will obey the latency requirement but still keep disk
seeks to a minimum. This makes it possible to use much smaller latency
values without losing streaming performance. (I used "elvtune -r 10000
-w 200000" and didn't notice any slowdown in iozone and Bonnie. It is
still faster than plain 2.4.0-test8)

The new patch is also not unfair to requests near the end of the disk.
The current kernel code can starve those requests a very long time if
the request queue never becomes empty.

> But I had an idea to generalize the algorithm so that we could
> optimize SCSI as well and also IDE in the plugged case: nobody
> forbid us to remeber the last position of the drive in the
> request_queue_t to still be able to know your "tmp1" that actually
> we don't know about.

Yes, that would be even better, but IMHO the new patch is already
better than what's currently in the kernel. (I don't think the
elevator code before your latency fix handled this correctly either.)

The only disadvantage I can see is that the new patch doesn't handle
consecutive insertions in O(1) time, but then again, the pre-latency
elevator code didn't do that either. Is this really important? How
long can the request queue be? Apparently we gain more by avoiding
disk seeks than we lose by wasting some CPU cycles during request
insertion.

Here is the new patch:

--- linux-2.4.0-test8/drivers/block/elevator.c.orig Sun Sep 17 00:05:03 2000
+++ linux-2.4.0-test8/drivers/block/elevator.c Sun Sep 17 20:27:48 2000
@@ -34,20 +34,55 @@
                     struct list_head *real_head,
                     struct list_head *head, int orig_latency)
 {
- struct list_head *entry = real_head;
- struct request *tmp;
-
- req->elevator_sequence = orig_latency;
-
- while ((entry = entry->prev) != head) {
- tmp = blkdev_entry_to_request(entry);
- if (IN_ORDER(tmp, req))
- break;
- if (!tmp->elevator_sequence)
- break;
- tmp->elevator_sequence--;
- }
- list_add(&req->queue, entry);
+ struct list_head *last = real_head->prev;
+ struct list_head *insert_after = last;
+ struct list_head *entry;
+ struct request *tmp1, *tmp2;
+ int do_insert;
+
+ req->elevator_sequence = orig_latency;
+
+ if (last == head)
+ goto out;
+
+ entry = last;
+ tmp1 = blkdev_entry_to_request(entry);
+ do {
+ tmp2 = tmp1;
+ entry = entry->prev;
+ tmp1 = blkdev_entry_to_request(entry);
+ if (!tmp2->elevator_sequence)
+ break;
+ do_insert = 0;
+ if (entry == real_head) {
+ /*
+ * Since we don't know were the disk head is
+ * currently located, we can not really know
+ * if the request should be inserted here. The
+ * best bet is probably not to insert the
+ * request here, because otherwise the
+ * elevator would be unfair to sectors at the
+ * end of the disk.
+ */
+ } else if (IN_ORDER(tmp1, tmp2)) {
+ if (IN_ORDER(tmp1, req) && IN_ORDER(req, tmp2))
+ do_insert = 1;
+ } else {
+ if (IN_ORDER(tmp1, req) || IN_ORDER(req, tmp2))
+ do_insert = 1;
+ }
+ if (do_insert) {
+ insert_after = entry;
+ do {
+ entry = entry->next;
+ tmp1 = blkdev_entry_to_request(entry);
+ tmp1->elevator_sequence--;
+ } while (entry != last);
+ break;
+ }
+ } while (entry != head);
+out:
+ list_add(&req->queue, insert_after);
 }
 
 int elevator_linus_merge(request_queue_t *q, struct request **req,

-- 
Peter Österlund          Email:     peter.osterlund@mailbox.swipnet.se
Sköndalsvägen 35                    f90-pos@nada.kth.se
S-128 66 Sköndal         Home page: http://home1.swipnet.se/~w-15919
Sweden                   Phone:     +46 8 942647

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 23 2000 - 21:00:15 EST