Re: Asynch I/O gets easily overloaded on 2.2.15 and 2.3.99

From: Andrea Arcangeli (andrea@suse.de)
Date: Tue Apr 11 2000 - 08:14:44 EST


On 11 Apr 2000, Andi Kleen wrote:

>The culprit is probably the elevator code in ll_rw_blk, which uses
>a singly linked list, and possibly the related queue-walking code in
>the SCSI driver. These queues are walked all the time to coalesce read/write

That shouldn't be the case. Also note that the new elevator merges requests
in O(1) if they are requesting contiguous sectors. The 2.2.x algorithm is
instead O(N) even when merging requests for contiguous I/O.

If they are seeking all over the place then yes, we have O(N) complexity
there, but the queue length is limited. It's a hard limit, and you hit that
limit _each_ time you write a few MB of data to disk. So if you see no
hangs while kupdate runs, then the elevator isn't going to be the culprit.
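
To make the two cases concrete, here is a toy model in Python (not kernel
code). It assumes the O(1) case comes from looking at the most recently
queued request first, and it leaves out the sorting of requests by sector.
Request, MAX_QUEUE (with a made-up value) and add_request are hypothetical
names used only for this sketch.

```python
from dataclasses import dataclass

MAX_QUEUE = 128  # assumed hard limit on queued requests (illustrative value)


@dataclass
class Request:
    start: int  # first sector of the request
    count: int  # number of sectors

    @property
    def end(self):
        return self.start + self.count


def add_request(queue, start, count):
    """Queue one I/O and return (outcome, entries_inspected).

    outcome is "merged", "queued" or "full" (the caller would have to wait).
    """
    steps = 0
    # Walk from the tail backwards. Contiguous I/O extends the last
    # request, so it merges on the very first comparison: O(1).
    for req in reversed(queue):
        steps += 1
        if req.end == start:
            req.count += count
            return "merged", steps
    # Seeky I/O falls through after inspecting the whole queue: O(N),
    # but N can never grow beyond MAX_QUEUE.
    if len(queue) >= MAX_QUEUE:
        return "full", steps
    queue.append(Request(start, count))
    return "queued", steps
```

Sequential sectors cost one comparison per call here, while scattered
sectors cost at most MAX_QUEUE comparisons, never more.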

>requests. You can check by booting with profile=2
>and then using /usr/sbin/readprofile to get profile logs. The bad
>routine should clearly show up.

If the elevator were the culprit you would have no way to catch it with
the profiler, since the profiler samples from the timer irq and the elevator
always runs with irqs disabled because of the io_request_lock, which has to
be acquired in the I/O completion irqs.

Anyway, I'm fairly confident that the profiler will show the real culprit
(I guess Jeff is queueing an insane number of buffers into the buffer hash
table, and that is causing complexity trouble due to too many collisions).
If that's the case you'll see a huge number for the get_hash_table entry
in the profile.
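
As a rough illustration of why collisions would show up as lookup time,
here is a toy chained hash table in Python. It is only a sketch: the table
size, the hash function and all the names (NR_BUCKETS, insert, lookup,
average_miss_cost) are assumptions made for the example, not what the
kernel's buffer hash actually uses.

```python
NR_BUCKETS = 64  # assumed fixed table size (illustrative value)


def make_table():
    return [[] for _ in range(NR_BUCKETS)]


def bucket_of(dev, block):
    # Hypothetical hash function, chosen only to keep the example simple.
    return (dev ^ block) % NR_BUCKETS


def insert(table, dev, block):
    table[bucket_of(dev, block)].append((dev, block))


def lookup(table, dev, block):
    """Return (found, entries_compared) for one lookup."""
    compared = 0
    for key in table[bucket_of(dev, block)]:
        compared += 1
        if key == (dev, block):
            return True, compared
    return False, compared


def average_miss_cost(nr_buffers):
    """Average chain walk when looking up blocks that are not hashed."""
    table = make_table()
    for block in range(nr_buffers):
        insert(table, 0, block)
    probes = range(nr_buffers, nr_buffers + NR_BUCKETS)
    return sum(lookup(table, 0, b)[1] for b in probes) / NR_BUCKETS
```

With a fixed number of buckets, the cost of each lookup grows linearly
with the number of buffers hashed, so a hundred times more buffers means
chains a hundred times longer to walk on every miss.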

Also, last time I checked, the buffer hash had been shrunk because in 2.3.x
the buffer cache isn't used for data write I/O, but the raw devices can
still be used to read/write without a filesystem...

Andrea



