ll_rw_blk.c fails to merge requests. Help!

From: Giuliano Pochini (pochini@denise.shiny.it)
Date: Wed Aug 23 2000 - 22:26:01 EST


I'm working to find this bug without success for at least 2
weeks. The problem is that under high load ll_rw_blk.c stops
to merge new requests to old ones. This causes major slowdown
of all disk operations. I put some printk to produce the
debug below which briefly show who (bu not whay) it cannot
merge blocks anymore. It happens when the queue if filled up,
but not immediately, a number of request are added/merged to
the queue as soon as data is written to the disk. At a
certain point it stop working....
Kernel is 2.4.0-test6, but I have the same problem in 2.2.16

MF means: failed to merge block xxx length yyy
REQ: got new request entry
NOREQ: failed to get new request entry and go to sleep
ADD: block xxx length yyy added (not merged) to the queue

Aug 19 22:17:19 Jay kernel: MF 129436 4
Aug 19 22:17:19 Jay kernel: REQ 129436
Aug 19 22:17:19 Jay kernel: ADD 129436 4
(it merges the next 255 blocks, not shown here and below)
Aug 19 22:17:19 Jay kernel: MF 129692 4
Aug 19 22:17:19 Jay kernel: REQ 129692
Aug 19 22:17:19 Jay kernel: ADD 129692 4
Aug 19 22:17:19 Jay kernel: MF 129948 4
Aug 19 22:17:19 Jay kernel: REQ 129948
Aug 19 22:17:19 Jay kernel: ADD 129948 4
Aug 19 22:17:19 Jay kernel: MF 130204 4
Aug 19 22:17:19 Jay kernel: REQ 130204
Aug 19 22:17:19 Jay kernel: ADD 130204 4
Aug 19 22:17:21 Jay kernel: MF 28 4
Aug 19 22:17:21 Jay kernel: REQ 28
Aug 19 22:17:21 Jay kernel: ADD 28 4
Aug 19 22:17:21 Jay kernel: MF 130460 4
Aug 19 22:17:21 Jay kernel: REQ 130460
Aug 19 22:17:21 Jay kernel: ADD 130460 4
Aug 19 22:17:21 Jay kernel: MF 130716 4
Aug 19 22:17:21 Jay kernel: REQ 130716
Aug 19 22:17:21 Jay kernel: ADD 130716 4

Problems begin here: block 130972/4 cannot be merged and the list if full.

Aug 19 22:17:21 Jay kernel: MF 130972 4
Aug 19 22:17:21 Jay kernel: NOREQ 130972 ---+
Aug 19 22:17:21 Jay kernel: MF 130976 4 |
Aug 19 22:17:21 Jay kernel: REQ 130976 |
Aug 19 22:17:21 Jay kernel: ADD 130976 4 V
Aug 19 22:17:21 Jay kernel: MF 524296 4
Aug 19 22:17:21 Jay kernel: REQ 524296
Aug 19 22:17:21 Jay kernel: ADD 524296 4
Aug 19 22:17:21 Jay kernel: MF 537900 4
Aug 19 22:17:21 Jay kernel: REQ 537900
Aug 19 22:17:21 Jay kernel: ADD 537900 4
Aug 19 22:17:21 Jay kernel: MF 538156 4
Aug 19 22:17:21 Jay kernel: NOREQ 538156
Aug 19 22:17:21 Jay kernel: MF 851976 4
Aug 19 22:17:21 Jay kernel: REQ 851976
Aug 19 22:17:21 Jay kernel: ADD 851976 4

It gets a free entry only now, but it's to late because it has already
decided to add it to the list, so the block will not be merged with
130976/4 --> now there are two contiguous non-merged requests in the queue.
And why request for block 130976 above gets a free entry immediately but
130972 has to wait so much ? Shouldn't __get_request_wait::WAITQUEUE
ensure newly freed entries are assigned at the right order ?

                                            ^
Aug 19 22:17:21 Jay kernel: REQ 130972 ----'
Aug 19 22:17:21 Jay kernel: ADD 130972 4

Now begins a weird loop: It always gets a free entry just 1 turn later,
so it become completely unable to merge requests:

Aug 19 22:17:21 Jay kernel: MF 538160 4
Aug 19 22:17:21 Jay kernel: NOREQ 538160

It goes to sleep, then it gets space for 538156

Aug 19 22:17:21 Jay kernel: REQ 538156
Aug 19 22:17:21 Jay kernel: ADD 538156 4

Ok, added, now it needs space for 538164

Aug 19 22:17:21 Jay kernel: MF 538164 4
Aug 19 22:17:21 Jay kernel: NOREQ 538164

List full, go to sleep.
It gets an entry for 538160 which was requested a while before
...and so on...

Aug 19 22:17:21 Jay kernel: REQ 538160
Aug 19 22:17:21 Jay kernel: ADD 538160 4
Aug 19 22:17:21 Jay kernel: MF 538168 4
Aug 19 22:17:21 Jay kernel: NOREQ 538168
Aug 19 22:17:21 Jay kernel: REQ 538164
Aug 19 22:17:21 Jay kernel: ADD 538164 4
Aug 19 22:17:21 Jay kernel: MF 538172 4
Aug 19 22:17:21 Jay kernel: NOREQ 538172
Aug 19 22:17:21 Jay kernel: REQ 538168
Aug 19 22:17:21 Jay kernel: ADD 538168 4
Aug 19 22:17:21 Jay kernel: MF 538176 4
Aug 19 22:17:21 Jay kernel: NOREQ 538176
Aug 19 22:17:21 Jay kernel: REQ 538172
Aug 19 22:17:21 Jay kernel: ADD 538172 4
Aug 19 22:17:21 Jay kernel: MF 538180 4
Aug 19 22:17:21 Jay kernel: NOREQ 538180
Aug 19 22:17:21 Jay kernel: REQ 538176
Aug 19 22:17:21 Jay kernel: ADD 538176 4
Aug 19 22:17:21 Jay kernel: MF 538184 4
Aug 19 22:17:21 Jay kernel: NOREQ 538184
Aug 19 22:17:21 Jay kernel: REQ 538180
Aug 19 22:17:21 Jay kernel: ADD 538180 4
Aug 19 22:17:21 Jay kernel: MF 538188 4
Aug 19 22:17:21 Jay kernel: NOREQ 538188
Aug 19 22:17:21 Jay kernel: REQ 538184
Aug 19 22:17:21 Jay kernel: ADD 538184 4
Aug 19 22:17:21 Jay kernel: MF 538192 4
Aug 19 22:17:21 Jay kernel: NOREQ 538192
Aug 19 22:17:21 Jay kernel: REQ 538188
Aug 19 22:17:21 Jay kernel: ADD 538188 4
Aug 19 22:17:21 Jay kernel: MF 538196 4
Aug 19 22:17:21 Jay kernel: NOREQ 538196
Aug 19 22:17:22 Jay kernel: REQ 538192
Aug 19 22:17:22 Jay kernel: ADD 538192 4
Aug 19 22:17:22 Jay kernel: MF 538200 4
Aug 19 22:17:22 Jay kernel: NOREQ 538200
Aug 19 22:17:22 Jay kernel: REQ 538196
Aug 19 22:17:22 Jay kernel: ADD 538196 4
...
...

As you can see all requests are added. Not a single one is merged
until I stop cat sending bytes to the file. And it takes a lot
to stop. Since the list is full, I'm unable to write anything to
the HD, so the computer is completely unusable. :((

Bye.

P.S. All requests are 4 blocks long because the hardware sector
size is 2KB.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Aug 23 2000 - 21:00:10 EST