Re: ll_rw_blk.c fails to merge requests. Help!

From: Neil Brown (neilb@cse.unsw.edu.au)
Date: Wed Aug 23 2000 - 18:58:25 EST


On Wednesday August 23, pochini@denise.shiny.it wrote:
>
> I've been working on this bug without success for at least two
> weeks. The problem is that under high load ll_rw_blk.c stops
> merging new requests into old ones. This causes a major slowdown
> of all disk operations. I added some printk calls to produce the
> debug output below, which briefly shows where (but not why) it
> cannot merge blocks anymore. It happens when the queue is filled
> up, but not immediately: a number of requests are added/merged to
> the queue as data is written to the disk. At a certain point it
> stops working....
> The kernel is 2.4.0-test6, but I have the same problem in 2.2.16
>
>
> MF means: failed to merge block xxx, length yyy
> REQ means: got a new request entry
> NOREQ means: failed to get a new request entry; going to sleep
> ADD means: block xxx, length yyy added (not merged) to the queue
>
..deleted...
>
> Problems begin here: block 130972/4 cannot be merged and the list is full.
>
> Aug 19 22:17:21 Jay kernel: MF 130972 4
> Aug 19 22:17:21 Jay kernel: NOREQ 130972 ---+
> Aug 19 22:17:21 Jay kernel: MF 130976 4 |
> Aug 19 22:17:21 Jay kernel: REQ 130976 |
> Aug 19 22:17:21 Jay kernel: ADD 130976 4 V
> Aug 19 22:17:21 Jay kernel: MF 524296 4
> Aug 19 22:17:21 Jay kernel: REQ 524296
> Aug 19 22:17:21 Jay kernel: ADD 524296 4
> Aug 19 22:17:21 Jay kernel: MF 537900 4
> Aug 19 22:17:21 Jay kernel: REQ 537900
> Aug 19 22:17:21 Jay kernel: ADD 537900 4
> Aug 19 22:17:21 Jay kernel: MF 538156 4
> Aug 19 22:17:21 Jay kernel: NOREQ 538156
> Aug 19 22:17:21 Jay kernel: MF 851976 4
> Aug 19 22:17:21 Jay kernel: REQ 851976
> Aug 19 22:17:21 Jay kernel: ADD 851976 4
>
> It gets a free entry only now, but it's too late because it has already
> decided to add the block to the list, so it will not be merged with
> 130976/4 --> now there are two contiguous non-merged requests in the queue.
> And why does the request for block 130976 above get a free entry
> immediately, while 130972 has to wait so long? Shouldn't the wait queue
> used by __get_request_wait ensure newly freed entries are handed out in
> the right order?

Not necessarily. When a request is freed, the oldest waiting thread
is woken, but it might not actually get to run before some other
thread steals the request. You could force a strict ordering if you
really wanted to, but I don't know how much it would help. See
STRICT_REQUEST_ORDERING in the patch below.
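
To see the race outside the kernel, here is a small userspace
analogue in pthreads (illustration only; every name in it is
invented, and the condition variable stands in for the
wait_for_request queue). The main thread frees a slot and wakes the
oldest sleeper, but takes the slot itself before the sleeper ever
gets to run:

/* race.c -- compile with: gcc -o race race.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wait_for_slot = PTHREAD_COND_INITIALIZER;
static int free_slots;                  /* 0: the "request list" is full */

static int get_slot(void)               /* like get_request(): may fail */
{
        if (free_slots > 0) {
                free_slots--;
                return 1;
        }
        return 0;
}

static void *sleeper(void *unused)      /* like __get_request_wait() */
{
        pthread_mutex_lock(&lock);
        while (!get_slot())
                pthread_cond_wait(&wait_for_slot, &lock);
        pthread_mutex_unlock(&lock);
        printf("sleeper got a slot at last\n");
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, sleeper, NULL);
        sleep(1);                       /* let the sleeper block */

        pthread_mutex_lock(&lock);
        free_slots++;                   /* a "request" completes ... */
        pthread_cond_signal(&wait_for_slot);    /* ... oldest sleeper is woken */
        /* but we still hold the lock, so the sleeper cannot run yet,
           and whoever asks right now steals the freed slot: */
        if (get_slot())
                printf("main stole the slot before the sleeper ran\n");
        pthread_mutex_unlock(&lock);

        sleep(1);                       /* sleeper wakes, finds nothing, sleeps again */
        pthread_mutex_lock(&lock);
        free_slots++;                   /* free another so it can finish */
        pthread_cond_signal(&wait_for_slot);
        pthread_mutex_unlock(&lock);
        pthread_join(t, NULL);
        return 0;
}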

>
> ^
> Aug 19 22:17:21 Jay kernel: REQ 130972 ----'
> Aug 19 22:17:21 Jay kernel: ADD 130972 4
>
> Now begins a weird loop: it always gets a free entry just one turn
> too late, so it becomes completely unable to merge requests:
>
> Aug 19 22:17:21 Jay kernel: MF 538160 4
> Aug 19 22:17:21 Jay kernel: NOREQ 538160
>
> It goes to sleep, then it gets space for 538156
>
> Aug 19 22:17:21 Jay kernel: REQ 538156
> Aug 19 22:17:21 Jay kernel: ADD 538156 4
>
> Ok, added, now it needs space for 538164
>
> Aug 19 22:17:21 Jay kernel: MF 538164 4
> Aug 19 22:17:21 Jay kernel: NOREQ 538164
>
> The list is full, so it goes to sleep.
> It gets an entry for 538160, which was requested a while earlier
> ...and so on...

I noticed the possibility for this in the code a few days ago but
thought "that could never happen". Maybe I was wrong!
If __make_request sleeps to get a request, then it should really
recheck whether merging is possible; otherwise you can get two
adjacent requests on the queue which will never be merged. You can
also obviously get a lot more than two.

Please try the following (totally untested, I'm afraid) patch and see
if it makes a difference.
It basically retries the merge after waiting for a request, and then
either releases the request if the merge was successful, or uses that
request to add the bh to the queue.
It also contains some (fairly ugly) code inside #ifdef
STRICT_REQUEST_ORDERING which should encourage a strict ordering for
threads to get the request structures they are waiting for.
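
In outline, the non-strict part of the new flow looks like this
(heavily simplified; try_merge() is just a stand-in for the elevator
merge checks, not a real function):

again:
        spin_lock_irq(&io_request_lock);
        if (try_merge(q, bh)) {         /* stand-in for the elevator checks */
                goto out;               /* merged; spare request freed at out: */
        }
        if (freereq) {                  /* we slept earlier; use that request */
                req = freereq;
                freereq = NULL;
        } else if ((req = get_request(q, rw)) == NULL) {
                spin_unlock_irq(&io_request_lock);
                freereq = __get_request_wait(q, rw);    /* sleep ... */
                goto again;             /* ... then re-check the merge */
        }
        add_request(q, req, head, latency);
out:
        if (freereq)
                blkdev_release_request(freereq);
        spin_unlock_irq(&io_request_lock);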

NeilBrown

--- ll_rw_blk.c	2000/08/22 01:51:03	1.1
+++ ll_rw_blk.c	2000/08/23 23:50:11
@@ -682,9 +682,9 @@
 {
         unsigned int sector, count;
         int max_segments = MAX_SEGMENTS;
-        struct request * req = NULL;
+        struct request * req = NULL, * freereq = NULL;
         int rw_ahead, max_sectors, el_ret;
-        struct list_head *head = &q->queue_head;
+        struct list_head *head;
         int latency;
         elevator_t *elevator = &q->elevator;
 
@@ -740,6 +740,7 @@
 
         latency = elevator_request_latency(elevator, rw);
 
+again:
         /*
          * Now we acquire the request spinlock, we have to be mega careful
          * not to schedule or do something nonatomic
@@ -749,6 +750,7 @@
         /*
          * skip first entry, for devices with active queue head
          */
+        head = &q->queue_head;
         if (q->head_active && !q->plugged)
                 head = head->next;
 
@@ -802,20 +804,46 @@
          * are not crucial.
          */
 get_rq:
-        if ((req = get_request(q, rw)) == NULL) {
+        if (freereq) {
+                req = freereq;          /* use the request we waited for */
+                freereq = NULL;
+        } else if ((req = get_request(q, rw)) == NULL) {
                 spin_unlock_irq(&io_request_lock);
                 if (rw_ahead)
                         goto end_io;
 
-                req = __get_request_wait(q, rw);
-                spin_lock_irq(&io_request_lock);
-
-                if (q->head_active) {
-                        head = &q->queue_head;
-                        if (!q->plugged)
-                                head = head->next;
+                freereq = __get_request_wait(q, rw);
+                goto again;
+        }
+#ifdef STRICT_REQUEST_ORDERING
+        else {
+                /* we got a request without waiting, but maybe it wasn't our turn */
+                if (waitqueue_active(&q->wait_for_request)) {
+                        register struct request *rq;
+                        DECLARE_WAITQUEUE(wait, current);
+
+                        add_wait_queue_exclusive(&q->wait_for_request, &wait);
+                        __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE);
+                        blkdev_release_request(req);    /* give it back, wait our turn */
+                        spin_unlock_irq(&io_request_lock);
+                        schedule();
+                        for (;;) {
+                                __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE);
+                                spin_lock_irq(&io_request_lock);
+                                rq = get_request(q, rw);
+                                spin_unlock_irq(&io_request_lock);
+                                if (rq)
+                                        break;
+                                generic_unplug_device(q);
+                                schedule();
+                        }
+                        remove_wait_queue(&q->wait_for_request, &wait);
+                        current->state = TASK_RUNNING;
+                        freereq = rq;
+                        goto again;
                 }
         }
+#endif
 
 /* fill up the request-info, and add it to the queue */
         req->cmd = rw;
@@ -833,6 +861,8 @@
         req->e = elevator;
         add_request(q, req, head, latency);
 out:
+        if (freereq)
+                blkdev_release_request(freereq);
         if (!q->plugged)
                 (q->request_fn)(q);
         spin_unlock_irq(&io_request_lock);
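
(Nothing defines STRICT_REQUEST_ORDERING, so the strict-ordering code
is compiled out by default. If you want to test that variant as well,
defining the symbol near the top of ll_rw_blk.c should be enough:

#define STRICT_REQUEST_ORDERING

The name is only used by this patch.)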


