Re: Errors in the VM - detailed

From: Jens Axboe (axboe@suse.de)
Date: Thu Jan 31 2002 - 16:37:54 EST


On Thu, Jan 31 2002, Andrew Morton wrote:
> rmap 11c:
> ...
> - elevator improvement (Andrew Morton)
>
> Which includes:
>
> -        queue_nr_requests = 64;
> -        if (total_ram > MB(32))
> -                queue_nr_requests = 128;
> +        queue_nr_requests = (total_ram >> 9) & ~15;     /* One per half-megabyte */
> +        if (queue_nr_requests < 32)
> +                queue_nr_requests = 32;
> +        if (queue_nr_requests > 1024)
> +                queue_nr_requests = 1024;
>
>
> So Roy is running with 1024 requests.

Ah yes, of course.
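
(Just to spell the arithmetic out: that hunk sizes the queue at one
request per half-megabyte of RAM, clamped to [32, 1024], so any box with
512MB or more sits at the cap. A quick sketch only -- nr_requests_for()
is a made-up name for illustration, and it assumes total_ram is in
kilobytes, the way blk_dev_init() computes it:)

/*
 * illustration only: how the rmap-11c sizing quoted above works out,
 * assuming total_ram is in kilobytes as blk_dev_init() computes it
 */
static int nr_requests_for(unsigned long total_ram)
{
        int nr = (total_ram >> 9) & ~15;        /* one per half-megabyte */

        if (nr < 32)
                nr = 32;
        if (nr > 1024)
                nr = 1024;

        return nr;
}

/* nr_requests_for(512 << 10) == 1024, so 512MB of RAM already hits the cap */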

> The question is (sorry, Roy): does this need fixing?
>
> The only thing which can trigger it is when we have
> zillions of threads doing reads (or zillions of outstanding
> aio read requests) or when there are a large number of
> unmerged write requests in the elevator. It's a rare
> case.

Indeed.

> If we _do_ need a fix, then perhaps we should just stop
> using READA in the readahead code? readahead is absolutely
> vital to throughput, and best-effort request allocation
> just isn't good enough.

Hmm, well. Maybe a small pool of requests set aside for READA would be
a better idea. That way "normal" reads cannot starve READA completely.
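
In rough C, the allocation path would then look something like this.
Just a sketch of the idea -- the helper name is made up, the real change
is the patch below:

/*
 * sketch only: readahead falls back to the reserved READA pool once the
 * shared READ list is empty; normal reads never touch the READA pool.
 */
static struct request *try_alloc_request(request_queue_t *q, int rw, int rw_ahead)
{
        struct request *rq = get_request(q, rw);

        if (!rq && rw_ahead)
                rq = get_request(q, READA);     /* reserved pool */

        return rq;      /* NULL for readahead just means: drop it */
}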

Something along the lines of the below, completely untested. Roy, could
you please test? It's against 2.4.18-pre7; I'll try booting it here now
as well...

--- /opt/kernel/linux-2.4.18-pre7/include/linux/blkdev.h Mon Nov 26 14:29:17 2001
+++ linux/include/linux/blkdev.h Thu Jan 31 22:29:01 2002
@@ -74,9 +74,9 @@
 struct request_queue
 {
         /*
-         * the queue request freelist, one for reads and one for writes
+         * the queue request freelist, one for READ, WRITE, and READA
          */
-        struct request_list rq[2];
+        struct request_list rq[3];
 
         /*
          * Together with queue_head for cacheline sharing
--- /opt/kernel/linux-2.4.18-pre7/drivers/block/ll_rw_blk.c Sun Jan 27 16:06:31 2002
+++ linux/drivers/block/ll_rw_blk.c Thu Jan 31 22:36:24 2002
@@ -333,8 +333,10 @@
 
         INIT_LIST_HEAD(&q->rq[READ].free);
         INIT_LIST_HEAD(&q->rq[WRITE].free);
+        INIT_LIST_HEAD(&q->rq[READA].free);
         q->rq[READ].count = 0;
         q->rq[WRITE].count = 0;
+        q->rq[READA].count = 0;
 
         /*
          * Divide requests in half between read and write
@@ -352,6 +354,20 @@
                 q->rq[i&1].count++;
         }
 
+        for (i = 0; i < queue_nr_requests / 4; i++) {
+                rq = kmem_cache_alloc(request_cachep, SLAB_KERNEL);
+                /*
+                 * hey well, this needs better checking (as well as the above)
+                 */
+                if (!rq)
+                        break;
+
+                memset(rq, 0, sizeof(struct request));
+                rq->rq_status = RQ_INACTIVE;
+                list_add(&rq->queue, &q->rq[READA].free);
+                q->rq[READA].count++;
+        }
+
         init_waitqueue_head(&q->wait_for_request);
         spin_lock_init(&q->queue_lock);
 }
@@ -752,12 +768,18 @@
                 req = freereq;
                 freereq = NULL;
         } else if ((req = get_request(q, rw)) == NULL) {
-                spin_unlock_irq(&io_request_lock);
+
                 if (rw_ahead)
-                        goto end_io;
+                        req = get_request(q, READA);
 
-                freereq = __get_request_wait(q, rw);
-                goto again;
+                spin_unlock_irq(&io_request_lock);
+
+                if (!req && rw_ahead)
+                        goto end_io;
+                else if (!req) {
+                        freereq = __get_request_wait(q, rw);
+                        goto again;
+                }
         }
 
 /* fill up the request-info, and add it to the queue */
@@ -1119,7 +1141,7 @@
          */
         queue_nr_requests = 64;
         if (total_ram > MB(32))
-                queue_nr_requests = 128;
+                queue_nr_requests = 256;
 
         /*
          * Batch frees according to queue length

-- 
Jens Axboe



