Re: [RFC] Block IO Controller V2 - some results

From: Vivek Goyal
Date: Wed Nov 18 2009 - 10:34:21 EST


On Tue, Nov 17, 2009 at 07:38:47AM -0500, Alan D. Brunelle wrote:
> On Mon, 2009-11-16 at 17:18 -0500, Vivek Goyal wrote:
> > On Mon, Nov 16, 2009 at 03:51:00PM -0500, Alan D. Brunelle wrote:
> >
> > [..]
> > > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> > >
> > > The next thing to look at is to see what the "penalty" is for the
> > > additional code: see how much bandwidth we lose for the capability
> > > added. Here we see the sum of the system's throughput for the various
> > > tests:
> > >
> > > ---- ---- - ----------- ----------- ----------- -----------
> > > Mode RdWr N base ioc off ioc no idle ioc idle
> > > ---- ---- - ----------- ----------- ----------- -----------
> > > rnd rd 2 17.3 17.1 9.4 9.1
> > > rnd rd 4 27.1 27.1 8.1 8.2
> > > rnd rd 8 37.1 37.1 6.8 7.1
> > >
> >

Hi Alan,

I got hold of a better system where few disks are striped. On this system
I can also see the performance drop and it does come from the fact that
we idle on empty sync-noidle service tree. So with multiple groups we
increase the instances of sync-noidle trees hence more idling and hence
reduced throughput.

With this patch, I had done a little optimization where I don't idle on
a sync-noidle service tree if there are no competing sync-idle or async
queues.

What that means is that if a group is doing only random IO and it does not
have sufficient IO to keep disk busy, then we will move onto next group
and start dispatching from that. So in your tests, with this patch you
should see better results for random reads with group_idle=0. With
group_idle=1, results will still remain same as we continue to idle on
sync-noidle service tree.

It is working for me. Can you please try it out. You can just apply this
patch on top of V3. (You don't have to apply hw_tag patch from corrodo).

Thanks
Vivek


o Now we don't remove the queue from service tree until we expire it, even if
queue is empty. So st->count = 0 is not a valid state while we have a cfqq
at hand. Fix it in arm_slice_timer().

o wait_busy gets set only if group is emtpy. If st->count > 1, then group is
not empty and wait_busy will not be set. Remove that extra check.

o There is no need to idle on aysnc service tree as it is backlogged most of
the time if writes are on. Those queues don't get deleted hence don't wait
on async service tree. Similiarly don't wait on sync-idle service tree as
we do idling on individual queues if need be. Fixed cfq_should_idle().

o Currently we wait on sync-noidle service tree so that sync-noidle type of
workload does not get swamped by sync-idle or async type of workload. Don't
do this idling if there are no sync-idle or async type of queues in the group
and there are other groups to dispatch the requests from and user has decided
not to wait on slow groups to achieve better throughput. (group_idle=0).

This will make sure if some group is doing just random IO and does not
have sufficient IO to keep the disk busy, we will move onto other groups to
dispatch the requests from and utilize the storage better.

Signed-off-by: Vivek Goyal <vgoyal@xxxxxxxxxx>
---
block/cfq-iosched.c | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)

Index: linux6/block/cfq-iosched.c
===================================================================
--- linux6.orig/block/cfq-iosched.c 2009-11-17 14:44:09.000000000 -0500
+++ linux6/block/cfq-iosched.c 2009-11-18 10:09:33.000000000 -0500
@@ -899,6 +899,10 @@ static bool cfq_should_idle(struct cfq_d
{
enum wl_prio_t prio = cfqq_prio(cfqq);
struct cfq_rb_root *service_tree = cfqq->service_tree;
+ struct cfq_group *cfqg = cfqq->cfqg;
+
+ BUG_ON(!service_tree);
+ BUG_ON(!service_tree->count);

/* We never do for idle class queues. */
if (prio == IDLE_WORKLOAD)
@@ -908,18 +912,18 @@ static bool cfq_should_idle(struct cfq_d
if (cfq_cfqq_idle_window(cfqq))
return true;

+ /* Don't idle on async and sync-idle service trees */
+ if (cfqd->serving_type != SYNC_NOIDLE_WORKLOAD)
+ return false;
/*
- * Otherwise, we do only if they are the last ones
- * in their service tree.
+ * If there are other competing groups present, don't wait on service
+ * tree if this is last queue in the group and there are no other
+ * competing queues (sync-idle or async) queues present
*/
- if (!service_tree)
- service_tree = service_tree_for(cfqq->cfqg, prio,
- cfqq_type(cfqq), cfqd);
-
- if (service_tree->count == 0)
- return true;
-
- return (service_tree->count == 1 && cfq_rb_first(service_tree) == cfqq);
+ if (cfqd->nr_groups > 1 && !cfqd->cfq_group_idle)
+ return (service_tree->count == 1 && cfqg->nr_cfqq > 1);
+ else
+ return service_tree->count == 1;
}

#ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -1102,9 +1106,6 @@ static inline bool cfqq_should_wait_busy
if (!RB_EMPTY_ROOT(&cfqq->sort_list) || cfqq->cfqg->nr_cfqq > 1)
return false;

- if (!cfq_should_idle(cfqq->cfqd, cfqq))
- return false;
-
return true;
}

@@ -1801,7 +1802,10 @@ static bool cfq_arm_slice_timer(struct c
/*
* idle is disabled, either manually or by past process history
*/
- if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq))
+ if (!cfqd->cfq_slice_idle)
+ return false;
+
+ if (!cfq_should_idle(cfqd, cfqq) && !wait_busy)
return false;

/*
@@ -1837,8 +1841,7 @@ static bool cfq_arm_slice_timer(struct c
*/
st = service_tree_for(cfqq->cfqg, cfqd->serving_prio,
SYNC_NOIDLE_WORKLOAD, cfqd);
- if (!wait_busy && cfqd->serving_type == SYNC_NOIDLE_WORKLOAD
- && st->count > 0) {
+ if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD && st->count > 1) {
if (blk_queue_nonrot(cfqd->queue) || cfqd->hw_tag)
return false;
sl = min(sl, msecs_to_jiffies(CFQ_MIN_TT));
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/