Re: IO scheduler based IO controller V10

From: Mike Galbraith
Date: Fri Oct 02 2009 - 14:13:50 EST


On Fri, 2009-10-02 at 19:37 +0200, Jens Axboe wrote:
> On Fri, Oct 02 2009, Ingo Molnar wrote:
> >
> > * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> >
> > > On Fri, Oct 02 2009, Ingo Molnar wrote:
> > > >
> > > > * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> > > >
> > > > > It's not _that_ easy, it depends a lot on the access patterns. A
> > > > > good example of that is actually the idling that we already do.
> > > > > Say you have two applications, each starting up. If you start them
> > > > > both at the same time and just care for the dumb low latency, then
> > > > > you'll do one IO from each of them in turn. Latency will be good,
> > > > > > but throughput will be awful. And this means that in 20s they are
> > > > > both started, while with the slice idling and priority disk access
> > > > > that CFQ does, you'd hopefully have both up and running in 2s.
> > > > >
> > > > > So latency is good, definitely, but sometimes you have to worry
> > > > > about the bigger picture too. Latency is more than single IOs,
> > > > > > it's often for a complete operation which may involve lots of IOs.
> > > > > Single IO latency is a benchmark thing, it's not a real life
> > > > > issue. And that's where it becomes complex and not so black and
> > > > > white. Mike's test is a really good example of that.
> > > >
> > > > To the extent of you arguing that Mike's test is artificial (i'm not
> > > > sure you are arguing that) - Mike certainly did not do an artificial
> > > > test - he tested 'konsole' cache-cold startup latency, such as:
> > >
> > > [snip]
> > >
> > > I was saying the exact opposite, that Mike's test is a good example of
> > > a valid test. It's not measuring single IO latencies, it's doing a
> > > sequence of valid events and looking at the latency for those. It's
> > > benchmarking the bigger picture, not a microbenchmark.
> >
> > Good, so we are in violent agreement :-)
>
> Yes, perhaps that last sentence didn't provide enough evidence of which
> category I put Mike's test into :-)
>
> So to kick things off, I added an 'interactive' knob to CFQ and
> defaulted it to on, along with re-enabling slice idling for hardware
> that does tagged command queuing. This is almost completely identical to
> what Vivek Goyal originally posted, it's just combined into one and uses
> the term 'interactive' instead of 'fairness'. I think the former is a
> better umbrella under which to add further tweaks that may sacrifice
> throughput slightly, in the quest for better latency.
>
> It's queued up in the for-linus branch.

FWIW, I did a matrix of Vivek's patch combined with my hack. Seems we
do lose a bit of dd throughput over stock with either or both.

             run 1   run 2   run 3   run 4   run 5     avg
dd pre        65.1    65.4    67.5    64.8    65.1    65.5   fairness=1 overload_delay=1
perf stat     1.70    1.94    1.32    1.89    1.87     1.7
dd post       69.4    62.3    69.7    70.3    69.6    68.2

dd pre        67.0    67.8    64.7    64.7    64.9    65.8   fairness=1 overload_delay=0
perf stat     4.89    3.13    2.98    2.71    2.17     3.1
dd post       67.2    63.3    62.6    62.8    63.1    63.8

dd pre        65.0    66.0    66.9    64.6    67.0    65.9   fairness=0 overload_delay=1
perf stat     4.66    3.81    4.23    2.98    4.23     3.9
dd post       62.0    60.8    62.4    61.4    62.2    61.7

dd pre        65.3    65.6    64.9    69.5    65.8    66.2   fairness=0 overload_delay=0
perf stat    14.79    9.11   14.16    8.44   13.67    12.0
dd post       64.1    66.5    64.0    66.5    64.4    65.1
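
For anyone who wants to re-derive the sixth figure in each row (it appears
to be the mean of the five runs before it) and compare the combinations, a
minimal Python sketch follows. The dictionary layout and the names `runs`,
`mean` and `summarize` are made up for illustration and are not part of any
patch. Treating fairness=0 overload_delay=0 as the baseline is an
assumption, and units are left unstated, as in the matrix itself.

```python
# Per-run figures copied from the matrix; keys are (fairness, overload_delay).
# The sixth (average) figure of each row is recomputed rather than copied.
runs = {
    (1, 1): {
        "dd pre":    [65.1, 65.4, 67.5, 64.8, 65.1],
        "perf stat": [1.70, 1.94, 1.32, 1.89, 1.87],
        "dd post":   [69.4, 62.3, 69.7, 70.3, 69.6],
    },
    (1, 0): {
        "dd pre":    [67.0, 67.8, 64.7, 64.7, 64.9],
        "perf stat": [4.89, 3.13, 2.98, 2.71, 2.17],
        "dd post":   [67.2, 63.3, 62.6, 62.8, 63.1],
    },
    (0, 1): {
        "dd pre":    [65.0, 66.0, 66.9, 64.6, 67.0],
        "perf stat": [4.66, 3.81, 4.23, 2.98, 4.23],
        "dd post":   [62.0, 60.8, 62.4, 61.4, 62.2],
    },
    (0, 0): {
        "dd pre":    [65.3, 65.6, 64.9, 69.5, 65.8],
        "perf stat": [14.79, 9.11, 14.16, 8.44, 13.67],
        "dd post":   [64.1, 66.5, 64.0, 66.5, 64.4],
    },
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def summarize(table):
    """Map each (fairness, overload_delay) pair to its per-row means."""
    return {
        config: {row: mean(values) for row, values in rows.items()}
        for config, rows in table.items()
    }


summary = summarize(runs)

# Assumption: both knobs off is the baseline the other rows are compared to.
baseline = summary[(0, 0)]

for (fairness, overload_delay), avg in summary.items():
    # Ratio > 1 means the 'perf stat' figure is smaller than the baseline's.
    ratio = baseline["perf stat"] / avg["perf stat"]
    # Relative change of the 'dd post' figure against the baseline, in percent.
    dd_change = 100.0 * (avg["dd post"] - baseline["dd post"]) / baseline["dd post"]
    print(
        f"fairness={fairness} overload_delay={overload_delay}: "
        f"perf stat {avg['perf stat']:.2f} (baseline/this = {ratio:.1f}), "
        f"dd post {avg['dd post']:.1f} ({dd_change:+.1f}% vs baseline)"
    )
```

The recomputed means agree with the sixth figure in each row once cut to
one decimal place, so that figure can be read as a plain five-run average.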


