Re: IO scheduler based IO controller V10

From: Ryo Tsuruta
Date: Wed Oct 07 2009 - 10:39:28 EST

In reply to: Vivek Goyal: "Re: IO scheduler based IO controller V10"

Hi Vivek,

Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > >> If one would like to
> > > >> combine some physical disks into one logical device like a dm-linear,
> > > >> I think one should map the IO controller on each physical device and
> > > >> combine them into one logical device.
> > > >>
> > > >
> > > > In fact this sounds like a more complicated step, where one has to set up
> > > > one dm-ioband device on top of each physical device. But I am assuming
> > > > that this will go away once you move to a per-request-queue implementation.
> >
> > I don't understand why the per request queue implementation makes it
> > go away. If dm-ioband is integrated into the LVM tools, it could allow
> > users to skip the complicated steps to configure dm-linear devices.
> >
>
> Those who are not using dm-tools will be forced to use dm-tools for
> bandwidth control features.

Once dm-ioband is integrated into the LVM tools and bandwidth can be
assigned per device by lvcreate, users will no longer need to use the
dm tools directly.

> Interesting. In all the test cases you always test with sequential
> readers. I have changed the test case a bit (I have already reported the
> results in another mail; now I am running the same test again with
> dm-ioband version 1.14). I made all the readers do direct IO, and in the
> other group I put a buffered writer. So the setup looks as follows.
>
> In group 1, I launch one prio 0 reader and an increasing number of prio 4
> readers. In group 2, I just run a dd doing buffered writes. The weights of
> both groups are 100 each.
>
> Following are the results on 2.6.31 kernel.
>
> With-dm-ioband
> ==============
>     <---------------- prio4 readers ----------------->  <---- prio0 reader ---->
> nr  Max-bdwidth  Min-bdwidth  Agg-bdwidth  Max-latency  Agg-bdwidth  Max-latency
> 1   9992KiB/s    9992KiB/s    9992KiB/s    413K usec    4621KiB/s    369K usec
> 2   4859KiB/s    4265KiB/s    9122KiB/s    344K usec    4915KiB/s    401K usec
> 4   2238KiB/s    1381KiB/s    7703KiB/s    532K usec    3195KiB/s    546K usec
> 8   504KiB/s     46KiB/s      1439KiB/s    399K usec    7661KiB/s    220K usec
> 16  131KiB/s     26KiB/s      638KiB/s     492K usec    4847KiB/s    359K usec
>
> With vanilla CFQ
> ================
>     <---------------- prio4 readers ----------------->  <---- prio0 reader ---->
> nr  Max-bdwidth  Min-bdwidth  Agg-bdwidth  Max-latency  Agg-bdwidth  Max-latency
> 1   10779KiB/s   10779KiB/s   10779KiB/s   407K usec    16094KiB/s   808K usec
> 2   7045KiB/s    6913KiB/s    13959KiB/s   538K usec    18794KiB/s   761K usec
> 4   7842KiB/s    4409KiB/s    20967KiB/s   876K usec    12543KiB/s   443K usec
> 8   6198KiB/s    2426KiB/s    24219KiB/s   1469K usec   9483KiB/s    685K usec
> 16  5041KiB/s    1358KiB/s    27022KiB/s   2417K usec   6211KiB/s    1025K usec
>
>
> The above results show how bandwidth got distributed between the prio4 and
> prio0 readers within the group as we increased the number of prio4 readers
> in the group. In the other group, a buffered writer is continuously running
> as a competitor.
>
> Notice how bandwidth allocation is broken with dm-ioband.
>
> With 1 prio4 reader, the prio4 reader got more bandwidth than the prio0 reader.
>
> With 2 prio4 readers, it looks like each prio4 reader got almost the same BW
> as the prio0 reader.
>
> With 8 and 16 prio4 readers, it looks like the prio0 reader takes over and
> the prio4 readers starve.
>
> As we increase the number of prio4 readers in the group, their total
> aggregate BW share should increase. Instead, it is decreasing.
>
> So to me, in the face of competition with a writer in the other group, BW is
> all over the place. Some of these might be dm-ioband bugs, and some of
> these might come from the fact that buffering takes place in a higher
> layer and dispatch is FIFO?
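
To make the trend in the two tables concrete, a small awk helper can compute the prio4 readers' share of group 1's read bandwidth, i.e. prio4 aggregate divided by (prio4 aggregate + prio0 aggregate). This is only arithmetic over the numbers posted above; the helper name prio4_share is made up for illustration.

```shell
# Share of group 1's read bandwidth that went to the prio4 readers,
# taken from the Agg-bdwidth columns of the tables above (KiB/s).
# Input rows: <nr> <prio4 aggregate> <prio0 aggregate>
# prio4_share is a hypothetical helper name.
prio4_share() {
    awk '{ printf "%d %.1f%%\n", $1, 100 * $2 / ($2 + $3) }'
}

echo "with dm-ioband:"
prio4_share <<'EOF'
1 9992 4621
2 9122 4915
4 7703 3195
8 1439 7661
16 638 4847
EOF

echo "with vanilla CFQ:"
prio4_share <<'EOF'
1 10779 16094
2 13959 18794
4 20967 12543
8 24219 9483
16 27022 6211
EOF
```

With dm-ioband the share falls from about 68% (1 reader) to about 12% (16 readers), while with vanilla CFQ it rises from about 40% to about 81%, which is the behaviour described above.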

Thank you for testing. I did the same test and here are the results.

with vanilla CFQ
    <---------------- prio4 readers ----------------->  prio0        group2
nr  maxbw        minbw        aggrbw       maxlat       aggrbw       bufwrite
1   12,140KiB/s  12,140KiB/s  12,140KiB/s  30001msec    11,125KiB/s  1,923KiB/s
2   3,967KiB/s   3,930KiB/s   7,897KiB/s   30001msec    14,213KiB/s  1,586KiB/s
4   3,399KiB/s   3,066KiB/s   13,031KiB/s  30082msec    8,930KiB/s   1,296KiB/s
8   2,086KiB/s   1,720KiB/s   15,266KiB/s  30003msec    7,546KiB/s   517KiB/s
16  1,156KiB/s   837KiB/s     15,377KiB/s  30033msec    4,282KiB/s   600KiB/s

with dm-ioband weight-iosize policy
    <---------------- prio4 readers ----------------->  prio0        group2
nr  maxbw        minbw        aggrbw       maxlat       aggrbw       bufwrite
1   107KiB/s     107KiB/s     107KiB/s     30007msec    12,242KiB/s  12,320KiB/s
2   1,259KiB/s   702KiB/s     1,961KiB/s   30037msec    9,657KiB/s   11,657KiB/s
4   2,705KiB/s   29KiB/s      5,186KiB/s   30026msec    5,927KiB/s   11,300KiB/s
8   2,428KiB/s   27KiB/s      5,629KiB/s   30054msec    5,057KiB/s   10,704KiB/s
16  2,465KiB/s   23KiB/s      4,309KiB/s   30032msec    4,750KiB/s   9,088KiB/s

The results are somewhat different from yours. The bandwidth is
distributed to each group equally, but CFQ priority is broken, as you
said. I think the reason is not FIFO dispatch, but that some IO requests
are issued from dm-ioband's kernel thread on behalf of the processes
that originate them, so CFQ assumes that the kernel thread is the
originator and uses its io_context.
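
One way to check the claim that bandwidth is distributed to each group equally is to add up group 1's two aggregate columns and compare the sum with group 2's buffered-write rate. The sketch below is plain arithmetic over the posted numbers (thousands separators removed); the helper name group_ratio is hypothetical.

```shell
# Compare the total bandwidth of group 1 (prio4 + prio0 readers) with the
# buffered writer in group 2, using the tables above (KiB/s).
# Input rows: <nr> <prio4 aggrbw> <prio0 aggrbw> <group2 bufwrite>
# Output:     <nr> <group 1 total> <group 2 total> <group1/group2 ratio>
# group_ratio is a hypothetical helper name.
group_ratio() {
    awk '{ g1 = $2 + $3; printf "%d %d %d %.2f\n", $1, g1, $4, g1 / $4 }'
}

echo "with dm-ioband weight-iosize policy:"
group_ratio <<'EOF'
1 107 12242 12320
2 1961 9657 11657
4 5186 5927 11300
8 5629 5057 10704
16 4309 4750 9088
EOF

echo "with vanilla CFQ:"
group_ratio <<'EOF'
1 12140 11125 1923
2 7897 14213 1586
4 13031 8930 1296
8 15266 7546 517
16 15377 4282 600
EOF
```

With dm-ioband the ratio stays between 0.98 and 1.00 for every nr, which matches the statement above; with vanilla CFQ the readers together get roughly 12 to 44 times the writer's bandwidth.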

> > Here is my test script.
> > -------------------------------------------------------------------------
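> > arg="--time_based --rw=read --runtime=30 --directory=/mnt1 --size=1024M \
> > --group_reporting"
> >
> > # flush dirty data and drop the page cache so that reads hit the disk
> > sync
> > echo 3 > /proc/sys/vm/drop_caches
> >
> > # group 1: 16 best-effort (class 2) prio 0 readers
> > echo $$ > /cgroup/1/tasks
> > ionice -c 2 -n 0 fio $arg --name=read1 --output=read1.log --numjobs=16 &
> > # group 2: 16 best-effort prio 0 readers plus one real-time (class 1) reader
> > echo $$ > /cgroup/2/tasks
> > ionice -c 2 -n 0 fio $arg --name=read2 --output=read2.log --numjobs=16 &
> > ionice -c 1 -n 0 fio $arg --name=read3 --output=read3.log --numjobs=1 &
> > # move this shell back to the root cgroup and wait for all jobs
> > echo $$ > /cgroup/tasks
> > wait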
> > -------------------------------------------------------------------------
> >
> > Be that as it may, I think that if every bio could point to the io_context
> > of the originating process, it would be possible to handle IO priority in
> > the higher-level controller. A patch set has already been posted by
> > Takahashi-san. What do you think about this idea?
> >
> > Date Tue, 22 Apr 2008 22:51:31 +0900 (JST)
> > Subject [RFC][PATCH 1/10] I/O context inheritance
> > From Hirokazu Takahashi <>
> > http://lkml.org/lkml/2008/4/22/195
>
> So far you have been denying that there are issues with ioprio within a
> group in a higher-level controller. Here you seem to be saying that there
> are issues with ioprio and that we need to take this patch in to solve
> them? I am confused.

The primary purpose of this patch is to preserve the io_context of the
process that originates an IO request, but I think we could also use
this patch as one way to solve this issue.
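
The attribution problem can be illustrated with a toy model in shell. This is not kernel code and does not reflect CFQ's real data structures: the model simply uses the priority carried by the bio when one is present, and otherwise falls back to the task that submitted the request. The function name effective_ioprio is hypothetical, and the dm-ioband kernel thread is assumed to run at best-effort priority 4, the default for a task at nice 0.

```shell
# Toy model, not kernel code: which IO priority does the scheduler see?
#   $1: priority of the task that submits the request (e.g. a kernel thread)
#   $2: priority recorded in the bio by the originating process ("" if none)
# effective_ioprio is a hypothetical name used only for illustration.
effective_ioprio() {
    if [ -n "$2" ]; then
        echo "$2"   # the bio carries the originator's io_context: use it
    else
        echo "$1"   # otherwise fall back to the submitting task's context
    fi
}

# Assumption: the dm-ioband kernel thread runs at best-effort prio 4
# (the default for nice 0), while the original reader asked for prio 0.
kthread_prio=4
reader_prio=0

echo "bio without io_context: prio $(effective_ioprio "$kthread_prio" "")"
echo "bio with io_context:    prio $(effective_ioprio "$kthread_prio" "$reader_prio")"
```

In the first case every request re-issued by the kernel thread would be treated as prio 4 regardless of which reader issued it, which would flatten the priorities inside a group.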

> Anyway, if you think that above patch is needed to solve the issue of
> ioprio in higher level controller, why are you not posting it as part of
> your patch series regularly, so that we can also apply this patch along
> with other patches and test the effects?

I will post the patch, but I would like to find out and understand the
reason for the above test results before posting it.

> Against which kernel version do the above patches apply? I tried the
> biocgroup patches against 2.6.31 as well as 2.6.32-rc1, and they do not
> apply cleanly against either of these.
>
> So for the time being I am doing testing with biocgroup patches.

I created those patches against 2.6.32-rc1 and made sure the patches
can be cleanly applied to that version.

Thanks,
Ryo Tsuruta
