Re: [RFC] cfq: adapt slice to number of processes doing I/O

From: Corrado Zoccolo
Date: Fri Sep 04 2009 - 03:22:40 EST


Hi Jeff,

On Thu, Sep 3, 2009 at 7:16 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:
>
>> Hi Jeff,
>> can you share the benchmark?
>
> Of course, how miserly of me!
>
> http://people.redhat.com/jmoyer/cfq-regression-tests-0.0.1.tar.gz
>
>> I think I have to fix the min slice to consider priority, too, to
>> respect the priorities when there are many processes.
>>
>> For the fairness at a single priority level, my tests show that
>> fairness is improved with the patches (comparing minimum and maximum
>> bandwidth for a set of 32 processes):
>>
>> Original:
>> Run status group 0 (all jobs):
>>    READ: io=14192KiB, aggrb=480KiB/s, minb=7KiB/s, maxb=20KiB/s,
>> mint=30001msec, maxt=30258msec
>>
>> Run status group 1 (all jobs):
>>    READ: io=829292KiB, aggrb=27816KiB/s, minb=723KiB/s,
>> maxb=1004KiB/s, mint=30004msec, maxt=30529msec
>>
>> Adaptive:
>> Run status group 0 (all jobs):
>>    READ: io=14444KiB, aggrb=488KiB/s, minb=12KiB/s, maxb=17KiB/s,
>> mint=30003msec, maxt=30298msec
>>
>> Run status group 1 (all jobs):
>>    READ: io=721324KiB, aggrb=24140KiB/s, minb=689KiB/s, maxb=795KiB/s,
>> mint=30003msec, maxt=30598msec
>>
>> Are you using random think times? This could explain the discrepancy.
>
> No, it's just a sync read benchmark.  It's the be4-x-8.fio job file in
> the tarball mentioned above.  Note that the run-time is only 10 seconds,
> so maybe that's not enough to get accurate data?  If you try increasing
> it, be careful that you don't read in the entire file and wrap back
> around, as this is a buffered read test, that will skew the results.
>
I could reproduce the skew on my setup, but with a different pattern than yours.
In my case, I think the reason is that, since the files are very large
(and the fs is partially filled), they end up allocated in regions of
the disk with different read speeds.
Even after cutting the file size in half, so that the benchmark fits on
my disk, I get:

[root@et2 cfq-regression-tests]# for i in `seq 1 8`; do sync; echo 3 >
/proc/sys/vm/drop_caches; dd if=/media/lacie/testfile$i of=/dev/null
bs=1M count=30; done
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0,978737 s, 32,1 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 0,971206 s, 32,4 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1,04638 s, 30,1 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1,18266 s, 26,6 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1,32781 s, 23,7 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1,33056 s, 23,6 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1,71092 s, 18,4 MB/s
30+0 records in
30+0 records out
31457280 bytes (31 MB) copied, 1,56802 s, 20,1 MB/s

Original cfq gives:
/home/corrado/cfq-regression-tests/cfq_orig/be4-x-8.fio
/home/corrado/cfq-regression-tests
total priority: 800
total data transferred: 179072
class prio ideal xferred %diff
be 4 22384 28148 25
be 4 22384 29428 31
be 4 22384 27380 22
be 4 22384 23524 5
be 4 22384 20964 -7
be 4 22384 22004 -2
be 4 22384 12020 -47
be 4 22384 15604 -31

patched cfq is a bit more skewed, but follows the same pattern:
/home/corrado/cfq-regression-tests/cfq_adapt/be4-x-8.fio
/home/corrado/cfq-regression-tests
total priority: 800
total data transferred: 204088
class prio ideal xferred %diff
be 4 25511 34020 33
be 4 25511 34020 33
be 4 25511 32484 27
be 4 25511 28388 11
be 4 25511 24628 -4
be 4 25511 24940 -3
be 4 25511 10276 -60
be 4 25511 15332 -40
/home/corrado/cfq-regression-tests

I think it is more skewed because, with a smaller slice, the seek time
has a bigger impact on the amount of transferred data at slower
transfer rates, as shown below.
The amount of data transferred per slice is given by:
(1) slice = seek + amount * rate
If we multiply the slice by a factor alpha < 1, a different amount of
data (amount1) can be transferred:
(2) alpha * slice = seek + amount1 * rate
Substituting slice from (1) into (2), to express amount1 as a function
of amount:
alpha * (seek + amount * rate) = seek + amount1 * rate
amount1 = alpha * amount - (1-alpha) * seek / rate
where we see that decreasing rate increases the impact of the seek time.
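To put purely illustrative numbers on it (here rate is the time needed
to transfer one unit of data): with seek = 8 ms, rate = 0.05 ms/KiB
(roughly 20 MB/s) and a 100 ms slice,
amount = (100 - 8) / 0.05 = 1840 KiB;
halving the slice (alpha = 0.5) gives
amount1 = (50 - 8) / 0.05 = 840 KiB,
i.e. 80 KiB less than half of the original amount, and the shortfall is
exactly the (1-alpha) * seek / rate = 0.5 * 8 / 0.05 = 80 KiB term above.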

Your data, though, seems very different from mine; in fact, vanilla
doesn't show the same difference.
Can you try running vanilla with smaller base slices, to see if the
same effect shows up there?
echo 15 > /sys/block/sdb/queue/iosched/slice_async
echo 37 > /sys/block/sdb/queue/iosched/slice_sync
If this doesn't exhibit the same pattern, then maybe the problem is
that the number of busy queues seen by the different processes is
consistently different, resulting in unequal time slices.
In that case, I'll add some averaging, along the lines of the sketch
below.
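
This is only a sketch of the kind of averaging I have in mind (the
helper name and the weights are made up, it is not an actual patch):

/*
 * Smoothed count of busy queues, kept scaled by 8 (fixed point) so
 * that the average does not lose precision with small integer queue
 * counts.
 *
 * Each update implements: average = 7/8 * average + 1/8 * busy,
 * expressed on the scaled value as: avg' = avg - avg/8 + busy.
 */
#define CFQ_BUSY_AVG_SHIFT	3	/* average over ~8 samples */

static inline unsigned int cfq_update_busy_avg(unsigned int avg,
					       unsigned int busy_queues)
{
	return avg - (avg >> CFQ_BUSY_AVG_SHIFT) + busy_queues;
}

The slice would then be sized from the smoothed value, something like
max(1, avg >> CFQ_BUSY_AVG_SHIFT), instead of from the instantaneous
cfqd->busy_queues.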

I also added a "sync; echo 3 > /proc/sys/vm/drop_caches" in ioprio.sh
before each test. It is not strictly necessary, because you already
unmount the test FS, but it helps reduce memory pressure, so that doing
I/O on our test files doesn't also cause dependent I/O on other files
(to free up space).

Thanks,
Corrado

> Cheers,
> Jeff
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------