Re: [RFC] cfq: adapt slice to number of processes doing I/O

From: Corrado Zoccolo
Date: Thu Sep 03 2009 - 12:47:54 EST


Hi Jeff,
can you share the benchmark?
I think I have to fix the min slice to consider priority, too, to
respect the priorities when there are many processes.

For the fairness at a single priority level, my tests show that
fairness is improved with the patches (comparing minimum and maximum
bandwidth for a set of 32 processes):

Original:
Run status group 0 (all jobs):
READ: io=14192KiB, aggrb=480KiB/s, minb=7KiB/s, maxb=20KiB/s,
mint=30001msec, maxt=30258msec

Run status group 1 (all jobs):
READ: io=829292KiB, aggrb=27816KiB/s, minb=723KiB/s,
maxb=1004KiB/s, mint=30004msec, maxt=30529msec

Adaptive:
Run status group 0 (all jobs):
READ: io=14444KiB, aggrb=488KiB/s, minb=12KiB/s, maxb=17KiB/s,
mint=30003msec, maxt=30298msec

Run status group 1 (all jobs):
READ: io=721324KiB, aggrb=24140KiB/s, minb=689KiB/s, maxb=795KiB/s,
mint=30003msec, maxt=30598msec

Are you using random think times? This could explain the discrepancy.

Corrado

On Thu, Sep 3, 2009 at 5:38 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Jeff Moyer <jmoyer@xxxxxxxxxx> writes:
>
>> Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:
>>
>>> When the number of processes performing I/O concurrently increases, a
>>> fixed time slice per process will cause large latencies.
>>> In the patch, if there are more than 3 processes performing concurrent
>>> I/O, we scale the time slice down proportionally.
>>> To safeguard sequential bandwidth, we impose a minimum time slice,
>>> computed from cfq_slice_idle (the idea is that cfq_slice_idle
>>> approximates the cost for a seek).
>>>
>>> I performed two tests, on a rotational disk:
>>> * 32 concurrent processes performing random reads
>>> ** the bandwidth is improved from 466KB/s to 477KB/s
>>> ** the maximum latency is reduced from 7.667s to 1.728s
>>> * 32 concurrent processes performing sequential reads
>>> ** the bandwidth is reduced from 28093KB/s to 24393KB/s
>>> ** the maximum latency is reduced from 3.781s to 1.115s
>>>
>>> I expect numbers to be even better on SSDs, where the penalty for
>>> disrupting sequential reads is much smaller.
>>
>> Interesting approach. I'm not sure what the benefits will be on SSDs,
>> as the idling logic is disabled for them (when nonrot is set and they
>> support ncq). See cfq_arm_slice_timer.
>>
>>> Signed-off-by: Corrado Zoccolo <czoccolo@gmail-com>
>>>
>>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>>> index fd7080e..cff4ca8 100644
>>> --- a/block/cfq-iosched.c
>>> +++ b/block/cfq-iosched.c
>>> @@ -306,7 +306,15 @@ cfq_prio_to_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>>>  static inline void
>>>  cfq_set_prio_slice(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>>>  {
>>> -	cfqq->slice_end = cfq_prio_to_slice(cfqd, cfqq) + jiffies;
>>> +	unsigned low_slice = cfqd->cfq_slice_idle * (1 + cfq_cfqq_sync(cfqq));
>>> +	unsigned interested_queues = cfq_class_rt(cfqq) ? cfqd->busy_rt_queues : cfqd->busy_queues;
>>
>> Either my mailer displayed this wrong, or yours wraps lines.
>>
>>> +	unsigned slice = cfq_prio_to_slice(cfqd, cfqq);
>>> +	if (interested_queues > 3) {
>>> +		slice *= 3;
>>
>> How did you come to this magic number of 3, both for the number of
>> competing tasks and the multiplier for the slice time? Did you
>> experiment with this number at all?
>>
>>> +		slice /= interested_queues;
>>
>> Of course you realize this could disable the idling logic completely,
>> right? I'll run this patch through some tests and let you know how it
>> goes.
>
> I missed that you updated the slice end based on a max of slice and
> low_slice. Sorry about that.
>
> This patch does not fare well when judging fairness between processes.
> I have several fio jobs that generate read workloads, and I try to
> figure out whether the I/O scheduler is providing fairness based on the
> I/O priorities of the processes. With your patch applied, we get the
> following results:
>
> total priority: 880
> total data transferred: 1045920
> class  prio   ideal  xferred  %diff
> be        0  213938   352500     64
> be        1  190167   193012      1
> be        2  166396   123380    -26
> be        3  142625    86260    -40
> be        4  118854    62964    -48
> be        5   95083    40180    -58
> be        6   71312    74484      4
> be        7   47541   113140    137
>
> Class and prio should be self-explanatory. ideal is my cooked-up
> version of the ideal number of bytes the given priority should have
> transferred, based on the total data transferred and on all processes
> competing for the disk, weighted by priority. xferred is the actual
> amount of data transferred, and %diff is the percentage by which
> xferred differs from ideal.
>
> Notice that best-effort priority 7 managed to transfer more data than be
> prio 3. That's bad. Now, let's look at 8 processes all at the same
> priority level:
>
> total priority: 800
> total data transferred: 1071036
> class  prio   ideal  xferred  %diff
> be        4  133879   222452     66
> be        4  133879   243188     81
> be        4  133879   187380     39
> be        4  133879    42512    -69
> be        4  133879    39156    -71
> be        4  133879    47604    -65
> be        4  133879    37364    -73
> be        4  133879   251380     87
>
> Hmm. That doesn't look good.
>
> For comparison, here is the output from the vanilla kernel for those two
> runs:
>
> total priority: 880
> total data transferred: 954272
> class  prio   ideal  xferred  %diff
> be        0  195192   229108     17
> be        1  173504   202740     16
> be        2  151816   156660      3
> be        3  130128   152052     16
> be        4  108440    91636    -16
> be        5   86752    64244    -26
> be        6   65064    34292    -48
> be        7   43376    23540    -46
>
> total priority: 800
> total data transferred: 887264
> class  prio   ideal  xferred  %diff
> be        4  110908   124404     12
> be        4  110908   123380     11
> be        4  110908   118004      6
> be        4  110908   113396      2
> be        4  110908   107252     -4
> be        4  110908    98356    -12
> be        4  110908    96244    -14
> be        4  110908   106228     -5
>
> It's worth noting that the overall throughput went up in the patched
> kernel for this second case. However, if we care at all about the
> notion of I/O priorities, I think your patch needs more work.
>
> Cheers,
> Jeff
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:czoccolo@xxxxxxxxx
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda