Re: [RFC] Improve CFQ fairness

From: Jeff Moyer
Date: Thu Sep 03 2009 - 13:11:13 EST

Vivek Goyal <vgoyal@xxxxxxxxxx> writes:

> Hi,
>
> Sometimes fairness and throughput are orthogonal to each other. CFQ provides
> fair access to disk to different processes in terms of disk time used by the
> process.
>
> Currently above notion of fairness seems to be valid only for sync queues
> whose think time is within slice_idle (8ms by default) limit.
>
> To boost throughput, CFQ disables idling based on seek patterns also. So even
> if a sync queue's think time is within the slice_idle limit, but this sync queue
> is seeky, then CFQ will disable idling on hardware supporting NCQ.
>
> Above is fine from throughput perspective but not necessarily from fairness
> perspective. In general CFQ seems to be inclined to favor throughput over
> fairness.
>
> How about introducing a CFQ ioscheduler tunable "fairness" which if set, will
> help CFQ to determine that user is interested in getting fairness right
> and will disable some of the hooks geared towards throughput.
>
> Two patches in this series introduce the tunable "fairness" and also do not
> disable the idling based on seek patterns if "fairness" is set.
>
> I ran four "dd" prio 0 BE class sequential readers on SATA disk.
>
> # Test script
> ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile1
> ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2
> ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile3
> ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile4

> Normally one would expect that these processes should finish in almost similar
> time but following are the results of one of the runs (results vary between runs).

Actually, what you've written above would run each dd in sequence, since
none of them is backgrounded with "&". I get the idea, though.

> 234179072 bytes (234 MB) copied, 6.0338 s, 38.8 MB/s
> 234179072 bytes (234 MB) copied, 6.34077 s, 36.9 MB/s
> 234179072 bytes (234 MB) copied, 8.4014 s, 27.9 MB/s
> 234179072 bytes (234 MB) copied, 10.8469 s, 21.6 MB/s
>
> The difference between the first and last process finishing is almost 5 seconds
> (out of a total duration of 10 seconds). This seems to be too big a variance.
>
> I ran blktrace to find out what is happening, and it seems we are very
> quick to disable idling based on mean seek distance. Somehow initial 7-10 reads

I submitted a patch to fix that, so maybe this isn't a problem anymore?
Here are my results, with fairness=0:

# cat test.sh
#!/bin/bash

ionice -c 2 -n 0 dd if=/mnt/test/testfile1 of=/dev/null count=524288 &
ionice -c 2 -n 0 dd if=/mnt/test/testfile2 of=/dev/null count=524288 &
ionice -c 2 -n 0 dd if=/mnt/test/testfile3 of=/dev/null count=524288 &
ionice -c 2 -n 0 dd if=/mnt/test/testfile4 of=/dev/null count=524288 &

wait

# bash test.sh
524288+0 records in
524288+0 records out
268435456 bytes (268 MB) copied, 10.3071 s, 26.0 MB/s
524288+0 records in
524288+0 records out
268435456 bytes (268 MB) copied, 10.3591 s, 25.9 MB/s
524288+0 records in
524288+0 records out
268435456 bytes (268 MB) copied, 10.4217 s, 25.8 MB/s
524288+0 records in
524288+0 records out
268435456 bytes (268 MB) copied, 10.4649 s, 25.7 MB/s

That looks pretty good to me.
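
As a rough way to put a number on "pretty good", the sketch below compares the gap between the first and last reader to finish in the two sets of dd results quoted in this thread. The elapsed times are copied from the output above; the variable and function names are illustrative only. It also checks that count=524288 at dd's default 512-byte block size accounts for the 268435456 bytes each reader reports.

```python
# Elapsed times in seconds, copied from the dd output quoted in this thread.
original_report = [6.0338, 6.34077, 8.4014, 10.8469]  # four readers, first report
this_reply = [10.3071, 10.3591, 10.4217, 10.4649]     # four readers, fairness=0

# dd defaults to 512-byte blocks, so count=524288 reads this many bytes.
BYTES_PER_READER = 524288 * 512


def finish_spread(elapsed):
    """Return (gap in seconds, gap as a fraction of the slowest reader)."""
    gap = max(elapsed) - min(elapsed)
    return gap, gap / max(elapsed)


if __name__ == "__main__":
    print("bytes per reader:", BYTES_PER_READER)
    for label, elapsed in (("original report", original_report),
                           ("this reply", this_reply)):
        gap, fraction = finish_spread(elapsed)
        print(f"{label}: gap {gap:.4f} s, {fraction:.1%} of the slowest reader")
```

The gap is about 4.8 s in the original report and about 0.16 s in the run above, which is the difference the two result sets are meant to show.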

Running a couple of fio workloads doesn't really show a difference
between a vanilla kernel and a patched cfq with fairness set to 1:

Vanilla:

total priority: 800
total data transferred: 887264
class  prio  ideal   xferred  %diff
be     4     110908  124404   12
be     4     110908  123380   11
be     4     110908  118004   6
be     4     110908  113396   2
be     4     110908  107252   -4
be     4     110908  98356    -12
be     4     110908  96244    -14
be     4     110908  106228   -5

Patched, with fairness set to 1:

total priority: 800
total data transferred: 953312
class  prio  ideal   xferred  %diff
be     4     119164  127028   6
be     4     119164  128244   7
be     4     119164  120564   1
be     4     119164  127476   6
be     4     119164  119284   0
be     4     119164  116724   -3
be     4     119164  103668   -14
be     4     119164  110324   -8
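
For anyone wanting to recheck the "ideal" and "%diff" columns, here is a minimal sketch that recomputes them from the "xferred" values in the two reports. It assumes all jobs have equal priority, so the ideal share is the total divided by the number of jobs, and it assumes the percentage is truncated integer arithmetic. That formula is a guess that happens to reproduce every row above; it is not taken from the script that generated the reports. The names are illustrative only.

```python
# Per-job "xferred" values copied from the two fio reports above.
vanilla = [124404, 123380, 118004, 113396, 107252, 98356, 96244, 106228]
patched = [127028, 128244, 120564, 127476, 119284, 116724, 103668, 110324]


def fairness_report(xferred):
    """Return (ideal share, list of %diff values) for equal-priority jobs."""
    ideal = sum(xferred) // len(xferred)
    # Assumption: percentage of ideal is truncated to an integer before
    # subtracting 100. This matches the %diff columns in the reports.
    diffs = [x * 100 // ideal - 100 for x in xferred]
    return ideal, diffs


if __name__ == "__main__":
    for label, data in (("vanilla", vanilla), ("patched", patched)):
        ideal, diffs = fairness_report(data)
        worst = max(abs(d) for d in diffs)
        print(f"{label}: ideal={ideal} %diff={diffs} worst={worst}")
```

By this measure the worst deviation from the ideal share is 14% in both runs, which is consistent with the two kernels not showing a real difference on this workload.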

So, can you still reproduce this on your setup? I was just using a
boring SATA disk.

Cheers,
Jeff