Re: [patch,rfc] cfq: merge cooperating cfq_queues

From: Corrado Zoccolo
Date: Wed Oct 21 2009 - 17:33:42 EST


Hi Jeff,
On Tue, Oct 20, 2009 at 8:23 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Hi,
>
> This is a follow-up patch to the original close cooperator support for
> CFQ. The problem is that some programs (NFSd, dump(8), iscsi target
> mode driver, qemu) interleave sequential I/Os between multiple threads
> or processes. The result is that there are large delays due to CFQ's
> idling logic that leads to very low throughput. The original patch
> addresses these problems by detecting close cooperators and allowing
> them to jump ahead in the scheduling order. This doesn't work 100% of
> the time, unfortunately, and you can have some processes in the group
> getting way ahead (LBA-wise) of the others, leading to a lot of seeks.
>
> This patch addresses the problems in the current implementation by
> merging cfq_queue's of close cooperators. The results are encouraging:
>
I'm not sure that three broken userspace programs justify increasing
the complexity of a core kernel component such as the I/O scheduler.
The original close cooperator code is not limited to those programs.
It can actually result in better overall scheduling on rotating
media, since it can help with transient close relationships (and
should probably be disabled on non-rotating media).
Merging queues, instead, can lead to bad results in the case of false
positives. Think, for example, of two programs that load shared
libraries (which are close on disk, being in the same directory) at
startup and end up tied to the same queue.
Can't the userspace programs be fixed to use the same I/O context for
their threads?
qemu already has a bug report for it
(https://bugzilla.redhat.com/show_bug.cgi?id=498242).
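Purely as an illustration (not part of the patch), such a fix could
look like the sketch below: spawn the worker with clone(2) passing
CLONE_IO, so parent and child share one io_context and therefore end
up in the same cfq_queue. The worker() body is just a placeholder.

/* Minimal sketch: share one I/O context between tasks via CLONE_IO
 * (requires a 2.6.25+ kernel). Illustrative only. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_IO
#define CLONE_IO 0x80000000     /* share io_context with the parent */
#endif

#define STACK_SIZE (1024 * 1024)

static int worker(void *arg)
{
        /* ... issue this task's part of the sequential I/O here ... */
        return 0;
}

int main(void)
{
        char *stack = malloc(STACK_SIZE);
        pid_t pid;

        if (!stack)
                return 1;

        /* CLONE_IO makes the child use the parent's io_context, so both
         * tasks feed a single cfq_queue without any scheduler heuristics. */
        pid = clone(worker, stack + STACK_SIZE,
                    CLONE_IO | CLONE_VM | SIGCHLD, NULL);
        if (pid < 0) {
                perror("clone");
                return 1;
        }
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
}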

> read-test2 emulates the I/O patterns of dump(8). The following results
> are taken from 50 runs of patched, 16 runs of unpatched (I got impatient):
>
>               Average   Std. Dev.
> ----------------------------------
> Patched CFQ:  88.81773  0.9485
> Vanilla CFQ:  12.62678  0.24535
>
> Single streaming reader over NFS, results in MB/s are the average of 2
> runs.
>
>       |patched|
> nfsd's|  cfq  |  cfq  | deadline
> ------+-------+-------+---------
>   1   |  45   |  45   | 36
>   2   |  57   |  60   | 60
>   4   |  38   |  49   | 50
>   8   |  34   |  40   | 49
>  16   |  34   |  43   | 53
>
> The next step will be to break apart the cfqq's when the I/O patterns
> are no longer sequential. This is not very important for dump(8), but
> for NFSd, this could make a big difference. The problem with sharing
> the cfq_queue when the NFSd threads are no longer serving requests from
> a single client is that instead of having 8 scheduling entities, NFSd
> only gets one. This could considerably hurt performance when serving
> shares to multiple clients, though I don't have a test to show this yet.

I think it will hurt performance only if it is competing with other
I/O. In that case, having 8 scheduling entities gives nfsd 8 times
the disk share (but this can be fixed by adjusting the nfsd I/O
priority, see the sketch below).
For the I/O pattern, instead, sorting all requests into a single queue
may still be preferable, since they will at least be sorted in disk
order, rather than in the random order determined by which thread in
the pool received each request.
This is, though, an argument in favor of using CLONE_IO inside nfsd,
since having a single queue, with proper priority, will always give
better overall performance.
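
As an aside, and purely as an illustration rather than anything from
the patch, the priority adjustment mentioned above can be done from
userspace through the ioprio_set(2) syscall (glibc has no wrapper);
the macro values below mirror include/linux/ioprio.h.

#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors include/linux/ioprio.h */
#define IOPRIO_CLASS_SHIFT              13
#define IOPRIO_PRIO_VALUE(class, data)  (((class) << IOPRIO_CLASS_SHIFT) | (data))
#define IOPRIO_WHO_PROCESS              1
#define IOPRIO_CLASS_BE                 2       /* best-effort scheduling class */

int main(int argc, char **argv)
{
        /* Target pid from the command line (e.g. an nfsd thread); default: self. */
        pid_t pid = (argc > 1) ? atoi(argv[1]) : getpid();

        /* Best-effort class, level 7 (lowest priority within the class). */
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, (int)pid,
                    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7)) < 0) {
                perror("ioprio_set");
                return 1;
        }
        return 0;
}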

Corrado

> So, please take this patch as an rfc, and any discussion on detecting
> that I/O patterns are no longer sequential at the cfqq level (not the
> cic, as multiple cic's now point to the same cfqq) would be helpful.
>
> Cheers,
> Jeff
>