Re: [PATCH V6 4/5] blk-mq-sched: improve dispatching from sw queue

From: Ming Lei
Date: Thu Oct 12 2017 - 11:49:51 EST


On Thu, Oct 12, 2017 at 09:37:11AM -0600, Jens Axboe wrote:
> On 10/12/2017 09:33 AM, Bart Van Assche wrote:
> > On Thu, 2017-10-12 at 18:01 +0800, Ming Lei wrote:
> >> Even EWMA approach isn't good on SCSI-MQ too, because
> >> some SCSI's .cmd_per_lun is very small, such as 3 on
> >> lpfc and qla2xxx, and one full flush will trigger
> >> BLK_STS_RESOURCE easily.
> >>
> >> So I suggest to use the way of q->queue_depth first, since we
> >> don't get performance degrade report on other devices(!q->queue_depth)
> >> with blk-mq. We can improve this way in the future if we
> >> have better approach.
> >
> > Measurements have shown that even with this patch series applied sequential
> > I/O performance is still below that of the legacy block and SCSI layers. So
> > this patch series is not the final solution. (See also John Garry's e-mail
> > of October 10th - https://lkml.org/lkml/2017/10/10/401). I have been
> > wondering what could be causing that performance difference. Maybe it's
> > because requests can reside for a while in the hctx dispatch queue and hence
> > are unvisible for the scheduler while in the hctx dispatch queue? Should we
> > modify blk_mq_dispatch_rq_list() such that it puts back requests that have
> > not been accepted by .queue_rq() onto the scheduler queue(s) instead of to
> > the hctx dispatch queue? If we would make that change, would it allow us to
> > drop patch "blk-mq-sched: improve dispatching from sw queue"?
>
> Yes, it's clear that even with the full series, we're not completely there
> yet. We are closer, though, and I do want to close that gap up as much
> as we can. I think everybody will be more motivated and have an easier time
> getting the last bit of the way there, once we have a good foundation in.
>
> It may be the reason that you hint at, if we do see a lot of requeueing
> or BUSY in the test case. That would prematurely move requests from the
> schedulers knowledge and into the hctx->dispatch holding area. It'd be
> useful to have a standard SATA test run and see if we're missing merging
> in that case (since merging is what it boils down to). If we are, then
> it's not hctx->dispatch issues.

From John Garry's test results with the .get_budget()/.put_budget()
patches[1], sequential I/O performance is still not good. That suggests
the issue may not be in I/O merging, because .get_budget()/.put_budget()
should help I/O merging more than the legacy block layer does.

Actually, in my virtio-scsi test, blk-mq with .get_budget()/.put_budget()
already performs better than the legacy block layer.
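
To illustrate why the budget approach helps merging, here is a minimal
userspace sketch. It is a toy model, not the kernel code: struct toy_dev,
toy_get_budget(), toy_put_budget() and toy_dispatch() are hypothetical
names, and the real .get_budget()/.put_budget() callbacks have different
signatures. The point is only that a request is dequeued from the
scheduler after a device slot has been reserved, so requests that cannot
be issued stay in the scheduler queue, where they can still be merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy device; 'depth' models a small per-LUN command limit. */
struct toy_dev {
	int depth;	/* maximum commands in flight */
	int in_flight;	/* commands currently holding budget */
};

/* Toy analogue of .get_budget(): reserve a device slot before dequeuing. */
static bool toy_get_budget(struct toy_dev *d)
{
	if (d->in_flight >= d->depth)
		return false;
	d->in_flight++;
	return true;
}

/* Toy analogue of .put_budget(): release a previously reserved slot. */
static void toy_put_budget(struct toy_dev *d)
{
	assert(d->in_flight > 0);	/* catch an unbalanced put */
	d->in_flight--;
}

/*
 * Dispatch from a scheduler queue holding 'queued' requests. A request is
 * taken only after budget has been obtained. Returns the number of requests
 * dispatched; the remainder stays in the scheduler queue.
 */
static int toy_dispatch(struct toy_dev *d, int queued)
{
	int dispatched = 0;

	while (dispatched < queued && toy_get_budget(d))
		dispatched++;
	return dispatched;
}
```

With depth set to 3 (the .cmd_per_lun value mentioned above for lpfc and
qla2xxx) and 10 queued requests, this model dispatches 3 and leaves 7 in
the scheduler queue instead of moving them to a separate holding list.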


[1] https://github.com/ming1/linux/commits/blk_mq_improve_scsi_mpath_perf_V6.2_test


--
Ming