Re: [PATCH block:for-3.3/core] cfq: merged request shouldn't jumpto a different cfqq

From: Shaohua Li
Date: Thu Jan 05 2012 - 21:59:43 EST


On Thu, 2012-01-05 at 18:36 -0800, Tejun Heo wrote:
> Hello, again.
>
> On Thu, Jan 05, 2012 at 06:17:07PM -0800, Tejun Heo wrote:
> > When two requests are merged, if the absorbed request is older than
> > the absorbing one, cfq_merged_requests() tries to reposition it in the
> > cfqq->fifo list by list_move()'ing the absorbing request to the
> > absorbed one before removing it.
> >
> > This works if both requests are on the same cfqq but nothing
> > guarantees that and the code ends up moving the merged request to a
> > different cfqq's fifo list without adjusting the rest. This leads to
> > the following failures.
> >
> > * A request may be on the fifo list of a cfqq without holding
> > reference to it and the cfqq can be freed before requst is finished.
> > Among other things, this triggers list debug warning and slab debug
> > use-after-free warning.
> >
> > * As a request can be on the wrong fifo queue, it may be issued and
> > completed before its cfqq is scheduled. If the cfqq didn't have
> > other requests on it, it would be empty by the time it's dispatched
> > triggering BUG_ON() in cfq_dispatch_request().
> >
> > Fix it by making cfq_merged_requests() scan the absorbing request's
> > fifo list for the correct slot and move there instead.
>
> Hmmm... while the patch would fix the problem. It isn't entirely
> correct. The root cause is,
>
> 1. q->last_merge and rqhash used to be used only for merging bios into
> requests and that queries elevator whether the merge should be
> allowed. cfq disallows merging if they belong to different cfqqs.
>
> 2. request-request merging didn't use to use q->last_merge or rqhash to
> find request candidates. It used elv_former/latter_request() and
> cfq never returned request from a different cfqq.
>
> 3. Plug merging started using q->last_merge and rqhash and now
> elevator can't prevent cross cfqq merges.
>
> So, yeah, the right fix would be using elv_former/latter_request()
> instead. Maybe we should strip out rqhash altogether and change
> elevator handle everything? I don't know. I'll prepare a different
> fix patch soon.
So not allow merge from two cfq queues strictly? This will impact
performance. I don't know how important the strict isolation is. we even
allow two cfq queues merge to improve performance.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/