Re: [patch] blk-flush: fix flush policy calculation

From: Jeff Moyer
Date: Tue Aug 02 2011 - 13:40:00 EST


OK, sorry for top-posting here, but I chased the problem down further.

Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
FLUSH/FUA to support merge, introduced a regression when running any
sort of fsyncing workload using dm-multipath and certain storage (in our
case, an HP EVA). It turns out that dm-multipath always advertised
flush+fua support, and passed commands on down the stack, where they
used to get stripped off. The above commit, unfortunately, changed that
behavior:

static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;

while (1) {
- while (!list_empty(&q->queue_head)) {
+ if (!list_empty(&q->queue_head)) {
rq = list_entry_rq(q->queue_head.next);
- if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
- (rq->cmd_flags & REQ_FLUSH_SEQ))
- return rq;
- rq = blk_do_flush(q, rq);
- if (rq)
- return rq;
+ return rq;
}

Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:

struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
unsigned int fflags = q->flush_flags; /* may change, cache it */
bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
REQ_FUA);
unsigned skip = 0;
...
if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
rq->cmd_flags &= ~REQ_FLUSH;
if (!has_fua)
rq->cmd_flags &= ~REQ_FUA;
return rq;
}

So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).

Now, however, we don't get into the flush machinery at all (which is why
my initial patch didn't help this situation). Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even though the underlying request_queue does not
support flush or fua.

So, where do we fix this? We could just accept Mike's patch to not send
such requests down from dm-mpath, but that seems short-sighted. We
could reinstate some checks in __elv_next_request. Or, we could put the
checks into blk_insert_cloned_request.

Suggestions?

Cheers,
Jeff


Shaohua Li <shli@xxxxxxxxxx> writes:

> 2011/8/2 Jeff Moyer <jmoyer@xxxxxxxxxx>:
>> Hi,
>>
>> Reading through the code in blk-flush.c, it appears that there is an
>> oversight in the policy returned from blk_flush_policy:
>>
>> Â Â Â Âif (fflags & REQ_FLUSH) {
>> Â Â Â Â Â Â Â Âif (rq->cmd_flags & REQ_FLUSH)
>> Â Â Â Â Â Â Â Â Â Â Â Âpolicy |= REQ_FSEQ_PREFLUSH;
>> Â Â Â Â Â Â Â Âif (blk_rq_sectors(rq))
>> Â Â Â Â Â Â Â Â Â Â Â Âpolicy |= REQ_FSEQ_DATA;
>> Â Â Â Â Â Â Â Âif (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
>> Â Â Â Â Â Â Â Â Â Â Â Âpolicy |= REQ_FSEQ_POSTFLUSH;
>> Â Â Â Â}
>> Â Â Â Âreturn policy;
>>
>> This means that REQ_FSEQ_DATA can only be set if the queue flush_flags
>> include FLUSH and/or FUA. ÂHowever, the short-circuit for not issuing
>> flushes when the device doesn't need/support them depends on
>> REQ_FSEQ_DATA being set while the other two bits are clear:
>>
>> Â Â Â Â/*
>> Â Â Â Â * If there's data but flush is not necessary, the request can be
>> Â Â Â Â * processed directly without going through flush machinery. ÂQueue
>> Â Â Â Â * for normal execution.
>> Â Â Â Â */
>> Â Â Â Âif ((policy & REQ_FSEQ_DATA) &&
>> Â Â Â Â Â Â!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
>> Â Â Â Â Â Â Â Âlist_add_tail(&rq->queuelist, &q->queue_head);
>> Â Â Â Â Â Â Â Âreturn;
>> Â Â Â Â}
>>
>> Given the code as it stands, I don't think the body of this if statement
>> will ever be executed. ÂI've attached a fix for this below. ÂIt seems
>> like this could be both a performance and a correctness issue, though
>> I've not run into any problems I can directly attribute to this (perhaps
>> due to file systems not issuing flushes when support is not advertised?).
>>
>> Comments are appreciated.
>>
>> Cheers,
>> Jeff
>>
>> Signed-off-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
>>
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index bb21e4c..3a06118 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -95,11 +95,11 @@ static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
>> Â{
>> Â Â Â Âunsigned int policy = 0;
>>
>> + Â Â Â if (blk_rq_sectors(rq))
>> + Â Â Â Â Â Â Â policy |= REQ_FSEQ_DATA;
>> Â Â Â Âif (fflags & REQ_FLUSH) {
>> Â Â Â Â Â Â Â Âif (rq->cmd_flags & REQ_FLUSH)
>> Â Â Â Â Â Â Â Â Â Â Â Âpolicy |= REQ_FSEQ_PREFLUSH;
>> - Â Â Â Â Â Â Â if (blk_rq_sectors(rq))
>> - Â Â Â Â Â Â Â Â Â Â Â policy |= REQ_FSEQ_DATA;
>> Â Â Â Â Â Â Â Âif (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
>> Â Â Â Â Â Â Â Â Â Â Â Âpolicy |= REQ_FSEQ_POSTFLUSH;
>> Â Â Â Â}
>> --
> __generic_make_request always handles this:
> if ((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && !q->flush_flags) {
> bio->bi_rw &= ~(REQ_FLUSH | REQ_FUA);
> if (!nr_sectors) {
> err = 0;
> goto end_io;
> }
> }
>
> Thanks,
> Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/