[patch|rfc] block: fix BLK_MQ_F_SG_MERGE support

From: Jeff Moyer
Date: Mon Aug 04 2014 - 18:00:04 EST


Hi Jens,

With device mapper targets on top of a device that does not have
BLK_MQ_F_SG_MERGE (the default), we can end up sending down more segments
than a driver is prepared to handle. I ran into this with virtio_blk,
triggerring this BUG_ON:

BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

The queue's max is set here:
blk_queue_max_segments(q, vblk->sg_elems-2);

Basically, what happens is that a bio is built up for the dm device
(which does not have the BLK_MQ_F_SG_MERGE flag set) using
bio_add_page. That path will call into __blk_recalc_rq_segments, so
what you end up with is bi_phys_segments being much smaller than
bi_vcnt.

And then you clone the bio. When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
bio->bi_phys_segments = bio->bi_vcnt;

and now we've set bio->bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.

I'm not sure how you want to address this. The attached patch works,
but it doesn't feel right to me. Probably the right thing to do is not
do the merging for the upper level device, but we don't have a mechanism
for propagating queue flags up the stack as yet.

Note that I ran into this when backporting the virtio_blk driver. I
could not reproduce this problem with an upstream kernel, but from the
looks of the code, it's possible to hit it.

Also note that setting BLK_MQ_F_SG_MERGE for the virtio blk device works
around the problem, as you'd expect.

Cheers,
Jeff

p.s. no S-O-B as I don't think this is the right fix

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b3bf0df..af2f472 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -93,7 +93,8 @@ void blk_recalc_rq_segments(struct request *rq)

void blk_recount_segments(struct request_queue *q, struct bio *bio)
{
- if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
+ if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
+ !bio_flagged(bio, BIO_CLONED))
bio->bi_phys_segments = bio->bi_vcnt;
else {
struct bio *nxt = bio->bi_next;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/