Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

From: Jens Axboe
Date: Wed Sep 17 2014 - 10:22:36 EST


On 2014-09-17 07:52, Ming Lei wrote:
On Wed, 17 Sep 2014 14:00:34 +0200
David Hildenbrand <dahi@xxxxxxxxxxxxxxxxxx> wrote:

Does anyone have an idea?
The request itself is completely filled with cc

That is very weird, the 'rq' is got from hctx->tags, and rq should be
valid, and rq->q shouldn't have been changed even though it was
double free or double allocation.

I am currently asking myself if blk_mq_map_request should protect against softirq here but I cant say for sure,as I have never looked into that code before.

No, it needn't the protection.

Thanks,


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html



Digging through the code, I think I found a possible cause:

tags->rqs[..] is not initialized with zeroes (via alloc_pages_node in
blk-mq.c:blk_mq_init_rq_map()).

Yes, it may cause problem when the request is allocated at the 1st time,
and timeout handler may comes just after the allocation and before its
initialization, then oops triggered because of garbage data in the request.

--
From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@xxxxxxxxxxxxx>
Date: Wed, 17 Sep 2014 21:00:34 +0800
Subject: [PATCH] blk-mq: initialize request before the 1st allocation

Otherwise the request can be accessed from timeout handler
just after its 1st allocation from tag pool and before
initialization in blk_mq_rq_ctx_init(), so cause oops since
the request is filled up with garbage data.

Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx>
---
block/blk-mq.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4aac826..d24673f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
{
struct request *rq = tags->rqs[tag];

+ /* uninitialized request */
+ if (!rq->q || rq->tag == -1)
+ return rq;
+
if (!is_flush_request(rq, tag))
return rq;

@@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
left -= to_do * rq_size;
for (j = 0; j < to_do; j++) {
tags->rqs[i] = p;
+
+ /* Avoiding early access from timeout handler */
+ tags->rqs[i]->tag = -1;
+ tags->rqs[i]->q = NULL;
+ tags->rqs[i]->cmd_flags = 0;
+
if (set->ops->init_request) {
if (set->ops->init_request(set->driver_data,
tags->rqs[i], hctx_idx, i,

Another way would be to ensure that the timeout handler doesn't touch hw_ctx or tag_sets that aren't fully initialized yet. But I think this is safer/cleaner.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/