Re: [RESEND PATCH] blk-mq: fix hang caused by freeze/unfreeze sequence

From: Bob Liu
Date: Fri Apr 12 2019 - 20:37:11 EST


On 4/9/19 5:29 PM, Jinpu Wang wrote:
> Bob Liu <bob.liu@xxxxxxxxxx> wrote on Tue, Apr 9, 2019 at 11:11 AM:
>>
>> This patch was proposed by Roman Pen[3] years ago.
>> Recently we hit a bug which likely has the same root cause, so I rebased his
>> fix onto v5.1 and am resending it.
>> The description below is copied almost verbatim from that patch[3].
>>
>> ------
>> A long time ago a similar fix was proposed by Akinobu Mita[1],
>> but it seems that at that time everyone decided to fix this subtle race in
>> percpu-refcount, and Tejun Heo[2] made an attempt (as far as I can see, that
>> patchset was not applied).
>>
>> The following is a description of a hang in blk_mq_freeze_queue_wait():
>> the same fix, but for a bug seen from another angle.
>>
>> The hang happens on an attempt to freeze a queue while another task is
>> unfreezing it.
>>
>> The root cause is that the order of percpu_ref_reinit() and
>> percpu_ref_kill() is not enforced, and as a result those two can be swapped:
>>
>>  CPU#0                         CPU#1
>>  ----------------              -----------------
>>  percpu_ref_kill()
>>
>>                                percpu_ref_kill() << atomic reference does
>>  percpu_ref_reinit()                             << not guarantee the order
>>
>>                                blk_mq_freeze_queue_wait() << HANG HERE
>>
>>                                percpu_ref_reinit()
>>
>> First, this wrong sequence triggers two kernel warnings:
>>
>> 1st. WARNING at lib/percpu-refcount.c:309
>> percpu_ref_kill_and_confirm called more than once
>>
>> 2nd. WARNING at lib/percpu-refcount.c:331
>>
>> But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
>> which waits for q_usage_counter to drop to zero. That never happens,
>> because the percpu-ref was reinited (instead of being killed) and stays in
>> PERCPU state forever.
>>
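
To make the interleaving described above easier to follow for reviewers,
here is a minimal user-space sketch. It is not kernel code: every name in it
(model_queue, freeze_inc, replay_racy, ...) is hypothetical, and the
percpu-ref is reduced to a single "dead" flag. It only assumes what the
description says: freeze and unfreeze each consist of a depth update followed
by a separate kill or reinit, and nothing keeps the two steps together.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a queue; not the kernel's struct request_queue. */
struct model_queue {
	int freeze_depth;  /* stands in for the queue's freeze depth counter */
	bool ref_dead;     /* true after a kill, false after a reinit */
	int double_kills;  /* kills issued while the ref was already dead */
};

/* First half of a freeze: bump the depth; true means "caller must kill". */
static bool freeze_inc(struct model_queue *q)
{
	return ++q->freeze_depth == 1;
}

/* Second half of a freeze: mark the ref dead, counting repeated kills. */
static void ref_kill(struct model_queue *q)
{
	if (q->ref_dead)
		q->double_kills++; /* models the "called more than once" warning */
	q->ref_dead = true;
}

/* First half of an unfreeze: drop the depth; true means "caller must reinit". */
static bool unfreeze_dec(struct model_queue *q)
{
	return --q->freeze_depth == 0;
}

/* Second half of an unfreeze: bring the ref back to life. */
static void ref_reinit(struct model_queue *q)
{
	q->ref_dead = false;
}

/* Start state for both replays: the queue is frozen once by task B. */
static void model_init(struct model_queue *q)
{
	q->freeze_depth = 1;
	q->ref_dead = true;
	q->double_kills = 0;
}

/* A freezer hangs if the queue counts as frozen but the ref is alive. */
static bool freezer_would_hang(const struct model_queue *q)
{
	return q->freeze_depth > 0 && !q->ref_dead;
}

/* Task A's freeze lands between the two halves of task B's unfreeze. */
static bool replay_racy(struct model_queue *q)
{
	model_init(q);
	bool b_must_reinit = unfreeze_dec(q); /* B: depth 1 -> 0 */
	bool a_must_kill = freeze_inc(q);     /* A: depth 0 -> 1 */
	if (a_must_kill)
		ref_kill(q);                  /* A: kill on a still-dead ref */
	if (b_must_reinit)
		ref_reinit(q);                /* B: late reinit revives the ref */
	return freezer_would_hang(q);
}

/* Same operations, but each task finishes both halves before the other runs. */
static bool replay_serialized(struct model_queue *q)
{
	model_init(q);
	if (unfreeze_dec(q))
		ref_reinit(q);                /* B: complete unfreeze */
	if (freeze_inc(q))
		ref_kill(q);                  /* A: complete freeze */
	return freezer_would_hang(q);
}
```

In this model replay_racy() leaves the queue with a freeze depth of 1 and a
live ref, which is the state in which a waiter for a zero q_usage_counter
never wakes up, and it records one repeated kill. replay_serialized() ends
with the ref dead, so the waiter can finish. The sketch says nothing about
how the patch itself enforces the ordering.
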
>> The simplified sequence above can be reproduced with shared tags, when
>> queue A is about to die while another queue B is being initialized and
>> is trying to freeze queue A, which shares the same tag set:
>>
>>  CPU#0                            CPU#1
>>  -------------------------------  ------------------------------------
>>  q1 = blk_mq_init_queue(shared_tags)
>>
>>                                   q2 = blk_mq_init_queue(shared_tags):
>>                                     blk_mq_add_queue_tag_set(shared_tags):
>>                                       blk_mq_update_tag_set_depth(shared_tags):
>>                                         blk_mq_freeze_queue(q1)
>>  blk_cleanup_queue(q1)                  ...
>>    blk_mq_freeze_queue(q1)   <<<->>>    blk_mq_unfreeze_queue(q1)
>>
>> [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@xxxxxxxxx
>> [2] Message id: 1443563240-29306-6-git-send-email-tj@xxxxxxxxxx
>> [3] https://patchwork.kernel.org/patch/9268199/
>>
>> Signed-off-by: Roman Pen <roman.penyaev@xxxxxxxxxxxxxxxx>
>> Signed-off-by: Bob Liu <bob.liu@xxxxxxxxxx>
>> Cc: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>> Cc: Tejun Heo <tj@xxxxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxxxxx>
>> Cc: Christoph Hellwig <hch@xxxxxx>
>> Cc: linux-block@xxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>
>
> Replaced Roman's email address.
>
> We at 1 & 1 IONOS (formerly ProfitBricks) have been carrying this patch
> for some years,
> and it has been running in production for some years too,

Nice to hear that!

> would be good to see it in upstream :)

Yes.
Could anyone review it? Thanks!

>
> Thanks,
>
> Jack Wang
> Linux Kernel Developer @ 1 & 1 IONOS
>