Re: [PATCH v3] misc: bcm_vk: Fix potential deadlock on &vk->ctx_lock

From: Ray Jui
Date: Wed Jul 19 2023 - 13:31:31 EST




On 7/19/2023 10:07 AM, Scott Branden wrote:
> Works fine - thanks.

So apparently the choice of using the timer previously was not due to
performance reasons?

If performance is a concern by converting to use workqueue (now it runs
in process/thread context than softirq), I assume you are aware of
another easy way to fix this potential deadlock issue? :)
>
> On 2023-06-29 11:29, Chengfeng Ye wrote:
>> As &vk->ctx_lock is acquired by timer bcm_vk_hb_poll() under softirq
>> context, other process context code should disable irq or bottom-half
>> before acquire the same lock, otherwise deadlock could happen if the
>> timer preempt the execution while the lock is held in process context
>> on the same CPU.
>>
>> Possible deadlock scenario
>> bcm_vk_open()
>>      -> bcm_vk_get_ctx()
>>      -> spin_lock(&vk->ctx_lock)
>>     <timer iterrupt>
>>     -> bcm_vk_hb_poll()
>>     -> bcm_vk_blk_drv_access()
>>     -> spin_lock_irqsave(&vk->ctx_lock, flags) (deadlock here)
>>
>> This flaw was found using an experimental static analysis tool we are
>> developing for irq-related deadlock, which reported the following
>> warning when analyzing the linux kernel 6.4-rc7 release.
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_ioctl
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1181
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_ioctl
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:1169
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_open
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:216
>>
>> [Deadlock]: &vk->ctx_lock
>>    [Interrupt]: bcm_vk_hb_poll
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:176
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_dev.c:512
>>    [Locking Unit]: bcm_vk_release
>>      -->/root/linux/drivers/misc/bcm-vk/bcm_vk_msg.c:306
>>
>> As suggested by Arnd, the tentative patch fix the potential deadlocks
>> by replacing the timer with delay workqueue. x86_64 allyesconfig using
>> GCC shows no new warning. Note that no runtime testing was performed
>> due to no device on hand.
>>
>> Signed-off-by: Chengfeng Ye <dg573847474@xxxxxxxxx>
> Acked-by: Scott Branden <scott.branden@xxxxxxxxxxxx>
> Tested-by: Desmond Yan <desmond.branden@xxxxxxxxxxxx>
>
>> ---
>>   drivers/misc/bcm-vk/bcm_vk.h     |  2 +-
>>   drivers/misc/bcm-vk/bcm_vk_msg.c | 14 +++++++-------
>>   2 files changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/misc/bcm-vk/bcm_vk.h b/drivers/misc/bcm-vk/bcm_vk.h
>> index 25d51222eedf..386884c2a263 100644
>> --- a/drivers/misc/bcm-vk/bcm_vk.h
>> +++ b/drivers/misc/bcm-vk/bcm_vk.h
>> @@ -340,7 +340,7 @@ struct bcm_vk_proc_mon_info {
>>   };
>>     struct bcm_vk_hb_ctrl {
>> -    struct timer_list timer;
>> +    struct delayed_work work;
>>       u32 last_uptime;
>>       u32 lost_cnt;
>>   };
>> diff --git a/drivers/misc/bcm-vk/bcm_vk_msg.c
>> b/drivers/misc/bcm-vk/bcm_vk_msg.c
>> index 3c081504f38c..e17d81231ea6 100644
>> --- a/drivers/misc/bcm-vk/bcm_vk_msg.c
>> +++ b/drivers/misc/bcm-vk/bcm_vk_msg.c
>> @@ -137,11 +137,11 @@ void bcm_vk_set_host_alert(struct bcm_vk *vk,
>> u32 bit_mask)
>>   #define BCM_VK_HB_TIMER_VALUE (BCM_VK_HB_TIMER_S * HZ)
>>   #define BCM_VK_HB_LOST_MAX (27 / BCM_VK_HB_TIMER_S)
>>   -static void bcm_vk_hb_poll(struct timer_list *t)
>> +static void bcm_vk_hb_poll(struct work_struct *work)
>>   {
>>       u32 uptime_s;
>> -    struct bcm_vk_hb_ctrl *hb = container_of(t, struct bcm_vk_hb_ctrl,
>> -                         timer);
>> +    struct bcm_vk_hb_ctrl *hb = container_of(to_delayed_work(work),
>> struct bcm_vk_hb_ctrl,
>> +                         work);
>>       struct bcm_vk *vk = container_of(hb, struct bcm_vk, hb_ctrl);
>>         if (bcm_vk_drv_access_ok(vk) && hb_mon_is_on()) {
>> @@ -177,22 +177,22 @@ static void bcm_vk_hb_poll(struct timer_list *t)
>>           bcm_vk_set_host_alert(vk, ERR_LOG_HOST_HB_FAIL);
>>       }
>>       /* re-arm timer */
>> -    mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
>> +    schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
>>   }
>>     void bcm_vk_hb_init(struct bcm_vk *vk)
>>   {
>>       struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>>   -    timer_setup(&hb->timer, bcm_vk_hb_poll, 0);
>> -    mod_timer(&hb->timer, jiffies + BCM_VK_HB_TIMER_VALUE);
>> +    INIT_DELAYED_WORK(&hb->work, bcm_vk_hb_poll);
>> +    schedule_delayed_work(&hb->work, BCM_VK_HB_TIMER_VALUE);
>>   }
>>     void bcm_vk_hb_deinit(struct bcm_vk *vk)
>>   {
>>       struct bcm_vk_hb_ctrl *hb = &vk->hb_ctrl;
>>   -    del_timer(&hb->timer);
>> +    cancel_delayed_work_sync(&hb->work);
>>   }
>>     static void bcm_vk_msgid_bitmap_clear(struct bcm_vk *vk,

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature