Re: Possible deadlock detected in Linux 6.2.0 in dm_get_inactive_table (dm-ioctl.c)

From: Mike Snitzer
Date: Mon Apr 17 2023 - 12:21:08 EST


On Mon, Apr 17 2023 at 1:08P -0400,
Zheng Zhang <zheng.zhang@xxxxxxxxxxxxx> wrote:

> Alasdair, Mike, and to whom it may concern:
>
> Hello! We have found a bug in the Linux kernel version 6.2.0 by syzkaller
> with our own templates. The bug causes a possible recursive locking
> scenario, resulting in a deadlock.
> The key trace is as follows (the complete trace is in the attached report
> file):
>
> down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
> dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
> __dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
> table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537
>
> In table_clear, it acquires a *write lock*
> https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
> down_write(&_hash_lock);
>
> Then before the lock is released at L1539, there is a path shown above:
> table_clear -> __dev_status -> dm_get_inactive_table -> down_read
> https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
> down_read(&_hash_lock);
> It tries to acquire a *read lock* on the same rwsem while the write lock
> is still held, resulting in a deadlock.
>
> Attached is the report, log, and reproducers generated by syzkaller
> Please let me know if there is any additional information that I can
> provide to help debug this issue.
> Thanks!

Thanks for the report. I've staged this fix:

From: Mike Snitzer <snitzer@xxxxxxxxxx>
Subject: [PATCH] dm ioctl: fix nested locking in table_clear() to remove deadlock concern

syzkaller found the following problematic rwsem locking (with write
lock already held):

down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
__dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537

In table_clear, it first acquires a write lock
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
down_write(&_hash_lock);

Then before the lock is released at L1539, there is a path shown above:
table_clear -> __dev_status -> dm_get_inactive_table -> down_read
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
down_read(&_hash_lock);

It tries to acquire a read lock on the same rwsem while the write lock
is still held, resulting in a deadlock.

Fix this by moving table_clear()'s __dev_status() call to after its
up_write(&_hash_lock);

Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Zheng Zhang <zheng.zhang@xxxxxxxxxxxxx>
Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
---
drivers/md/dm-ioctl.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 50a1259294d1..7d5c9c582ed2 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1556,11 +1556,12 @@ static int table_clear(struct file *filp, struct dm_ioctl *param, size_t param_s
 		has_new_map = true;
 	}
 
-	param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
-
-	__dev_status(hc->md, param);
 	md = hc->md;
 	up_write(&_hash_lock);
+
+	param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
+	__dev_status(md, param);
+
 	if (old_map) {
 		dm_sync_table(md);
 		dm_table_destroy(old_map);
--
2.40.0