Re: [syzbot] BUG: corrupted list in p9_fd_cancel (2)

From: Christian Schoenebeck
Date: Sun Oct 23 2022 - 12:10:12 EST


On Sunday, October 23, 2022 12:41:34 PM CEST syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: d47136c28015 Merge tag 'hwmon-for-v6.1-rc2' of git://git.k..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12f36de2880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4789759e8a6d5f57
> dashboard link: https://syzkaller.appspot.com/bug?extid=9b69b8d10ab4a7d88056
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1076cb7c880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=102eabd2880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/5664e231e97f/disk-d47136c2.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/9bbe0daa4a04/vmlinux-d47136c2.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9b69b8d10ab4a7d88056@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> list_del corruption, ffff88802295c4b0->next is LIST_POISON1 (dead000000000100)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:55!
[...]
> Call Trace:
> <TASK>
> __list_del_entry include/linux/list.h:134 [inline]
> list_del include/linux/list.h:148 [inline]
> p9_fd_cancel+0x9c/0x230 net/9p/trans_fd.c:703

I have only had a brief look at this so far. The problem is that the req_list
list head is removed twice, which triggers this BUG from lib/list_debug.c.
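
To show the double removal in isolation, here is a minimal userspace sketch of
the poisoning check. It is not the kernel code: struct list_node, the node_*()
helpers and the poison constants are made up for illustration, and where the
kernel's debug check reports a BUG, this sketch just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head (hypothetical name). */
struct list_node {
	struct list_node *next, *prev;
};

/* Poison markers modelled on LIST_POISON1/2; the values are illustrative. */
#define POISON_NEXT ((struct list_node *)(uintptr_t)0x100)
#define POISON_PREV ((struct list_node *)(uintptr_t)0x122)

/* An empty circular list points at itself. */
static void node_list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link n in front of head, i.e. at the tail of the list. */
static void node_add_tail(struct list_node *n, struct list_node *head)
{
	assert(n != NULL && head != NULL);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Unlink n and poison its pointers. Returns false, leaving the list
 * untouched, if n was already unlinked (second removal detected).
 */
static bool node_del_checked(struct list_node *n)
{
	if (n->next == POISON_NEXT || n->prev == POISON_PREV)
		return false;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = POISON_NEXT;
	n->prev = POISON_PREV;
	return true;
}
```

Calling node_del_checked() twice on the same node succeeds the first time and
is rejected the second time, which corresponds to the "->next is LIST_POISON1"
report above.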

Moving the spin_unlock() call back down to the end of p9_conn_cancel() might
fix this:

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 56a186768750..409f0da70c52 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -207,8 +207,6 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 		list_move(&req->req_list, &cancel_list);
 	}
 
-	spin_unlock(&m->req_lock);
-
 	list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
 		p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req);
 		list_del(&req->req_list);
@@ -216,6 +214,8 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 			req->t_err = err;
 		p9_client_cb(m->client, req, REQ_STATUS_ERROR);
 	}
+
+	spin_unlock(&m->req_lock);
 }
 
 static __poll_t

spin_unlock() was recently moved up a bit to fix a deadlock. However, that
deadlock involved a lock on client level, which has meanwhile been converted
into a lock on connection level.

The question is whether that would fix this for good rather than just move the
problem elsewhere, because there are a number of other list removal calls that
do not check the request state, or anything else, to prevent a double removal.
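
As a rough illustration of what such a check could look like, here is a
userspace sketch in which a per-request cancel only unlinks a request that is
still marked as queued. This is an assumption about one possible shape of a
guard, not a proposed patch: the demo_* names and states are hypothetical, and
locking is left out (in real code both paths would have to run under the same
lock for the status test to be meaningful).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a list head (hypothetical name). */
struct demo_node {
	struct demo_node *next, *prev;
};

/* Simplified request states, only loosely modelled on the 9p ones. */
enum demo_status { DEMO_UNSENT, DEMO_ERROR };

struct demo_req {
	enum demo_status status;
	struct demo_node link;
};

static void demo_list_init(struct demo_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Queue a request at the tail and mark it as not yet sent. */
static void demo_enqueue(struct demo_req *r, struct demo_node *head)
{
	assert(r != NULL && head != NULL);
	r->status = DEMO_UNSENT;
	r->link.prev = head->prev;
	r->link.next = head;
	head->prev->next = &r->link;
	head->prev = &r->link;
}

static void demo_unlink(struct demo_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = NULL;
	n->prev = NULL;
}

/* Connection-wide cancel of one request: record the new state, then unlink. */
static void demo_conn_cancel_one(struct demo_req *r)
{
	r->status = DEMO_ERROR;
	demo_unlink(&r->link);
}

/* Per-request cancel: unlink only if the request is still queued. */
static bool demo_cancel(struct demo_req *r)
{
	if (r->status != DEMO_UNSENT)
		return false;
	r->status = DEMO_ERROR;
	demo_unlink(&r->link);
	return true;
}
```

With this guard, demo_cancel() on a request that demo_conn_cancel_one() has
already handled returns false instead of unlinking it a second time.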

Best regards,
Christian Schoenebeck