Re: [syzbot] kernel BUG in vhost_get_vq_desc

From: Michael S. Tsirkin
Date: Mon Feb 21 2022 - 08:59:00 EST


On Mon, Feb 21, 2022 at 09:00:22PM +0800, Hillf Danton wrote:
> On Mon, 21 Feb 2022 05:48:48 -0500 Michael S. Tsirkin wrote:
> > On Mon, Feb 21, 2022 at 06:15:38PM +0800, Hillf Danton wrote:
> > > On Mon, 21 Feb 2022 04:17:02 -0500 Michael S. Tsirkin wrote:
> > > > On Mon, Feb 21, 2022 at 04:52:27PM +0800, Hillf Danton wrote:
> > > > > Another round of attempts to quiesce the
> > > > > WARNING: CPU: 1 PID: 4069 at drivers/vhost/vhost.c:715 after the
> > > > > BUG at drivers/vhost/vhost.c:2337 went home.
> > > >
> > > > Could you pls clarify what you mean by "went home" here?
> > >
> > > The reproducer failed to trigger it.
> > >
> > > Hillf
> >
> > You mean this patch?
>
> No, it is part of the first round.
> >
> > @@ -2207,7 +2209,10 @@ int vhost_get_vq_desc(struct vhost_virtq
> > __virtio16 avail_idx;
> > __virtio16 ring_head;
> > int ret, access;
> > + bool was_set = !!(vq->used_flags & VRING_USED_F_NO_NOTIFY);
> >
> > + if (!was_set)
> > + return -EINVAL;
> > /* Check it isn't doing very strange things with descriptor numbers. */
> > last_avail_idx = vq->last_avail_idx;
> >
> >
> > However, I do not understand how we enter vhost_get_vq_desc
> > with vq->used_flags & VRING_USED_F_NO_NOTIFY being clear.
> > Do you?
>
> The diff below turned BUG into WARNING, and you can see it in one of the
> mails in your inbox as you are on the Cc list.

Right. So it's not a fix, it's just a workaround, and we still need to
understand how we can get into this state.

> Hillf
> ---<---
>
> The re-trigger of the BUG_ON sends us to the start point and looks like it
> could not be solved without a mind refresh.

I don't understand this sentence, btw. How does BUG_ON send us to the
start point? What is the start point? And what is a mind refresh?

> Add a flag to vsock and set it before work flush upon release, and no more
> works will be queued with it turned on.
>
> Hillf
>
> >>#syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ f71077a4d84b
>
> --- x/drivers/vhost/vsock.c
> +++ y/drivers/vhost/vsock.c
> @@ -55,6 +55,7 @@ struct vhost_vsock {
> struct list_head send_pkt_list; /* host->guest pending packets */
>
> atomic_t queued_replies;
> + int cleanup;
>
> u32 guest_cid;
> bool seqpacket_allow;
> @@ -262,6 +263,9 @@ vhost_transport_do_send_pkt(struct vhost
> out:
> mutex_unlock(&vq->mutex);
>
> + if (vsock->cleanup)
> + return;
> +
> if (restart_tx)
> vhost_poll_queue(&tx_vq->poll);
> }
> @@ -678,6 +682,7 @@ static int vhost_vsock_dev_open(struct i
> }
>
> vsock->guest_cid = 0; /* no CID assigned yet */
> + vsock->cleanup = 0;
>
> atomic_set(&vsock->queued_replies, 0);
>
> @@ -741,6 +746,8 @@ static int vhost_vsock_dev_release(struc
> {
> struct vhost_vsock *vsock = file->private_data;
>
> + vsock->cleanup = 1;
> +
> mutex_lock(&vhost_vsock_mutex);
> if (vsock->guest_cid)
> hash_del_rcu(&vsock->hash);
> --
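
To check that I read the proposal correctly, here is a userspace sketch of
the "set a flag on release, skip requeue when it is set" idea. Every name in
it (fake_vsock, fake_queue_work, fake_send_pkt_tail, fake_release_begin,
fake_demo) is made up for illustration and none of it is vhost code. It also
uses a C11 atomic flag where the patch uses a plain int, which is an
assumption on my side, not something the patch does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vhost_vsock; not the kernel type. */
struct fake_vsock {
	atomic_bool cleanup;	/* set once release has started */
	int queued;		/* how many times work was (re)queued */
};

/* Hypothetical stand-in for vhost_poll_queue(). */
static void fake_queue_work(struct fake_vsock *v)
{
	v->queued++;
}

/* Mirrors the tail of the send path: requeue only if not cleaning up. */
static void fake_send_pkt_tail(struct fake_vsock *v, bool restart_tx)
{
	assert(v != NULL);

	if (atomic_load(&v->cleanup))
		return;

	if (restart_tx)
		fake_queue_work(v);
}

/* Mirrors the start of release: raise the flag before any flush. */
static void fake_release_begin(struct fake_vsock *v)
{
	atomic_store(&v->cleanup, true);
}

/* Returns the number of requeues: one before release, none after. */
static int fake_demo(void)
{
	struct fake_vsock v = { .queued = 0 };

	atomic_init(&v.cleanup, false);
	fake_send_pkt_tail(&v, true);	/* flag clear: work is queued */
	fake_release_begin(&v);
	fake_send_pkt_tail(&v, true);	/* flag set: requeue is skipped */
	return v.queued;
}
```

Note that even in this sketch the check and the queue are two separate
steps, so a worker that passed the check just before the flag was set can
still queue once more. A flag like this narrows the window; it does not by
itself explain how we got into the bad state.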