Re: [syzbot] WARNING in vhost_dev_cleanup (2)

From: Stefano Garzarella
Date: Mon Feb 21 2022 - 05:44:44 EST


On Fri, Feb 18, 2022 at 12:23:10PM -0600, Mike Christie wrote:
On 2/18/22 11:53 AM, Mike Christie wrote:
On 2/17/22 3:48 AM, Stefano Garzarella wrote:

On Thu, Feb 17, 2022 at 8:50 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:

On Thu, Feb 17, 2022 at 03:39:48PM +0800, Jason Wang wrote:
On Thu, Feb 17, 2022 at 3:36 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:

On Thu, Feb 17, 2022 at 03:34:13PM +0800, Jason Wang wrote:
On Thu, Feb 17, 2022 at 10:01 AM syzbot
<syzbot+1e3ea63db39f2b4440e0@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: c5d9ae265b10 Merge tag 'for-linus' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=132e687c700000
kernel config: https://syzkaller.appspot.com/x/.config?x=a78b064590b9f912
dashboard link: https://syzkaller.appspot.com/bug?extid=1e3ea63db39f2b4440e0
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+1e3ea63db39f2b4440e0@xxxxxxxxxxxxxxxxxxxxxxxxx

WARNING: CPU: 1 PID: 10828 at drivers/vhost/vhost.c:715 vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715
Modules linked in:
CPU: 0 PID: 10828 Comm: syz-executor.0 Not tainted 5.17.0-rc4-syzkaller-00051-gc5d9ae265b10 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vhost_dev_cleanup+0x8b8/0xbc0 drivers/vhost/vhost.c:715

Probably a hint that we are missing a flush.
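For reference, drivers/vhost/vhost.c:715 in this tree appears to be the work_list check near the end of vhost_dev_cleanup(), roughly (abridged, the elided part is the vq/iotlb teardown):

void vhost_dev_cleanup(struct vhost_dev *dev)
{
        ...
        /* fires if device work is still queued when we tear down */
        WARN_ON(!llist_empty(&dev->work_list));
        if (dev->worker) {
                kthread_stop(dev->worker);
                dev->worker = NULL;
        }
        vhost_detach_mm(dev);
}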

Looking at vhost_vsock_stop(), which is called by vhost_vsock_dev_release():

static int vhost_vsock_stop(struct vhost_vsock *vsock)
{
        size_t i;
        int ret;

        mutex_lock(&vsock->dev.mutex);

        ret = vhost_dev_check_owner(&vsock->dev);
        if (ret)
                goto err;

This is where it could fail, so the device is not actually stopped.
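For context, vhost_dev_check_owner() is just a comparison of the device's recorded mm against the caller's (from drivers/vhost/vhost.c):

/* Caller should have device mutex */
long vhost_dev_check_owner(struct vhost_dev *dev)
{
        /* Are you the owner? If not, I don't think you mean to do that */
        return dev->mm == current->mm ? 0 : -EPERM;
}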

I wonder if this is something related.

Thanks


But then, if that is not the owner, no work should be running, right?

Could it be a buggy user space that passes the fd to another process
and changes the owner just before the mutex_lock() above?

Thanks

Maybe, but can you be a bit more explicit? What is the set of
conditions you see that can lead to this?

I think the issue could be in vhost_vsock_stop() as Jason mentioned,
but related to the do_exit() path rather than to fd passing.

Looking at the stack trace, we are in exit_task_work(), which is called
after exit_mm(), so vhost_dev_check_owner() can fail because
current->mm is already NULL at that point.
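Heavily abridged, the relevant ordering in kernel/exit.c looks roughly like:

void __noreturn do_exit(long code)
{
        ...
        exit_mm();              /* current->mm becomes NULL here */
        ...
        exit_files(tsk);        /* drops the vhost fd's last reference */
        ...
        exit_task_work(tsk);    /* this is where we are in the stack trace */
        ...
}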

It seems the fput work is queued by fput_many() on a workqueue, and
in some cases (maybe a lot of files opened?) the work is still queued
when we enter do_exit().
It normally happens if userspace doesn't do a close() when the VM

Just one clarification. I meant to say it "always" happens when userspace
doesn't do a close.

It doesn't have anything to do with lots of files or something like that.
We are actually running the vhost device's release function from
do_exit->task_work_run and so all those __fputs are done from something
like qemu's context (current == that process).

We are *not* hitting the case:

do_exit->exit_files->put_files_struct->filp_close->fput->fput_many

and then hitting the schedule_delayed_work() path in there. On that
path, the last __fput would be done from a workqueue thread, so the
current pointer would point to a completely different thread.
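For reference, the two paths are visible in fput_many() in fs/file_table.c, roughly:

void fput_many(struct file *file, unsigned int refs)
{
        if (atomic_long_sub_and_test(refs, &file->f_count)) {
                struct task_struct *task = current;

                if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
                        init_task_work(&file->f_u.fu_rcuhead, ____fput);
                        /* normal case: __fput() runs later in *this*
                         * task's context, via task_work_run() */
                        if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
                                return;
                        /*
                         * After this task has run exit_task_work(),
                         * task_work_add() will fail.  Fall through to
                         * delayed fput to avoid leaking *file.
                         */
                }

                /* fallback: __fput() runs from a workqueue thread */
                if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
                        schedule_delayed_work(&delayed_fput_work, 1);
        }
}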



is shut down and instead lets the kernel's reaper code clean up. The qemu
vhost-scsi code doesn't do a close() during shutdown, so this is our
normal code path. It also happens when something like qemu is not
gracefully shut down, like during a crash.

So fire up qemu, start IO, then crash it or kill -9 it while IO is still
running, and you can hit it.

Thank you very much for this detailed explanation!



That said, I don't know if we can simply remove that check in
vhost_vsock_stop(), or check whether current->mm is NULL to understand if
the process is exiting.
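(A hypothetical sketch of the current->mm variant, just to make the idea concrete; this is a fragment inside vhost_vsock_stop(), not a real patch:

        /* hypothetical: skip the owner check if the caller's mm is
         * already gone, which is the case when we get here from
         * do_exit() */
        if (current->mm) {
                ret = vhost_dev_check_owner(&vsock->dev);
                if (ret)
                        goto err;
        }

The doubt above is that this infers "exiting" only indirectly from mm being NULL.)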


Should the caller do the vhost_dev_check_owner() itself, or tell
vhost_vsock_stop() when to check?

- vhost_vsock_dev_ioctl() always wants to check for ownership, right?

- For vhost_vsock_dev_release(), ownership doesn't matter, because we
always want to clean up, or at worst it doesn't hurt too much.

For the case where we just do open then close with no ioctls, running
vhost_vq_set_backend() in vhost_vsock_stop() is just a minor hit of
extra work. If we've done ioctls but are now in
vhost_vsock_dev_release(), then we know, for both the graceful and
ungraceful cases, that nothing is going to access this device in the
future and it's getting completely freed, so we must completely clean it up.

Yep, I think the easiest way is to add a parameter to vhost_vsock_stop() telling it whether to call vhost_dev_check_owner(): the caller can't easily do the check itself, because dev->mm is protected by dev->mutex, which is acquired inside vhost_vsock_stop().
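Something like this hypothetical sketch of the idea (not the final patch):

static int vhost_vsock_stop(struct vhost_vsock *vsock, bool check_owner)
{
        size_t i;
        int ret = 0;

        mutex_lock(&vsock->dev.mutex);

        if (check_owner) {
                ret = vhost_dev_check_owner(&vsock->dev);
                if (ret)
                        goto err;
        }

        for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
                struct vhost_virtqueue *vq = &vsock->vqs[i];

                mutex_lock(&vq->mutex);
                vhost_vq_set_backend(vq, NULL);
                mutex_unlock(&vq->mutex);
        }

err:
        mutex_unlock(&vsock->dev.mutex);
        return ret;
}

with vhost_vsock_dev_ioctl() passing true (keep the ownership check) and vhost_vsock_dev_release() passing false (always stop, since we are freeing the device anyway).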

I will send a patch right away; it would be great if you could review it.

Thanks,
Stefano