Re: [PATCH] virtio_net: Fix probe failed when modprobe virtio_net

From: Michael S. Tsirkin
Date: Mon Nov 21 2022 - 12:47:17 EST


On Mon, Nov 21, 2022 at 09:29:35PM +0800, Li Zetao wrote:
> When doing the following test steps, an error was found:
> step 1: modprobe virtio_net succeeded
> # modprobe virtio_net <-- OK
>
> step 2: fault injection in register_netdevice()
> # modprobe -r virtio_net <-- OK
> # ...
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
> CPU: 0 PID: 3521 Comm: modprobe
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> Call Trace:
> <TASK>
> ...
> should_failslab+0xa/0x20
> ...
> dev_set_name+0xc0/0x100
> netdev_register_kobject+0xc2/0x340
> register_netdevice+0xbb9/0x1320
> virtnet_probe+0x1d72/0x2658 [virtio_net]
> ...
> </TASK>
> virtio_net: probe of virtio0 failed with error -22
>
> step 3: modprobe virtio_net failed
> # modprobe virtio_net <-- failed
> virtio_net: probe of virtio0 failed with error -2
>
> The root cause of the problem is that the queues are not
> disabled on the error handling path when register_netdevice()
> fails in virtnet_probe(), so setup_vq() returns -ENOENT
> on the next modprobe call.
>
> virtio_pci_modern_device uses virtqueues to send or
> receive messages, and "queue_enable" records whether the
> queues are available. In vp_modern_find_vqs(), all queues
> are selected and activated, but once queues are enabled
> there is no way to go back except a reset.
>
> Fix it by resetting the virtio device on the error handling path.
>
> Fixes: 1fcf0512c9c8 ("virtio_pci: modern driver")
> Signed-off-by: Li Zetao <lizetao1@xxxxxxxxxx>
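
For readers who want to see the failure sequence in isolation, here is a minimal user-space sketch of the behaviour described in the commit message above. It is only a model: fake_dev, fake_setup_vq(), fake_reset() and fake_probe() are hypothetical names, not kernel APIs, and the single boolean stands in for the device's queue_enable state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's data structures. */
struct fake_dev {
	bool queue_enable;	/* stays set until the device is reset */
};

/* Models the check described above: an already-enabled queue is refused. */
static int fake_setup_vq(struct fake_dev *d)
{
	if (d->queue_enable)
		return -ENOENT;
	d->queue_enable = true;
	return 0;
}

/* Models a device reset, the only way to clear queue_enable. */
static void fake_reset(struct fake_dev *d)
{
	d->queue_enable = false;
}

/*
 * Models a probe whose late step (standing in for register_netdevice())
 * can be made to fail. reset_on_error selects the fixed error path.
 */
static int fake_probe(struct fake_dev *d, bool fail_register,
		      bool reset_on_error)
{
	int err = fake_setup_vq(d);

	if (err)
		return err;
	if (fail_register) {
		if (reset_on_error)
			fake_reset(d);
		return -EINVAL;
	}
	return 0;
}
```

In this model, a failed fake_probe() without the reset leaves queue_enable set, so the next fake_probe() on the same device returns -ENOENT; with reset_on_error set, the retry succeeds.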

I would add to this:

------

This makes error handling follow the same order as normal device cleanup
which does:

static void remove_vq_common(struct virtnet_info *vi)
{
	virtio_reset_device(vi->vdev);

	/* Free unused buffers in both send and recv, if any. */
	free_unused_bufs(vi);

	free_receive_bufs(vi);

	free_receive_page_frags(vi);

	virtnet_del_vqs(vi);
}

static void virtnet_remove(struct virtio_device *vdev)
{
	struct virtnet_info *vi = vdev->priv;

	virtnet_cpu_notif_remove(vi);

	/* Make sure no work handler is accessing the device. */
	flush_work(&vi->config_work);

	unregister_netdev(vi->dev);

	net_failover_destroy(vi->failover);

	remove_vq_common(vi);

	free_netdev(vi->dev);
}


So unregister, destroy failover, then reset - and that flow
is better tested than error handling so we can be reasonably
sure it works well.

-----


I would thus probably use this tag instead:

Fixes: 0246555550 ("virtio_net: fix use after free on allocation failure")

This is what introduced the difference in cleanup order;
the modern driver just added hardware support.
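
To make the ordering argument concrete, here is a small user-space sketch of goto-style unwinding in which every error label runs a suffix of the normal removal sequence. Everything in it is a stand-in: step(), model_remove(), model_probe() and the fail_point values are hypothetical, and the setup order (vqs, then failover, then netdev registration) is only assumed from the order of the labels in the patch.

```c
#include <string.h>

/* Records the order in which cleanup steps run. */
static char trace[128];

static void step(const char *name)
{
	strcat(trace, name);
	strcat(trace, ";");
}

/* Model of the normal removal order: unregister, destroy failover, reset. */
static void model_remove(void)
{
	step("unregister");
	step("destroy_failover");
	step("reset");
	step("del_vqs");
}

/* Hypothetical points at which the modelled probe can fail. */
enum fail_point { FAIL_FAILOVER, FAIL_REGISTER, FAIL_LATE };

/* Model of the error path with the reset moved under free_vqs. */
static int model_probe(enum fail_point where)
{
	if (where == FAIL_FAILOVER)
		goto free_vqs;
	if (where == FAIL_REGISTER)
		goto free_failover;
	/* FAIL_LATE: the netdev was already registered. */
	goto free_unregister_netdev;

free_unregister_netdev:
	step("unregister");
free_failover:
	step("destroy_failover");
free_vqs:
	step("reset");
	step("del_vqs");
	return -1;
}
```

With the reset under free_vqs, a late failure produces the same trace as model_remove(), and earlier failures produce a tail of it, so the reset is never skipped.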


With the commit log extended as above:

Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx>


> ---
> drivers/net/virtio_net.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7106932c6f88..86e52454b5b5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3949,12 +3949,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	return 0;
>
>  free_unregister_netdev:
> -	virtio_reset_device(vdev);
> -
>  	unregister_netdev(dev);
>  free_failover:
>  	net_failover_destroy(vi->failover);
>  free_vqs:
> +	virtio_reset_device(vdev);
>  	cancel_delayed_work_sync(&vi->refill);
>  	free_receive_page_frags(vi);
>  	virtnet_del_vqs(vi);
> --
> 2.25.1