Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

From: Eugenio Perez Martin
Date: Fri Dec 23 2022 - 03:06:47 EST


On Fri, Dec 23, 2022 at 4:04 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin
> <eperezma@xxxxxxxxxx> wrote:
> >
> > On Thu, Dec 22, 2022 at 7:05 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >
> > > We used to busy waiting on the cvq command this tends to be
> > > problematic since:
> > >
> > > 1) CPU could wait for ever on a buggy/malicous device
> > > 2) There's no wait to terminate the process that triggers the cvq
> > > command
> > >
> > > So this patch switch to use sleep with a timeout (1s) instead of busy
> > > polling for the cvq command forever. This gives the scheduler a breath
> > > and can let the process can respond to a signal.
> > >
> > > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
> > > ---
> > > drivers/net/virtio_net.c | 15 ++++++++-------
> > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 8225496ccb1e..69173049371f 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
> > > vi->rx_mode_work_enabled = false;
> > > spin_unlock_bh(&vi->rx_mode_lock);
> > >
> > > + virtqueue_wake_up(vi->cvq);
> > > flush_work(&vi->rx_mode_work);
> > > }
> > >
> > > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> > > return !oom;
> > > }
> > >
> > > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > > +{
> > > + virtqueue_wake_up(cvq);
> > > +}
> > > +
> > > static void skb_recv_done(struct virtqueue *rvq)
> > > {
> > > struct virtnet_info *vi = rvq->vdev->priv;
> > > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> > > if (unlikely(!virtqueue_kick(vi->cvq)))
> > > return vi->ctrl->status == VIRTIO_NET_OK;
> > >
> > > - /* Spin for a response, the kick causes an ioport write, trapping
> > > - * into the hypervisor, so the request should be handled immediately.
> > > - */
> > > - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > - !virtqueue_is_broken(vi->cvq))
> > > - cpu_relax();
> > > + virtqueue_wait_for_used(vi->cvq, &tmp);
> > >
> > > return vi->ctrl->status == VIRTIO_NET_OK;
> > > }
> > > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >
> > > /* Parameters for control virtqueue, if any */
> > > if (vi->has_cvq) {
> > > - callbacks[total_vqs - 1] = NULL;
> > > + callbacks[total_vqs - 1] = virtnet_cvq_done;
> >
> > If we're using CVQ callback, what is the actual use of the timeout?
>
> Because we can't sleep forever since locks could be held like RTNL_LOCK.
>

Right, rtnl_lock kind of invalidates it for a general case.

But do all of the commands need to take rtnl_lock? For example I see
how we could remove it from ctrl_announce, so lack of ack may not be
fatal for it. Assuming a buggy device, we can take some cvq commands
out of this fatal situation.

This series already improves the current situation and my suggestion
(if it's worth it) can be applied on top of it, so it is not a blocker
at all.

> >
> > I'd say there is no right choice neither in the right timeout value
> > nor in the action to take.
>
> In the next version, I tend to put BAD_RING() to prevent future requests.
>
> > Why not simply trigger the cmd and do all
> > the changes at command return?
>
> I don't get this, sorry.
>

It's actually expanding the first point so you already answered it :).

Thanks!

> >
> > I suspect the reason is that it complicates the code. For example,
> > having the possibility of many in flight commands, races between their
> > completion, etc.
>
> Actually the cvq command was serialized through RTNL_LOCK, so we don't
> need to worry about this.
>
> In the next version I can add ASSERT_RTNL().
>
> Thanks
>
> > The virtio standard does not even cover unordered
> > used commands if I'm not wrong.
> >
> > Is there any other fundamental reason?
> >
> > Thanks!
> >
> > > names[total_vqs - 1] = "control";
> > > }
> > >
> > > --
> > > 2.25.1
> > >
> >
>