Re: [PATCH] virtio_net: suppress cpu stall when free_unused_bufs

From: Michael S. Tsirkin
Date: Thu Apr 27 2023 - 04:25:07 EST


On Thu, Apr 27, 2023 at 04:13:45PM +0800, Xuan Zhuo wrote:
> On Thu, 27 Apr 2023 04:12:44 -0400, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > On Thu, Apr 27, 2023 at 03:13:44PM +0800, Xuan Zhuo wrote:
> > > On Thu, 27 Apr 2023 15:02:26 +0800, Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > > On 4/27/23 2:20 PM, Xuan Zhuo wrote:
> > > > > On Thu, 27 Apr 2023 12:34:33 +0800, Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx> wrote:
> > > > > >> For the multi-queue, large rx-ring-size use case, the following error
> > > > >
> > > > > > Could you give me a number, for example?
> > > >
> > > > 128 queues and 16K queue_size is typical.
> > > >
> > > > >
> > > > >> occurred when free_unused_bufs:
> > > > >> rcu: INFO: rcu_sched self-detected stall on CPU.
> > > > >>
> > > > >> Signed-off-by: Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx>
> > > > >> ---
> > > > >> drivers/net/virtio_net.c | 1 +
> > > > >> 1 file changed, 1 insertion(+)
> > > > >>
> > > > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > >> index ea1bd4bb326d..21d8382fd2c7 100644
> > > > >> --- a/drivers/net/virtio_net.c
> > > > >> +++ b/drivers/net/virtio_net.c
> > > > >> @@ -3565,6 +3565,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > >>  		struct virtqueue *vq = vi->rq[i].vq;
> > > > > >>  		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > >>  			virtnet_rq_free_unused_buf(vq, buf);
> > > > > >> +		schedule();
> > > > >
> > > > > Just for rq?
> > > > >
> > > > > Do we need to do the same thing for sq?
> > > > > Rq buffers are pre-allocated, so it takes seconds to free the unused rq buffers.
> > > > >
> > > > > There are far fewer unused sq buffers, so doing the same for sq is optional.
> > >
> > > Got it.
> > >
> > > I think we should look for a way that also works with fewer queues or smaller
> > > rings. Calling schedule() directly may not be a good way.
> > >
> > > Thanks.
> >
> > Why isn't it a good way?
>
> For a small ring, I don't think it is a good way: we may handle only one
> buf and then call schedule().
>
> We can call schedule() after processing a certain number of buffers,
> or check need_resched() first.
>
> Thanks.


Wenliang, does
	if (need_resched())
		schedule();
fix the issue for you?
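
To make the idea concrete, here is a rough userspace sketch of the check, not the driver code. Everything in it is a hypothetical stand-in: need_resched() and schedule() are mocked with counters, struct mock_vq and mock_detach_unused_buf() replace the real virtqueue API, and placing the check once per queue (after the inner loop, as in the quoted diff) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace mocks of the kernel's need_resched()/schedule(). */
static unsigned long resched_requests;	/* pending reschedule requests */
static unsigned long yields;		/* how often we actually yielded */

static bool need_resched(void)
{
	return resched_requests > 0;
}

static void schedule(void)
{
	resched_requests--;
	yields++;
}

/* Hypothetical stand-in for a virtqueue: only counts unused buffers. */
struct mock_vq {
	size_t unused;
};

/* Returns a non-NULL token while buffers remain, NULL when the queue is empty. */
static void *mock_detach_unused_buf(struct mock_vq *vq)
{
	static char token;

	if (vq->unused == 0)
		return NULL;
	vq->unused--;
	return &token;
}

/* Drain every queue; yield between queues only if a reschedule is pending. */
static size_t free_unused_bufs_sketch(struct mock_vq *vqs, size_t nvq)
{
	size_t freed = 0;

	for (size_t i = 0; i < nvq; i++) {
		while (mock_detach_unused_buf(&vqs[i]) != NULL)
			freed++;
		if (need_resched())
			schedule();
	}
	return freed;
}
```

With small rings and nothing else runnable, need_resched() stays false and the loop never yields, which addresses the concern about calling schedule() after only a handful of buffers.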


>
>
> >
> > >
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >> }
> > > > >> }
> > > > >>
> > > > >> --
> > > > >> 2.20.1
> > > > >>
> >