Re: [PATCH] virtio_net: suppress cpu stall when free_unused_bufs

From: Xuan Zhuo
Date: Thu Apr 27 2023 - 04:57:29 EST


On Thu, 27 Apr 2023 16:49:58 +0800, Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx> wrote:
> On 4/27/23 4:23 PM, Michael S. Tsirkin wrote:
> > On Thu, Apr 27, 2023 at 04:13:45PM +0800, Xuan Zhuo wrote:
> >> On Thu, 27 Apr 2023 04:12:44 -0400, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >>> On Thu, Apr 27, 2023 at 03:13:44PM +0800, Xuan Zhuo wrote:
> >>>> On Thu, 27 Apr 2023 15:02:26 +0800, Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>> On 4/27/23 2:20 PM, Xuan Zhuo wrote:
> >>>>>> On Thu, 27 Apr 2023 12:34:33 +0800, Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx> wrote:
> >>>>>>> For multi-queue and large rx-ring-size use case, the following error
> >>>>>>
> >>>>>> Cound you give we one number for example?
> >>>>>
> >>>>> 128 queues and 16K queue_size is typical.
> >>>>>
> >>>>>>
> >>>>>>> occurred when free_unused_bufs:
> >>>>>>> rcu: INFO: rcu_sched self-detected stall on CPU.
> >>>>>>>
> >>>>>>> Signed-off-by: Wenliang Wang <wangwenliang.1995@xxxxxxxxxxxxx>
> >>>>>>> ---
> >>>>>>> drivers/net/virtio_net.c | 1 +
> >>>>>>> 1 file changed, 1 insertion(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>>> index ea1bd4bb326d..21d8382fd2c7 100644
> >>>>>>> --- a/drivers/net/virtio_net.c
> >>>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>>> @@ -3565,6 +3565,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >>>>>>> struct virtqueue *vq = vi->rq[i].vq;
> >>>>>>> while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> >>>>>>> virtnet_rq_free_unused_buf(vq, buf);
> >>>>>>> + schedule();
> >>>>>>
> >>>>>> Just for rq?
> >>>>>>
> >>>>>> Do we need to do the same thing for sq?
> >>>>> Rq buffers are pre-allocated, take seconds to free rq unused buffers.
> >>>>>
> >>>>> Sq unused buffers are much less, so do the same for sq is optional.
> >>>>
> >>>> I got.
> >>>>
> >>>> I think we should look for a way, compatible with the less queues or the smaller
> >>>> rings. Calling schedule() directly may be not a good way.
> >>>>
> >>>> Thanks.
> >>>
> >>> Why isn't it a good way?
> >>
> >> For the small ring, I don't think it is a good way, maybe we only deal with one
> >> buf, then call schedule().
> >>
> >> We can call the schedule() after processing a certain number of buffers,
> >> or check need_resched () first.
> >>
> >> Thanks.
> >
> >
> > Wenliang, does
> > if (need_resched())
> > schedule();
> > fix the issue for you?
> >
> Yeah, it works better.

I prefer to use it in combination with a fixed number(such as 256).
Every time 256 buffers are processed, check need_resched().
This can accommodate large rings and small rings.

Also, it is necessary to add similar logic to sq. Although the possibility is
low, it is possible that the same problem will occur.

Thanks.

> >
> >>
> >>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>
> >>>>>>> }
> >>>>>>> }
> >>>>>>>
> >>>>>>> --
> >>>>>>> 2.20.1
> >>>>>>>
> >>>
> >