Re: [RFC 0/2] vduse: add support for networking devices

From: Jason Wang
Date: Fri Apr 21 2023 - 01:52:48 EST


On Thu, Apr 20, 2023 at 10:16 PM Maxime Coquelin
<maxime.coquelin@xxxxxxxxxx> wrote:
>
>
>
> On 4/20/23 06:34, Jason Wang wrote:
> > On Wed, Apr 19, 2023 at 9:43 PM Maxime Coquelin
> > <maxime.coquelin@xxxxxxxxxx> wrote:
> >>
> >> This small series enables virtio-net device type in VDUSE.
> >> With it, basic operation have been tested, both with
> >> virtio-vdpa and vhost-vdpa using DPDK Vhost library series
> >> adding VDUSE support [0] using split rings layout.
> >>
> >> Control queue support (and so multiqueue) has also been
> >> tested, but require a Kernel series from Jason Wang
> >> relaxing control queue polling [1] to function reliably.
> >>
> >> Other than that, we have identified a few gaps:
> >>
> >> 1. Reconnection:
> >> a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail
> >> index, even after the virtqueue has already been
> >> processed. Is that expected? I have tried instead to
> >> get the driver's avail index directly from the avail
> >> ring, but it does not seem reliable as I sometimes get
> >> "id %u is not a head!\n" warnings. Also such solution
> >> would not be possible with packed ring, as we need to
> >> know the wrap counters values.
> >
> > Looking at the codes, it only returns the value that is set via
> > set_vq_state(). I think it is expected to be called before the
> > datapath runs.
> >
> > So when bound to virtio-vdpa, it is expected to return 0. But we need
> > to fix the packed virtqueue case, I wonder if we need to call
> > set_vq_state() explicitly in virtio-vdpa before starting the device.
> >
> > When bound to vhost-vdpa, Qemu will call VHOST_SET_VRING_BASE which
> > will end up a call to set_vq_state(). Unfortunately, it doesn't
> > support packed ring which needs some extension.
> >
> >>
> >> b. Missing IOCTLs: it would be handy to have new IOCTLs to
> >> query Virtio device status,
> >
> > What's the use case of this ioctl? It looks to me userspace is
> > notified on each status change now:
> >
> > static int vduse_dev_set_status(struct vduse_dev *dev, u8 status)
> > {
> > struct vduse_dev_msg msg = { 0 };
> >
> > msg.req.type = VDUSE_SET_STATUS;
> > msg.req.s.status = status;
> >
> > return vduse_dev_msg_sync(dev, &msg);
> > }
>
> The idea was to be able to query the status at reconnect time, and
> neither having to assume its value nor having to store its value in a
> file (the status could change while the VDUSE application is stopped,
> but maybe it would receive the notification at reconnect).

I see.

>
> I will prototype using a tmpfs file to save needed information, and see
> if it works.

It might work but then the API is not self contained. Maybe it's
better to have a dedicated ioctl.

>
> >> and retrieve the config
> >> space set at VDUSE_CREATE_DEV time.
> >
> > In order to be safe, VDUSE avoids writable config space. Otherwise
> > drivers could block on config writing forever. That's why we don't do
> > it now.
>
> The idea was not to make the config space writable, but just to be able
> to fetch what was filled at VDUSE_CREATE_DEV time.
>
> With the tmpfs file, we can avoid doing that and just save the config
> space there.

Same as the case for status.

Thanks

>
> > We need to harden the config write before we can proceed to this I think.
> >
> >>
> >> 2. VDUSE application as non-root:
> >> We need to run the VDUSE application as non-root. There
> >> is some race between the time the UDEV rule is applied
> >> and the time the device starts being used. Discussing
> >> with Jason, he suggested we may have a VDUSE daemon run
> >> as root that would create the VDUSE device, manages its
> >> rights and then pass its file descriptor to the VDUSE
> >> app. However, with current IOCTLs, it means the VDUSE
> >> daemon would need to know several information that
> >> belongs to the VDUSE app implementing the device such
> >> as supported Virtio features, config space, etc...
> >> If we go that route, maybe we should have a control
> >> IOCTL to create the device which would just pass the
> >> device type. Then another device IOCTL to perform the
> >> initialization. Would that make sense?
> >
> > I think so. We can hear from others.
> >
> >>
> >> 3. Coredump:
> >> In order to be able to perform post-mortem analysis, DPDK
> >> Vhost library marks pages used for vrings and descriptors
> >> buffers as MADV_DODUMP using madvise(). However with
> >> VDUSE it fails with -EINVAL. My understanding is that we
> >> set VM_DONTEXPAND flag to the VMAs and madvise's
> >> MADV_DODUMP fails if it is present. I'm not sure to
> >> understand why madvise would prevent MADV_DODUMP if
> >> VM_DONTEXPAND is set. Any thoughts?
> >
> > Adding Peter who may know the answer.
>
> Thanks!
> Maxime
>
> > Thanks
> >
> >>
> >> [0]: https://patchwork.dpdk.org/project/dpdk/list/?series=27594&state=%2A&archive=both
> >> [1]: https://lore.kernel.org/lkml/CACGkMEtgrxN3PPwsDo4oOsnsSLJfEmBEZ0WvjGRr3whU+QasUg@xxxxxxxxxxxxxx/T/
> >>
> >> Maxime Coquelin (2):
> >> vduse: validate block features only with block devices
> >> vduse: enable Virtio-net device type
> >>
> >> drivers/vdpa/vdpa_user/vduse_dev.c | 11 +++++++----
> >> 1 file changed, 7 insertions(+), 4 deletions(-)
> >>
> >> --
> >> 2.39.2
> >>
> >
>