Re: RE: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

From: Yongji Xie
Date: Tue Jun 29 2021 - 04:14:54 EST


On Tue, Jun 29, 2021 at 3:56 PM Liu, Xiaodong <xiaodong.liu@xxxxxxxxx> wrote:
>
>
>
> >-----Original Message-----
> >From: Jason Wang <jasowang@xxxxxxxxxx>
> >Sent: Tuesday, June 29, 2021 12:11 PM
> >To: Liu, Xiaodong <xiaodong.liu@xxxxxxxxx>; Xie Yongji
> ><xieyongji@xxxxxxxxxxxxx>; mst@xxxxxxxxxx; stefanha@xxxxxxxxxx;
> >sgarzare@xxxxxxxxxx; parav@xxxxxxxxxx; hch@xxxxxxxxxxxxx;
> >christian.brauner@xxxxxxxxxxxxx; rdunlap@xxxxxxxxxxxxx; willy@xxxxxxxxxxxxx;
> >viro@xxxxxxxxxxxxxxxxxx; axboe@xxxxxxxxx; bcrl@xxxxxxxxx; corbet@xxxxxxx;
> >mika.penttila@xxxxxxxxxxxx; dan.carpenter@xxxxxxxxxx; joro@xxxxxxxxxx;
> >gregkh@xxxxxxxxxxxxxxxxxxx
> >Cc: songmuchun@xxxxxxxxxxxxx; virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> >netdev@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
> >iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >Subject: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace
> >
> >
> >On 2021/6/28 1:54 PM, Liu, Xiaodong wrote:
> >>> Several issues:
> >>>
> >>> - VDUSE needs to limit the total size of the bounce buffers (64M, if I'm
> >>> not mistaken). Does it work for SPDK?
> >> Yes, Jason. It is enough and works for SPDK.
> >> Since it's a bounce buffer mainly for in-flight IO, a limited size like
> >> 64MB is enough.
> >
> >
> >Ok.
> >
> >
> >>
> >>> - VDUSE can use hugepages but I'm not sure we can mandate hugepages (or we
> >>> need to introduce new flags to support this)
> >> I share your concern: I'm also afraid it is hard for a kernel module
> >> to preallocate hugepages internally.
> >> What I tried is:
> >> 1. A simple agent daemon (representing one device) preallocates and maps
> >> dozens of 2MB hugepages (64MB in total, for example) for that device.
> >> 2. The daemon passes its mapping address and length, plus the hugepage fd,
> >> to the kernel module through a new ioctl.
> >> 3. The kernel module remaps the hugepages inside the kernel.
> >
> >
> >Such a model should work, but the main "issue" is that it introduces
> >overhead in the case of vhost-vDPA.
> >
> >Note that in the case of vhost-vDPA, we don't use a bounce buffer; the
> >userspace pages are shared directly.
> >
> >And since DMA is not done per page, it prevents us from using tricks
> >like vm_insert_page() in those cases.
> >
>
> Yes, handling the vhost-vDPA case really is a problem.
> But there are already several solutions for serving VMs, like vhost-user
> and vfio-user, so at least SPDK won't serve VMs through VDUSE. If a user
> still wants to do that, the user will have to tolerate the added overhead.
>
> In other words, a software backend like SPDK would use the virtio datapath
> of VDUSE to serve the local host instead of a VM. That's why I also drafted
> a "virtio-local" to bridge a vhost-user target and the local host kernel
> virtio-blk.
>
> >
> >> 4. The vhost-user target gets the hugepage fd from the kernel module
> >> in a vhost-user message (as a Unix domain socket cmsg) and maps it.
> >> The kernel module and the target then map the same hugepage-based
> >> bounce buffer for in-flight IO.
> >>
> >> If VDUSE had an option to map memory preallocated by userspace, then
> >> VDUSE should be able to mandate it even if it is hugepage based.
> >>
> >
> >As above, this requires some kind of re-design since VDUSE depends on
> >the model of mmap(MAP_SHARED) instead of umem registering.
>
> Got it, Jason, this may be hard for the current version of VDUSE.
> Maybe we can consider these options later, after VDUSE is merged.
>
> If the VDUSE datapath could be leveraged directly by a vhost-user target,
> its value would spread immediately.
>

Agreed!
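
For anyone following along, here is a rough userspace-only sketch of steps
1 and 4 from the list quoted above: back a 64MB buffer with hugepages where
possible, hand its fd to a peer over a Unix domain socket as SCM_RIGHTS
ancillary data, and map it on both sides. This is only an illustration under
my own assumptions, not code from the series or from SPDK. The helper names
(bounce_create, send_fd, recv_fd, bounce_demo) are made up, memfd_create()
stands in for whatever hugepage allocation the daemon really uses, and the
ioctl into the kernel module (steps 2 and 3) is left out because no such
VDUSE interface exists in this series. Error paths leak fds for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BOUNCE_SIZE (64UL << 20) /* 64MB, the size discussed above */

/* Step 1: back the buffer with hugepages if available, else normal pages. */
static int bounce_create(size_t len)
{
	int fd = memfd_create("bounce", MFD_CLOEXEC | MFD_HUGETLB);

	assert(len > 0 && len <= BOUNCE_SIZE); /* keep within the 64MB cap */
	if (fd >= 0) {
		/* hugetlb pages are reserved at mmap() time, so probe once */
		void *p = ftruncate(fd, len) ? MAP_FAILED :
			mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p != MAP_FAILED) {
			munmap(p, len);
			return fd;
		}
		close(fd);
	}
	fd = memfd_create("bounce", MFD_CLOEXEC); /* fallback: normal pages */
	if (fd >= 0 && ftruncate(fd, len)) {
		close(fd);
		fd = -1;
	}
	return fd;
}

/* Step 4, sender side: pass the fd as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *c;

	memset(&u, 0, sizeof(u));
	c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_RIGHTS;
	c->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(c), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Step 4, receiver side: returns the received fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	int fd = -1;
	union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *c;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	c = CMSG_FIRSTHDR(&msg);
	if (c && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(c), sizeof(int));
	return fd;
}

/* Both "sides" live in one process here, only to keep the sketch testable. */
static int bounce_demo(void)
{
	int sv[2], fd, peer_fd, ret = -1;
	char *a, *b;

	fd = bounce_create(BOUNCE_SIZE);
	if (fd < 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;
	if (send_fd(sv[0], fd) || (peer_fd = recv_fd(sv[1])) < 0)
		return -1;
	a = mmap(NULL, BOUNCE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, BOUNCE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, peer_fd, 0);
	if (a != MAP_FAILED && b != MAP_FAILED) {
		strcpy(a, "in-flight IO");		/* written via one mapping */
		ret = strcmp(b, "in-flight IO");	/* seen via the other */
	}
	return ret;
}
```

bounce_demo() returns 0 when data written through one mapping is visible
through the other. In a real setup the two ends would be separate processes
(the agent daemon or kernel module on one side, the vhost-user target on the
other), and the fd would travel inside a vhost-user message.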

Thanks,
Yongji