Re: [PATCH RFC 1/4] vdpa: introduce .reset_map operation callback

From: Eugenio Perez Martin
Date: Thu Aug 17 2023 - 11:30:52 EST


On Thu, Aug 17, 2023 at 2:05 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
>
>
>
> On 8/15/2023 6:55 PM, Jason Wang wrote:
> > On Wed, Aug 16, 2023 at 3:49 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>
> >>
> >> On 8/14/2023 7:21 PM, Jason Wang wrote:
> >>> On Tue, Aug 15, 2023 at 9:46 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>>> Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
> >>>> ---
> >>>> include/linux/vdpa.h | 7 +++++++
> >>>> 1 file changed, 7 insertions(+)
> >>>>
> >>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> >>>> index db1b0ea..3a3878d 100644
> >>>> --- a/include/linux/vdpa.h
> >>>> +++ b/include/linux/vdpa.h
> >>>> @@ -314,6 +314,12 @@ struct vdpa_map_file {
> >>>> * @iova: iova to be unmapped
> >>>> * @size: size of the area
> >>>> * Returns integer: success (0) or error (< 0)
> >>>> + * @reset_map: Reset device memory mapping (optional)
> >>>> + * Needed for device that using device
> >>>> + * specific DMA translation (on-chip IOMMU)
> >>> This exposes the device internal to the upper layer which is not optimal.
> >> Not sure what does it mean by "device internal", but this op callback
> >> just follows existing convention to describe what vdpa parent this API
> >> targets.
> > I meant the bus tries to hide the differences among vendors. So it
> > needs to hide on-chip IOMMU stuff to the upper layer.
> >
> > We can expose two dimensional IO mappings models but it looks like
> > over engineering for this issue. More below.
> >
> >> * @set_map: Set device memory mapping (optional)
> >> * Needed for device that using device
> >> * specific DMA translation (on-chip IOMMU)
> >> :
> >> :
> >> * @dma_map: Map an area of PA to IOVA (optional)
> >> * Needed for device that using device
> >> * specific DMA translation (on-chip IOMMU)
> >> * and preferring incremental map.
> >> :
> >> :
> >> * @dma_unmap: Unmap an area of IOVA (optional but
> >> * must be implemented with dma_map)
> >> * Needed for device that using device
> >> * specific DMA translation (on-chip IOMMU)
> >> * and preferring incremental unmap.
> >>
> >>
> >>> Btw, what's the difference between this and a simple
> >>>
> >>> set_map(NULL)?
> >> I don't think parent drivers support this today - they can accept
> >> non-NULL iotlb containing empty map entry, but not a NULL iotlb. The
> >> behavior is undefined or it even causes panic when a NULL iotlb is
> >> passed in.
> > We can do this simple change if it can work.
> If we go with setting up 1:1 DMA mapping at virtio-vdpa .probe() and
> tearing it down at .release(), perhaps set_map(NULL) is not sufficient.
> >
> >> Further this doesn't work with .dma_map parent drivers.
> > Probably, but I'd remove dma_map as it doesn't have any real users
> > except for the simulator.
> OK, at a point there was suggestion to get this incremental API extended
> to support batching to be in par with or even replace .set_map, not sure
> if it's too soon to conclude. But I'm okay with the removal if need be.

Yes, I think the right move in the long run is to delegate the
batching to the parent driver. This allows drivers like mlx to add
memory (like hotplugged memory) without the need of tearing down all
the old maps.

Having said that, maybe we can work on top if we need to remove
.dma_map for now.

> >
> >> The reason why a new op is needed or better is because it allows
> >> userspace to tell apart different reset behavior from the older kernel
> >> (via the F_IOTLB_PERSIST feature bit in patch 4), while this behavior
> >> could vary between parent drivers.
> > I'm ok with a new feature flag, but we need to first seek a way to
> > reuse the existing API.
> A feature flag is needed anyway. I'm fine with reusing but guess I'd
> want to converge on the direction first.
>
> Thanks,
> -Siwei
> >
> > Thanks
> >
> >> Regards,
> >> -Siwei
> >>
> >>> Thanks
> >>>
> >>>> + * @vdev: vdpa device
> >>>> + * @asid: address space identifier
> >>>> + * Returns integer: success (0) or error (< 0)
> >>>> * @get_vq_dma_dev: Get the dma device for a specific
> >>>> * virtqueue (optional)
> >>>> * @vdev: vdpa device
> >>>> @@ -390,6 +396,7 @@ struct vdpa_config_ops {
> >>>> u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
> >>>> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
> >>>> u64 iova, u64 size);
> >>>> + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
> >>>> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
> >>>> unsigned int asid);
> >>>> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> >>>> --
> >>>> 1.8.3.1
> >>>>
>