Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release

From: Jason Wang
Date: Mon Oct 16 2023 - 22:36:36 EST


On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
>
>
>
> On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:
> > On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >> On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>>
> >>>
> >>> On 10/12/2023 8:01 PM, Jason Wang wrote:
> >>>> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation
> >>>>> may need to restore iotlb mapping to the initial or default state
> >>>>> using the .reset_map op, as it's desirable for some parent devices
> >>>>> to solely manipulate mappings by its own, independent of virtio device
> >>>>> state. For instance, device reset does not cause mapping go away on
> >>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa
> >>>>> is going away, give them a chance to reset iotlb back to the initial
> >>>>> state in vhost_vdpa_cleanup().
> >>>>>
> >>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
> >>>>> ---
> >>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++
> >>>>> 1 file changed, 16 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >>>>> index 851535f..a3f8160 100644
> >>>>> --- a/drivers/vhost/vdpa.c
> >>>>> +++ b/drivers/vhost/vdpa.c
> >>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
> >>>>> return vhost_vdpa_alloc_as(v, asid);
> >>>>> }
> >>>>>
> >>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
> >>>>> +{
> >>>>> + struct vdpa_device *vdpa = v->vdpa;
> >>>>> + const struct vdpa_config_ops *ops = vdpa->config;
> >>>>> +
> >>>>> + if (ops->reset_map)
> >>>>> + ops->reset_map(vdpa, asid);
> >>>>> +}
> >>>>> +
> >>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> >>>>> {
> >>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid);
> >>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> >>>>>
> >>>>> hlist_del(&as->hash_link);
> >>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
> >>>>> + /*
> >>>>> + * Devices with vendor specific IOMMU may need to restore
> >>>>> + * iotlb to the initial or default state which is not done
> >>>>> + * through device reset, as the IOTLB mapping manipulation
> >>>>> + * could be decoupled from the virtio device life cycle.
> >>>>> + */
> >>>> Should we do this according to whether IOTLB_PRESIST is set?
> >>> Well, in theory this seems like so but it's unnecessary code change
> >>> actually, as that is the way how vDPA parent behind platform IOMMU works
> >>> today, and userspace doesn't break as of today. :)
> >> Well, this is one question I've ever asked before. You have explained
> >> that one of the reason that we don't break userspace is that they may
> >> couple IOTLB reset with vDPA reset as well. One example is the Qemu.
> >>
> >>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set
> >>> it doesn't necessarily mean the iotlb will definitely be destroyed
> >>> across reset (think about the platform IOMMU case), so userspace today
> >>> is already tolerating enough with either good or bad IOMMU.

I'm confused, how to define tolerating here? For example, if it has
tolerance, why bother?

> >>This code of
> >>> not checking IOTLB_PERSIST being set is intentional, there's no point to
> >>> emulate bad IOMMU behavior even for older userspace (with improper
> >>> emulation to be done it would result in even worse performance).

I can easily imagine a case:

The old Qemu that works only with a setup like mlx5_vdpa. If we do
this without a negotiation, IOTLB will not be clear but the Qemu will
try to re-program the IOTLB after reset. Which will break?

1) stick the exact old behaviour with just one line of check
2) audit all the possible cases to avoid a one line of code

1) seems much easier than 2)

> >> For two reasons:
> >>
> >> 1) backend features need acked by userspace this is by design
> >> 2) keep the odd behaviour seems to be more safe as we can't audit
> >> every userspace program
> >>
> > The old behavior (without flag ack) cannot be trusted already, as:

Possibly but the point is to unbreak userspace no matter how weird the
behaviour we've ever had.

> > * Devices using platform IOMMU (in other words, not implementing
> > neither .set_map nor .dma_map) does not unmap memory at virtio reset.
> > * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do
> > reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb
> > called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset,
> > now all backends work the same as far as I know., which was (and is)
> > the way devices using the platform IOMMU works.
> >
> > The difference in behavior did not matter as QEMU unmaps all the
> > memory unregistering the memory listener at vhost_vdpa_dev_start(...,
> > started = false),
> Exactly. It's not just QEMU, but any (older) userspace manipulates
> mappings through the vhost-vdpa iotlb interface has to unmap all
> mappings to workaround the vdpa parent driver bug.

Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI.

> If they don't do
> explicit unmap, it would cause state inconsistency between vhost-vdpa
> and parent driver, then old mappings can't be restored, and new mapping
> can be added to iotlb after vDPA reset. There's no point to preserve
> this broken and inconsistent behavior between vhost-vdpa and parent
> driver, as userspace doesn't care at all!

It's a userspace notice change so we can't fix it silently:

https://lkml.org/lkml/2012/12/23/75

Another example which is related to vhost-vDPA:

https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@xxxxxxxxxx/T/

Thanks

>
> > but the backend acknowledging this feature flag
> > allows QEMU to make sure it is safe to skip this unmap & map in the
> > case of vhost stop & start cycle.
> >
> > In that sense, this feature flag is actually a signal for userspace to
> > know that the bug has been solved.
> Right, I couldn't say it better than you do, thanks! The feature flag is
> more of an unusual means to indicating kernel bug having been fixed,
> rather than introduce a new feature or new kernel behavior ending up in
> change of userspace's expectation.
>
> > Not offering it indicates that
> > userspace cannot trust the kernel will retain the maps.
> >
> > Si-Wei or Dragos, please correct me if I've missed something. Feel
> > free to use the text in case you find more clear in doc or patch log.
> Sure, will do, thank you! Will post v2 adding these to the log.
>
> Thanks,
> -Siwei
>
>
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>> I think
> >>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100%
> >>> certainty of persistent iotlb mapping not getting lost across vdpa reset.
> >>>
> >>> Thanks,
> >>> -Siwei
> >>>
> >>> [1]
> >>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/
> >>> [2]
> >>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/
> >>>> Otherwise
> >>>> we may break old userspace.
> >>>>
> >>>> Thanks
> >>>>
> >>>>> + vhost_vdpa_reset_map(v, asid);
> >>>>> kfree(as);
> >>>>>
> >>>>> return 0;
> >>>>> --
> >>>>> 1.8.3.1
> >>>>>
>