Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release

From: Jason Wang
Date: Fri Oct 20 2023 - 00:12:38 EST


On Fri, Oct 20, 2023 at 6:28 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
>
>
>
> On 10/19/2023 7:39 AM, Eugenio Perez Martin wrote:
> > On Thu, Oct 19, 2023 at 10:27 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >> On Thu, Oct 19, 2023 at 2:47 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>>
> >>>
> >>> On 10/18/2023 7:53 PM, Jason Wang wrote:
> >>>> On Wed, Oct 18, 2023 at 4:49 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 10/18/2023 12:00 AM, Jason Wang wrote:
> >>>>>>> Unfortunately, it's a must to stick to ABI. I agree it's a mess but we
> >>>>>>> don't have a better choice. Or we can fail the probe if userspace
> >>>>>>> doesn't ack this feature.
> >>>>>> Antoher idea we can just do the following in vhost_vdpa reset?
> >>>>>>
> >>>>>> config->reset()
> >>>>>> if (IOTLB_PERSIST is not set) {
> >>>>>> config->reset_map()
> >>>>>> }
> >>>>>>
> >>>>>> Then we don't have the burden to maintain them in the parent?
> >>>>>>
> >>>>>> Thanks
> >>>>> Please see my earlier response in the other email, thanks.
> >>>>>
> >>>>> ----------------%<----------------%<----------------
> >>>>>
> >>>>> First, the ideal fix would be to leave this reset_vendor_mappings()
> >>>>> emulation code on the individual driver itself, which already has the
> >>>>> broken behavior.
> >>>> So the point is, not about whether the existing behavior is "broken"
> >>>> or not.
> >>> Hold on, I thought earlier we all agreed upon that the existing behavior
> >>> of vendor driver self-clearing maps during .reset violates the vhost
> >>> iotlb abstraction and also breaks the .set_map/.dma_map API. This is
> >>> 100% buggy driver implementation itself that we should discourage or
> >>> eliminate as much as possible (that's part of the goal for this series),
> >> I'm not saying it's not an issue, what I'm saying is, if the fix
> >> breaks another userspace, it's a new bug in the kernel. See what Linus
> >> said in [1]
> >>
> >> "If a change results in user programs breaking, it's a bug in the kernel."
> >>
> >>> but here you seem to go existentialism and suggests the very opposite
> >>> that every .set_map/.dma_map driver implementation, regardless being the
> >>> current or the new/upcoming, should unconditionally try to emulate the
> >>> broken reset behavior for the sake of not breaking older userspace.
> >> Such "emulation" is not done at the parent level. New parents just
> >> need to implement reset_map() or not. everything could be done inside
> >> vhost-vDPA as pseudo code that is shown above.
> >>
> >>> Set
> >>> aside the criteria and definition for how userspace can be broken, can
> >>> we step back to the original question why we think it's broken, and what
> >>> we can do to promote good driver implementation instead of discuss the
> >>> implementation details?
> >> I'm not sure I get the point of this question. I'm not saying we don't
> >> need to fix, what I am saying is that such a fix must be done in a
> >> negotiable way. And it's better if parents won't get any burden. It
> >> can just decide to implement reset_map() or not.
> >>
> >>> Reading the below response I found my major
> >>> points are not heard even if written for quite a few times.
> >> I try my best to not ignore any important things, but I can't promise
> >> I will not miss any. I hope the above clarifies my points.
> >>
> >>> It's not
> >>> that I don't understand the importance of not breaking old userspace, I
> >>> appreciate your questions and extra patience, however I do feel the
> >>> "broken" part is very relevant to our discussion here.
> >>> If it's broken (in the sense of vhost IOTLB API) that you agree, I think
> >>> we should at least allow good driver implementations; and when you think
> >>> about the possibility of those valid good driver cases
> >>> (.set_map/.dma_map implementations that do not clear maps in .reset),
> >>> you might be able to see why it's coded the way as it is now.
> >>>
> >>>> It's about whether we could stick to the old behaviour without
> >>>> too much cost. And I believe we could.
> >>>>
> >>>> And just to clarify here, reset_vendor_mappings() = config->reset_map()
> >>>>
> >>>>> But today there's no backend feature negotiation
> >>>>> between vhost-vdpa and the parent driver. Do we want to send down the
> >>>>> acked_backend_features to parent drivers?
> >>>> There's no need to do that with the above code, or anything I missed here?
> >>>>
> >>>> config->reset()
> >>>> if (IOTLB_PERSIST is not set) {
> >>>> config->reset_map()
> >>>> }
> >>> Implementation issue: this implies reset_map() has to be there for every
> >>> .set_map implementations, but vendor driver implementation for custom
> >>> IOMMU could well implement DMA ops by itself instead of .reset_map. This
> >>> won't work for every set_map driver (think about the vduse case).
> >> Well let me do it once again, reset_map() is not mandated:
> >>
> >> config->reset()
> >> if (IOTLB_PERSIST is not set) {
> >> if (config->reset_map)
> >> config->reset_map()
> > To avoid new parent drivers
> I am afraid it's not just new parent drivers, but any well behaved
> driver today may well break userspace if go with this forced emulation
> code, if they have to implement reset_map for some reason (e.g. restored
> to 1:1 passthrough mapping or other default state in mapping). For new
> userspace and user driver we can guard against it using the
> IOTLB_PERSIST flag, but the above code would get a big chance to break
> setup with good driver and older userspace in practice.
>
> And .reset_map implementation doesn't necessarily need to clear maps.
> For e.g. IOMMU API compliant driver that only needs simple DMA model for
> passthrough, all .reset_map has to do is toggle to 1:1 mapping mode to
> the default/initial state without taking care of maps, as
> vhost_vdpa_unmap(0, -1ULL) earlier should have done the map cleaning job
> already.

Ok, finally, it takes me a while to understand the issue :)

Actually, there are two things:

1) stick the IOTLB mappings across the reset
2) reset the vendor specific mappings to whatever the parent think
it's comfort (like 1:1)

So I think my suggestion doesn't work.

>
>
> > to have this behavior if they need to
> > implement reset_map,
> >
> > What if we add a new callback like "config->buggy_virtio_reset_map",
> > different from regular reset_map callback at cleanup?
> Right, separating out the need for old behavior emulation from
> .reset_map is much cleaner, and this is what individual broken driver
> has to maintain without penalizing other good drivers. Good to see what
> I said earlier is heard.
>
> > Only mlx5 and
> > vdpa_sim need to implement it, with a big warning, and new parent
> > drivers can trust they'll never have the old bad behavior.
> Let's see what Jason will say about it and try to converge on this
> first, I think he seemed to imply that this is part of ABI that every
> driver has to make compromise for. I'd better get it ack'ed before
> proceeding to the rest.

Thanks for your patience.

I think we have some choices:

(All of the below can work, but we need to choose the best)

1) module parameter: this turns out to be hard for the management as
it requires the subtle knowledge of a specific user space which turns
out to be hard
2) buggy_virtio_reset_map: seems like somehow a pollution of the
config ops, I think we can do this only if we have other choice
3) set_backend_features: I understand the concern that we should not
propagate the vhost level feature to parent, the reason is most of
them are irrelevant to the parent. I think the right way is to
introduce get_parent_features()/set_parent_features() then we can
choose to map some parent feature to vhost like
(ENALBE_AFTER_DRIVER_OK and IOTLB_PERSIST)
4) piggy backing whether we need to clean vendor specific IOTLB in
config->reset(bool clean_map)

Siwei, Eugenio, what's your opinion here?

Thanks

>
> Thanks,
> -Siwei
>
> >
> >> }
> >>
> >> Did you see any issue with VDUSE in this case?
> >>
> >>> But this is not the the point I was making. I think if you agree this is
> >>> purely buggy driver implementation of its own, we should try to isolate
> >>> this buggy behavior to individual driver rather than overload vhost-vdpa
> >>> or vdpa core's role to help implement the emulation of broken driver
> >>> behavior.
> >> As I pointed out, if it is not noticeable in the userspace, that's
> >> fine but it's not.
> >>
> >>> I don't get why .reset is special here, the abuse of .reset to
> >>> manipulate mapping could also happen in other IOMMU unrelated driver
> >>> entries like in .suspend, or in queue_reset.
> >> Who can abuse reset here? It is totally under the control of
> >> vhost-vDPA and it's not visible to uAPI. And we can fully control the
> >> behaviour of vhost-vDPA.
> >>
> >>> If someday userspace is
> >>> found coded around similar buggy driver implementation in other driver
> >>> ops, do we want to follow and duplicate the same emulation in vdpa core
> >>> as the precedent is already set here around .reset?
> >> I think so, have you seen the links I give you? If you want to go
> >> through the one from Linus thread[1], you can see the one that unbreak
> >> virtio-IOMMU[2]:
> >>
> >> 1) Someday, we spot invalidate with size 0 is a bug
> >> 2) We fix this bug by not allowing this
> >> 3) But virtio-IOMMU userspace find that size 0 actually clean all the
> >> IOTLB so it depends on the behaviour
> >> 4) So the virtio-IOMMU userspace find it can't work after 2)
> >> 5) Then we recover the behaviour before 2) via [2]
> >>
> >> Another example is the IOTLB_MSG_V2, V1 suffers from in-stable ABI in
> >> 32bit archs, most of the userspace survives since it never runs on
> >> 32bit archs. The fix is to introduce a V2 but we will stick to V1 by
> >> default if V2 is not acknowledged by the userspace.
> >>
> >> I think the above 2 examples are sufficient for us to understand the
> >> case. If not, I can help to clarify more since I'm involved in those 2
> >> fixes.
> >>
> >>> The buggy driver can fail in a lot of other ways indefinitely during
> >>> reset, if there's a buggy driver that's already broken the way as how it
> >>> is and happens to survive with all userspace apps, we just don't care
> >>> and let it be.
> >> Without IOTLB_PRESIST it doesn't break. With IOTLB_PERSIST and if the
> >> reset_map() is done unconditionally, it can break. That's my point.
> >>
> >>> There's no way we can enumerate all those buggy behaviors
> >>> in .reset_map itself, it's overloading that driver API too much.
> >> If it is not noticeable by userspace, we can do any fix at will. But
> >> it is not, we don't have another choice. Especially considering the
> >> cost is rather low.
> >>
> >>>>> Second, IOTLB_PERSIST is needed but not sufficient. Due to lack of
> >>>>> backend feature negotiation in parent driver, if vhost-vdpa has to
> >>>>> provide the old-behaviour emulation for compatibility on driver's
> >>>>> behalf, it needs to be done per-driver basis. There could be good
> >>>>> on-chip or vendor IOMMU implementation which doesn't clear the IOTLB in
> >>>>> .reset, and vendor specific IOMMU doesn't have to provide .reset_map,
> >>>> Then we just don't offer IOTLB_PRESIST, isn't this by design?
> >>> Think about the vduse case, it can work with DMA ops directly so doesn't
> >>> have to implement .reset_map, unless for some specific good reason.
> >>> Because it's a conforming and valid/good driver implementation, we may
> >>> still allow it to advertise IOTLB_PERSIST to userspace.
> >> I would like to know why this can't work in this case:
> >>
> >> config->reset()
> >> if (IOTLB_PERSIST is not set) {
> >> if (config->reset_map)
> >> config->reset_map()
> >> }
> >>
> >>> Which belongs to
> >>> the 3rd bullet below:
> >>>
> >>> https://lore.kernel.org/virtualization/1696928580-7520-4-git-send-email-si-wei.liu@xxxxxxxxxx/
> >>>
> >>> There are 3 cases that backend may claim this feature bit on:
> >>>
> >>> - parent device that has to work with platform IOMMU
> >>> - parent device with on-chip IOMMU that has the expected
> >>> .reset_map support in driver
> >>> - parent device with vendor specific IOMMU implementation
> >>> that explicitly declares the specific backend feature
> >>>
> >>>>> we
> >>>>> should allow these good driver implementations rather than
> >>>>> unconditionally stick to some specific problematic behavior for every
> >>>>> other good driver.
> >>>> Then you can force reset_map() with set_map() that is what I suggest
> >>>> in another thread, no?
> >>> This is exactly what I was afraid of that broken behavior emulation may
> >>> become a dangerous slippery slope - in principle we should encourage
> >>> good driver implementation, as they can work totally fine with older
> >>> userspace. Why do they have to bother emulating broken behavior just
> >>> because some other driver's misbehaving?
> >> Please read the link [1], Linus has explained it.
> >>
> >>> And what's the boundary for
> >>> this hack, do drivers backed by platform IOMMU even have to emulate (if
> >>> not why not, and is there substantial difference in between)?
> >> The boundary is whether the behaviour change could be noticed but
> >> userspace. And I've shown you it's not a burden with the pseudo codes.
> >> If not, please explain why.
> >>
> >>> After
> >>> getting through all of this, do you still believe everything is just as
> >>> easy and simple as what thought to be?
> >> The truth is that bugs exist everywhere. We can't promise there's no
> >> bug when developing an uAPI or subsystem. For kernel code, the bug
> >> that touches uAPI might be fixed in a way that doesn't break existing
> >> userspace. If you look at how downstream to maintain kABI, you will be
> >> supersized furtherly.
> >>
> >>> Btw, I thought I was expecting but still haven't got the clear answers
> >>> to what was the goal to do all this, we spent a lot of time trying to
> >>> unbreak userspace,
> >> The code is pretty simple. But yes, the time spent on justifying it
> >> might take some time. That's the community. People need time to
> >> understand each other's points.
> >>
> >>> but looks to me as if we were trying every possible
> >>> way to break userspace
> >> How could my suggestions break a userspace?
> >>
> >>> or try to approximate to the same brokenness
> >>> mlx5_vdpa may have caused to the userspace. What we will get eventually
> >>> from these lengthy discussions?
> >> Siwei, I'd really suggest you read the link I gave you. You may get
> >> the answer. What's more, It doesn't cost too much then we know for
> >> sure there would not be any issue, why not choose the hard way?
> >>
> >>> On the other hand, if you think it from
> >>> vhost-vdpa user perspective, you'll clearly see there's just a couple of
> >>> ways to unbreak userspace from the internal broken map which is out of
> >>> sync with vhost-vdpa iotlb after device reset.
> >> Patches are more than welcomed.
> >>
> >>> If this brokenness was
> >>> something universally done from the vhost-vdpa layer itself, I'd feel
> >>> it's more of a shared problem, but this is not the case I see it here.
> >>> While the long standing mlx5_vdpa/vdpa_sim issue is 100% misuse of
> >>> .reset op in a wrong way per IOMMU API definition. Why leaving this
> >>> discrepancy to the individual driver is not even an option, I'm still
> >>> not sure?
> >> Sorry? I start with a switch in the driver, and then I try to avoid
> >> that. And it seems you don't want a burden on the driver as well.
> >> Where did you see I say we can't do that in the driver? What I
> >> disagree with is to use a module parameter.
> >>
> >> Even if I fail, it doesn't mean we can't do that in the driver code.
> >> If you read the link[1] you can see the offending commit is a change
> >> in uvcvideo driver.
> >>
> >> Thanks
> >>
> >>>
> >>> Thanks,
> >>> -Siwei
> >>>
> >>>>> Then we need a set of device flags (backend_features
> >>>>> bit again?) to indicate the specific driver needs upper layer's help on
> >>>>> old-behaviour emulation.
> >>>>>
> >>>>> Last but not least, I'm not sure how to properly emulate
> >>>>> reset_vendor_mappings() from vhost-vdpa layer. If a vendor driver has no
> >>>>> .reset_map op implemented, or if .reset_map has a slightly different
> >>>>> implementation than what it used to reset the iotlb in the .reset op,
> >>>> See above, for reset_vendor_mappings() I meant config->reset_map() exactly.
> >>>>
> >>>> Thanks
> >>>>
> >>>>> then this either becomes effectively dead code if no one ends up using,
> >>>>> or the vhost-vdpa emulation is helpless and limited in scope, unable to
> >>>>> cover all the cases.
> >>>>>
> >>>>> ----------------%<----------------%<----------------
> >>>>>
>