Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings

From: Jason Gunthorpe
Date: Tue May 02 2023 - 11:21:07 EST


On Tue, May 02, 2023 at 10:54:35AM -0400, Matthew Rosato wrote:
> On 5/2/23 10:15 AM, David Hildenbrand wrote:
> > On 02.05.23 16:04, Jason Gunthorpe wrote:
> >> On Tue, May 02, 2023 at 03:57:30PM +0200, David Hildenbrand wrote:
> >>> On 02.05.23 15:50, Jason Gunthorpe wrote:
> >>>> On Tue, May 02, 2023 at 03:47:43PM +0200, David Hildenbrand wrote:
> >>>>>> Eventually we want to implement a mechanism where we can dynamically pin in response to RPCIT.
> >>>>>
> >>>>> Okay, so IIRC we'll fail starting the domain early, that's good. And if we
> >>>>> pin all guest memory (instead of small pieces dynamically), there is little
> >>>>> existing use for file-backed RAM in such zPCI configurations (because memory
> >>>>> cannot be reclaimed either way if it's all pinned), so likely there are no
> >>>>> real existing users.
> >>>>
> >>>> Right, this is VFIO, the physical HW can't tolerate not having pinned
> >>>> memory, so something somewhere is always pinning it.
> >>>>
> >>>> Which, again, makes it weird/wrong that this KVM code is pinning it
> >>>> again :\
> >>>
> >>> IIUC, that pinning is not for ordinary IOMMU / KVM memory access. It's for
> >>> passthrough of (adapter) interrupts.
> >>>
> >>> I have to speculate, but I guess for hardware to forward interrupts to the
> >>> VM, it has to pin the special guest memory page that will receive the
> >>> indications, to then configure (interrupt) hardware to target the interrupt
> >>> indications to that special guest page (using a host physical address).
> >>
> >> Either the emulated access is "CPU" based happening through the KVM
> >> page table so it should use mmu_notifier locking.
> >>
> >> Or it is "DMA" and should go through an IOVA through iommufd pinning
> >> and locking.
> >>
> >> There is no other ground, nothing in KVM should be inventing its own
> >> access methodology.
> >
> > I might be wrong, but this seems to be a bit different.
> >
> > It cannot tolerate page faults (needs a host physical address), so
> > memory notifiers don't really apply. (as a side note, KVM on s390x
> > does not use mmu notifiers as we know them)
>
> The host physical address is one shared between underlying firmware
> and the host kvm. Either might make changes to the referenced page
> and then issue an alert to the guest via a mechanism called GISA,
> giving impetus to the guest to look at that page and process the
> event. As you say, firmware can't tolerate the page being
> unavailable; it's expecting that once we feed it that location it's
> always available until we remove it (kvm_s390_pci_aif_disable).

That is a CPU access delegated to the FW without any locking scheme to
make it safe with KVM :\

It would have been better if FW could inject it through the kvm page
tables so it has some coherency.

Otherwise you have to call this "DMA", I think.

How does s390 avoid mmu notifiers without having lots of problems? It
is not really optional to hook the invalidations if you need to build
a shadow page table.
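
To make the invalidation point concrete, here is a toy userspace model,
not kernel code. Every name in it (toy_mm, toy_shadow, toy_mm_remap and
so on) is hypothetical and only stands in for the general idea: a
secondary table caches translations from a primary one, and it only
stays coherent if the primary side calls an invalidate hook before it
changes a mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPAGES 8

struct toy_shadow;

/* Primary mapping; stands in for the host mm's page tables. */
struct toy_mm {
	uint64_t pfn[TOY_NPAGES];
	struct toy_shadow *notifier;	/* single registered listener, may be NULL */
};

/* Secondary ("shadow") mapping that caches translations from toy_mm. */
struct toy_shadow {
	uint64_t pfn[TOY_NPAGES];
	bool valid[TOY_NPAGES];
};

/* Invalidate hook: drop cached translations in [start, end). */
static void toy_shadow_invalidate(struct toy_shadow *s, size_t start, size_t end)
{
	for (size_t i = start; i < end && i < TOY_NPAGES; i++)
		s->valid[i] = false;
}

/* Fill the primary table with arbitrary initial translations. */
static void toy_mm_init(struct toy_mm *mm)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		mm->pfn[i] = 100 + i;
	mm->notifier = NULL;
}

/* Change a primary mapping, telling the listener (if any) first. */
static void toy_mm_remap(struct toy_mm *mm, size_t idx, uint64_t new_pfn)
{
	assert(idx < TOY_NPAGES);
	if (mm->notifier)
		toy_shadow_invalidate(mm->notifier, idx, idx + 1);
	mm->pfn[idx] = new_pfn;
}

/* Translate through the shadow table, refilling from toy_mm on a miss. */
static uint64_t toy_shadow_translate(struct toy_shadow *s,
				     const struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_NPAGES);
	if (!s->valid[idx]) {
		s->pfn[idx] = mm->pfn[idx];
		s->valid[idx] = true;
	}
	return s->pfn[idx];
}
```

In this model a shadow table registered as mm->notifier picks up the new
translation after toy_mm_remap(), while one that was never registered
keeps returning the old pfn. The sketch leaves out the locking and
sequence-count retry that a real implementation needs against concurrent
faults.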

Jason