Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings

From: Matthew Rosato
Date: Tue May 02 2023 - 14:00:25 EST


On 5/2/23 1:46 PM, Jason Gunthorpe wrote:
> On Tue, May 02, 2023 at 06:32:23PM +0200, David Hildenbrand wrote:
>> On 02.05.23 18:19, Jason Gunthorpe wrote:
>>> On Tue, May 02, 2023 at 06:12:39PM +0200, David Hildenbrand wrote:
>>>
>>>>> It missses the general architectural point why we have all these
>>>>> shootdown mechanims in other places - plares are not supposed to make
>>>>> these kinds of assumptions. When the userspace unplugs the memory from
>>>>> KVM or unmaps it from VFIO it is not still being accessed by the
>>>>> kernel.
>>>>
>>>> Yes. Like having memory in a vfio iommu v1 and doing the same (mremap,
>>>> munmap, MADV_DONTNEED, ...). Which is why we disable MADV_DONTNEED (e.g.,
>>>> virtio-balloon) in QEMU with vfio.
>>>
>>> That is different, VFIO has it's own contract how it consumes the
>>> memory from the MM and VFIO breaks all this stuff.
>>>
>>> But when you tell VFIO to unmap the memory it doesn't keep accessing
>>> it in the background like this does.
>>
>> To me, this is similar to when QEMU (user space) triggers
>> KVM_S390_ZPCIOP_DEREG_AEN, to tell KVM to disable AIF and stop using the
>> page (1) When triggered by the guest explicitly (2) when resetting the VM
>> (3) when resetting the virtual PCI device / configuration.
>>
>> Interrupt gets unregistered from HW (which stops using the page), the pages
>> get unpinned. Pages get no longer used.
>>
>> I guess I am still missing (a) how this is fundamentally different (b) how
>> it could be done differently.
>
> It uses an address that is already scoped within the KVM memory map
> and uses KVM's gpa_to_gfn() to translate it to some pinnable page
>
> It is not some independent thing like VFIO, it is explicitly scoped
> within the existing KVM structure and it does not follow any mutations
> that are done to the gpa map through the usual KVM APIs.
>
>> I'd really be happy to learn how a better approach would look like that does
>> not use longterm pinnings.
>
> Sounds like the FW sadly needs pinnings. This is why I said it looks
> like DMA. If possible it would be better to get the pinning through
> VFIO, eg as a mdev

Hrm, these are today handled as a vfio-pci device (e.g. no mdev) so that would be a pretty significant change.

>
> Otherwise, it would have been cleaner if this was divorced from KVM
> and took in a direct user pointer, then maybe you could make the
> argument is its own thing with its own lifetime rules. (then you are
> kind of making your own mdev)

Problem there is that firmware needs both the location (where to indicate the event) and the identity of the KVM instance (what guest to ship the GISA alert to) so I don't think we can completely divorce them.

>
> Or, perhaps, this is really part of some radical "irqfd" that we've
> been on and off talking about specifically to get this area of
> interrupt bypass uAPI'd properly..

I investigated that at one point and could not seem to get it to fit; I'll have another look there.