Re: [PATCH v6 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings

From: David Hildenbrand
Date: Tue May 02 2023 - 10:58:09 EST


On 02.05.23 15:35, Matthew Rosato wrote:
On 5/2/23 9:04 AM, Christian Borntraeger wrote:


On 02.05.23 14:54, Lorenzo Stoakes wrote:
On Tue, May 02, 2023 at 02:46:28PM +0200, Christian Borntraeger wrote:
On 02.05.23 01:11, Lorenzo Stoakes wrote:
Writing to file-backed dirty-tracked mappings via GUP is inherently broken
as we cannot rule out folios being cleaned and then a GUP user writing to
them again and possibly marking them dirty unexpectedly.

This is especially egregious for long-term mappings (as indicated by the
use of the FOLL_LONGTERM flag), so we disallow this case in GUP-fast as
we have already done in the slow path.

Hmm, does this interfere with KVM on s390 and PCI interpretation of interrupt delivery?
It would no longer work with file-backed memory, correct?

See
arch/s390/kvm/pci.c

kvm_s390_pci_aif_enable
which does have
FOLL_WRITE | FOLL_LONGTERM
to


Does this memory map a dirty-tracked file? It's kind of hard to dig into where
the address originates from without going through a ton of code. In the worst
case, if the fast path doesn't find a whitelisted mapping, it'll fall back to
the slow path, which explicitly checks for a dirty-tracked filesystem.

It does pin from whatever QEMU uses as backing for the guest.

We can reintroduce a flag to permit exceptions if this is really broken. Are you
able to test? I don't have an s390 sat around :)

Matt (Rosato on cc) probably can. In the end, it would mean having
  <memoryBacking>
    <source type="file"/>
  </memoryBacking>

In libvirt I guess.

I am running with this series applied, using a QEMU guest with memory-backend-file (via the above libvirt snippet) for a few different PCI device types, and AEN forwarding (e.g. what is set up in kvm_s390_pci_aif_enable) is still working.


That's ... unexpected. :)

Either this series doesn't work as expected or you end up using a filesystem that is still compatible. But I guess most applicable filesystems (ext4, btrfs, xfs) all have a page_mkwrite callback and should, therefore, disallow long-term pinning with this series.
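
To make the expectation above concrete, here is a rough userspace model of the rule as described in this thread. It is only a sketch: struct model_mapping, the MODEL_FOLL_* constants and model_longterm_write_pin_allowed() are hypothetical names made up for illustration, not kernel structures or functions, and the actual checks in the series look at more than a single page_mkwrite flag.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GUP flags discussed in this thread. */
enum {
	MODEL_FOLL_WRITE    = 1u << 0,
	MODEL_FOLL_LONGTERM = 1u << 1,
};

/* Hypothetical, heavily simplified description of what backs a mapping. */
struct model_mapping {
	bool file_backed;      /* false for anonymous memory */
	bool has_page_mkwrite; /* filesystem tracks dirtying via page_mkwrite */
};

/*
 * Model of the rule: a pin that is both writable and long-term is refused
 * when the memory is file-backed and the filesystem does dirty tracking.
 * Every other combination is allowed by this model.
 */
static bool model_longterm_write_pin_allowed(const struct model_mapping *m,
					     unsigned int flags)
{
	const unsigned int need = MODEL_FOLL_WRITE | MODEL_FOLL_LONGTERM;

	/* Only the FOLL_WRITE | FOLL_LONGTERM combination is restricted. */
	if ((flags & need) != need)
		return true;

	/* Anonymous memory is not affected. */
	if (!m->file_backed)
		return true;

	/* File-backed: allowed only if the filesystem has no page_mkwrite. */
	return !m->has_page_mkwrite;
}
```

Under this model, a guest backed by a file on ext4, btrfs or xfs (file_backed and has_page_mkwrite both true) would be refused for FOLL_WRITE | FOLL_LONGTERM, which is why the pin still working is surprising.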

--
Thanks,

David / dhildenb