Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages

From: Christian König
Date: Wed Mar 13 2024 - 11:09:30 EST


Sending that once more as text only since AMD eMail has messed that up once more.

Regards,
Christian.

Am 13.03.24 um 16:07 schrieb Christian König:
Am 13.03.24 um 15:48 schrieb Sean Christopherson:
On Wed, Mar 13, 2024, Christian König wrote:
Am 13.03.24 um 14:34 schrieb Sean Christopherson:
On Wed, Mar 13, 2024, Christian König wrote:
And when you have either of those two functionalities the requirement to add
a long term reference to the struct page goes away completely. So when this
is done right you don't need to grab a reference in the first place.
The KVM issue that this series is solving isn't that KVM grabs a reference, it's
that KVM assumes that any non-reserved pfn that is backed by "struct page" is
refcounted.
Well why does it assumes that? When you have a MMU notifier that seems
unnecessary.
Indeed, it's legacy code that we're trying to clean up. It's the bulk of this
series.

Yeah, that is the right approach as far as I can see.

What Christoph is objecting to is that, in this series, KVM is explicitly adding
support for mapping non-compound (huge)pages into KVM guests. David is arguing
that Christoph's objection to _KVM_ adding support is unfair, because the real
problem is that the kernel already maps such pages into host userspace. I.e. if
the userspace mapping ceases to exist, then there are no mappings for KVM to follow
and propagate to KVM's stage-2 page tables.
And I have to agree with Christoph that this doesn't make much sense. KVM
should *never* map (huge) pages from VMAs marked with VM_PFNMAP into KVM
guests in the first place.

What it should do instead is to mirror the PFN from the host page tables
into the guest page tables.
That's exactly what this series does. Christoph is objecting to KVM playing nice
with non-compound hugepages, as he feels that such mappings should not exist
*anywhere*.

Well Christoph is right those mappings shouldn't exists and they also don't exists.

What happens here is that a driver has allocated some contiguous memory to do DMA with. And then some page table is pointing to a PFN inside that memory because userspace needs to provide parameters for the DMA transfer.

This is *not* a mapping of a non-compound hugepage, it's simply a PTE pointing to some PFN. It can trivially be that userspace only maps 4KiB of some 2MiB piece of memory the driver has allocate.

I.e. Christoph is (implicitly) saying that instead of modifying KVM to play nice,
we should instead fix the TTM allocations. And David pointed out that that was
tried and got NAK'd.

Well as far as I can see Christoph rejects the complexity coming with the approach of sometimes grabbing the reference and sometimes not. And I have to agree that this is extremely odd.

What the KVM code should do instead is to completely drop grabbing references to struct pages, no matter what the VMA flags are.

Regards,
Christian.