Re: Rename restrictedmem => guardedmem? (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM)

From: Sean Christopherson
Date: Tue Jun 06 2023 - 19:25:54 EST


On Tue, Jun 06, 2023, Ackerley Tng wrote:
>
> I've ported selftests from Chao and I [1] while working on hugetlb support
> for guest_mem [2].
>
> In the process, I found some bugs and have some suggestions for guest_mem.
> Please see separate commits at [3].
>
> Here are some highlights/questions:
>
> + "KVM: guest_mem: Explain the use of the uptodate flag for gmem"
> + Generally, uptodate flags means that the contents of this page match the
> backing store. Since gmem is memory-backed, does "uptodate" for gmem mean
> "zeroed"?

Don't read too much into the code, my POC was very much a "beat on it until it
works" scenario.

> + "KVM: guest_mem: Don't re-mark accessed after getting a folio" and "KVM:
> guest_mem: Don't set dirty flag for folio"
> + Do we need to folio_mark_accessed(), when it was created with
> FGP_ACCESSED?

Probably not. And as you note below, it's all pretty nonsensical anyways.

> + What is the significance of these LRU flags when gmem doesn't support
> swapping/eviction?

Likely none. I used the filemap APIs in my POC because it was easy, not because
it was necessarily the best approach, i.e. that the folios/pages show up in the
LRUs is an unwanted side effect, not a feature. If guest_memfd only needs a small
subset of the filemap support, going with a true from-scratch implemenation on
top of xarray might be cleaner overall, e.g. would avoid the need for a new flag
to say "this folio can't be migrated even though it's on the LRUs".

> + "KVM: guest_mem: Align so that at least 1 page is allocated"
> + Bug in current implementation: without this alignment, fallocate() of
> a size less than the gmem page size will result in no allocation at all

I'm not convinced this is a bug. I don't see any reason to allow allocating and
punching holes in sub-page granularity.

> + Both shmem and hugetlbfs perform this alignment
> + "KVM: guest_mem: Add alignment checks"
> + Implemented the alignment checks for guest_mem because hugetlb on gmem
> would hit a BUG_ON without this check
> + "KVM: guest_mem: Prevent overflows in kvm_gmem_invalidate_begin()"
> + Sean fixed a bug in the offset-to-gfn conversion in
> kvm_gmem_invalidate_begin() earlier, adding a WARN_ON_ONCE()

As Mike pointed out, there's likely still a bug here[*]. I was planning on
diving into that last week, but that never happened. If you or anyone else can
take a peek and/or write a testcase, that would be awesome.

: Heh, only if there's a testcase for it. Assuming start >= the slot offset does
: seem broken, e.g. if the range-to-invalidate overlaps multiple slots, later slots
: will have index==slot->gmem.index > start.
:
: > Since 'index' corresponds to the gmem offset of the current slot, is there any
: > reason not to do something like this?:
: >
: > .start = slot->base_gfn + index - slot->gmem.index,
: >
: > But then, if that's the case, wouldn't index == slot->gmem.index? Suggesting
: > we case just simplify to this?:
: >
: > .start = slot->base_gfn,
:
: No, e.g. if start is partway through a memslot, there's no need to invalidate
: the entire memslot. I'll stare at this tomorrow when my brain is hopefully a
: bit more functional, I suspect there is a min() and/or max() needed somewhere.

[*] https://lore.kernel.org/all/20230512002124.3sap3kzxpegwj3n2@xxxxxxx

> + Code will always hit WARN_ON_ONCE() when the entire file is closed and
> all offsets are invalidated, so WARN_ON_ONCE() should be removed
> + Vishal noticed that the conversion might result in an overflow, so I
> fixed that
> + And of course, hugetlb support! Please let me know what you think of the
> approach proposed at [2].
>
> [1] https://lore.kernel.org/all/cover.1678926164.git.ackerleytng@xxxxxxxxxx/T/
> [2] https://lore.kernel.org/lkml/cover.1686077275.git.ackerleytng@xxxxxxxxxx/T/
> [3] https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1