Re: [RFC PATCH 00/19] hugetlb support for KVM guest_mem

From: Isaku Yamahata
Date: Thu Jun 08 2023 - 00:38:22 EST


On Tue, Jun 06, 2023 at 07:03:45PM +0000,
Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:

> Hello,
>
> This patchset builds upon a soon-to-be-published WIP patchset that Sean
> published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo, mentioned
> at [1].
>
> The tree can be found at:
> https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1
>
> In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is introduced,
> allowing VM private memory (for confidential computing) to be backed by hugetlb
> pages.
>
> guest_mem provides userspace with a handle, with which userspace can allocate
> and deallocate memory for confidential VMs without mapping the memory into
> userspace.
>
> Why use hugetlb instead of introducing a new allocator, like gmem does for 4K
> and transparent hugepages?
>
> + hugetlb provides the following useful functionality, which would otherwise
> have to be reimplemented:
> + Allocation of hugetlb pages at boot time, including
> + Parsing of kernel boot parameters to configure hugetlb
> + Tracking of usage in hstate
> + gmem will share the same system-wide pool of hugetlb pages, so users
> don't have to have separate pools for hugetlb and gmem
> + Page accounting with subpools
> + hugetlb pages are tracked in subpools, which gmem uses to reserve
> pages from the global hstate
> + Memory charging
> + hugetlb provides code that charges memory to cgroups
> + Reporting: hugetlb usage and availability are available at /proc/meminfo,
> etc
>
> The first 11 patches in this patchset is a series of refactoring to decouple
> hugetlb and hugetlbfs.
>
> The central thread binding the refactoring is that some functions (like
> inode_resv_map(), inode_subpool(), inode_hstate(), etc) rely on a hugetlbfs
> concept, that the resv_map, subpool, hstate, are in a specific field in a
> hugetlb inode.
>
> Refactoring to parametrize functions by hstate, subpool, resv_map will allow
> hugetlb to be used by gmem and in other places where these data structures
> aren't necessarily stored in the same positions in the inode.
>
> The refactoring proposed here is just the minimum required to get a
> proof-of-concept working with gmem. I would like to get opinions on this
> approach before doing further refactoring. (See TODOs)
>
> TODOs:
>
> + hugetlb/hugetlbfs refactoring
> + remove_inode_hugepages() no longer needs to be exposed, it is hugetlbfs
> specific and used only in inode.c
> + remove_mapping_hugepages(), remove_inode_single_folio(),
> hugetlb_unreserve_pages() shouldn't need to take inode as a parameter
> + Updating inode->i_blocks can be refactored to a separate function and
> called from hugetlbfs and gmem
> + alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized by
> vma
> + hugetlb_reserve_pages() should be refactored to be symmetric with
> hugetlb_unreserve_pages()
> + It should be parametrized by resv_map
> + alloc_hugetlb_folio_from_subpool() could perhaps use
> hugetlb_reserve_pages()?
> + gmem
> + Figure out if resv_map should be used by gmem at all
> + Probably needs more refactoring to decouple resv_map from hugetlb
> functions

Hi. If kvm gmem is compiled as kernel module, many symbols are failed to link.
You need to add EXPORT_SYMBOL{,_GPL} for exported symbols.
Or compile it to kernel instead of module?

Thanks,

> Questions for the community:
>
> 1. In this patchset, every gmem file backed with hugetlb is given a new
> subpool. Is that desirable?
> + In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has one
> mount per hugetlb size (2M, 1G, etc)
> + memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file, so it
> (rightfully) uses the hugetlbfs kernel mounts and their subpools
> + I gave each file a subpool mostly to speed up implementation and still be
> able to reserve hugetlb pages from the global hstate based on the gmem
> file size.
> + gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so
> + Should there be multiple mounts, one for each hugetlb size?
> + Will the mounts be initialized on boot or on first gmem file creation?
> + Or is one subpool per gmem file fine?
> 2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace
> reservations?
>
> [1] https://lore.kernel.org/lkml/ZEM5Zq8oo+xnApW9@xxxxxxxxxx/
>
> ---
>
> Ackerley Tng (19):
> mm: hugetlb: Expose get_hstate_idx()
> mm: hugetlb: Move and expose hugetlbfs_zero_partial_page
> mm: hugetlb: Expose remove_inode_hugepages
> mm: hugetlb: Decouple hstate, subpool from inode
> mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool
> and hstate
> mm: hugetlb: Provide hugetlb_filemap_add_folio()
> mm: hugetlb: Refactor vma_*_reservation functions
> mm: hugetlb: Refactor restore_reserve_on_error
> mm: hugetlb: Use restore_reserve_on_error directly in filesystems
> mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by
> resv_map
> mm: hugetlb: Parametrize hugetlb functions by resv_map
> mm: truncate: Expose preparation steps for truncate_inode_pages_final
> KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers
> KVM: guest_mem: Refactor cleanup to separate inode and file cleanup
> KVM: guest_mem: hugetlb: initialization and cleanup
> KVM: guest_mem: hugetlb: allocate and truncate from hugetlb
> KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem
> KVM: selftests: Support various types of backing sources for private
> memory
> KVM: selftests: Update test for various private memory backing source
> types
>
> fs/hugetlbfs/inode.c | 102 ++--
> include/linux/hugetlb.h | 86 ++-
> include/linux/mm.h | 1 +
> include/uapi/linux/kvm.h | 25 +
> mm/hugetlb.c | 324 +++++++-----
> mm/truncate.c | 24 +-
> .../testing/selftests/kvm/guest_memfd_test.c | 33 +-
> .../testing/selftests/kvm/include/test_util.h | 14 +
> tools/testing/selftests/kvm/lib/test_util.c | 74 +++
> .../kvm/x86_64/private_mem_conversions_test.c | 38 +-
> virt/kvm/guest_mem.c | 488 ++++++++++++++----
> 11 files changed, 882 insertions(+), 327 deletions(-)
>
> --
> 2.41.0.rc0.172.g3f132b7071-goog

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>