Re: [PATCH V5 4/4] kvm: add a check if pfn is from NVDIMM pmem.

From: Dan Williams
Date: Tue Sep 18 2018 - 22:53:47 EST


On Fri, Sep 7, 2018 at 2:25 AM Zhang Yi <yi.z.zhang@xxxxxxxxxxxxxxx> wrote:
>
> For device specific memory space, when we move these area of pfn to
> memory zone, we will set the page reserved flag at that time, some of
> these reserved for device mmio, and some of these are not, such as
> NVDIMM pmem.
>
> Now, we map these dev_dax or fs_dax pages to kvm for DIMM/NVDIMM
> backend, since these pages are reserved, the check of
> kvm_is_reserved_pfn() misconceives those pages as MMIO. Therefor, we
> introduce 2 page map types, MEMORY_DEVICE_FS_DAX/MEMORY_DEVICE_DEV_DAX,
> to identify these pages are from NVDIMM pmem and let kvm treat these
> as normal pages.
>
> Without this patch, many operations will be missed due to this
> mistreatment to pmem pages, for example, a page may not have chance to
> be unpinned for KVM guest(in kvm_release_pfn_clean), not able to be
> marked as dirty/accessed(in kvm_set_pfn_dirty/accessed) etc.
>
> Signed-off-by: Zhang Yi <yi.z.zhang@xxxxxxxxxxxxxxx>
> Acked-by: Pankaj Gupta <pagupta@xxxxxxxxxx>
> ---
> virt/kvm/kvm_main.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c44c406..9c49634 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -147,8 +147,20 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>
> bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
> {
> - if (pfn_valid(pfn))
> - return PageReserved(pfn_to_page(pfn));
> + struct page *page;
> +
> + if (pfn_valid(pfn)) {
> + page = pfn_to_page(pfn);
> +
> + /*
> + * For device specific memory space, there is a case
> + * which we need pass MEMORY_DEVICE_FS[DEV]_DAX pages
> + * to kvm, these pages marked reserved flag as it is a
> + * zone device memory, we need to identify these pages
> + * and let kvm treat these as normal pages
> + */
> + return PageReserved(page) && !is_dax_page(page);

Should we consider just not setting PageReserved for
devm_memremap_pages()? Perhaps kvm is not be the only component making
these assumptions about this flag?

Why is MEMORY_DEVICE_PUBLIC memory specifically excluded?

This has less to do with "dax" pages and more to do with
devm_memremap_pages() established ranges. P2PDMA is another producer
of these pages. If either MEMORY_DEVICE_PUBLIC or P2PDMA pages can be
used in these kvm paths then I think this points to consider clearing
the Reserved flag.

That said I haven't audited all the locations that test PageReserved().

Sorry for not responding sooner I was on extended leave.