Re: [PATCH] selftests: kvm: fix overlapping addresses in memslot_perf_test

From: Maciej S. Szmigiero
Date: Thu Jun 03 2021 - 09:05:46 EST


On 03.06.2021 14:37, Andrew Jones wrote:
On Thu, Jun 03, 2021 at 05:26:33AM +0000, Duan, Zhenzhong wrote:
-----Original Message-----
From: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx>
Sent: Thursday, June 3, 2021 7:07 AM
To: Paolo Bonzini <pbonzini@xxxxxxxxxx>; Duan, Zhenzhong
<zhenzhong.duan@xxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Andrew Jones
<drjones@xxxxxxxxxx>
Subject: Re: [PATCH] selftests: kvm: fix overlapping addresses in
memslot_perf_test

On 30.05.2021 01:13, Maciej S. Szmigiero wrote:
On 29.05.2021 12:20, Paolo Bonzini wrote:
On 28/05/21 21:51, Maciej S. Szmigiero wrote:
On 28.05.2021 21:11, Paolo Bonzini wrote:
The memory that is allocated in vm_create is already mapped close
to GPA 0, because test_execute passes the requested memory to
prepare_vm.  This causes overlapping memory regions and the test
crashes.  For simplicity just move MEM_GPA higher.

Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

I am not sure that I understand the issue correctly, is
vm_create_default() already reserving low GPAs (around 0x10000000)
on some arches or run environments?

It maps the number of pages you pass in the second argument, see
vm_create.

   if (phy_pages != 0)
     vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
                                 0, 0, phy_pages, 0);

In this case:

   data->vm = vm_create_default(VCPU_ID, mempages, guest_code);

called here:

   if (!prepare_vm(data, nslots, maxslots, tdata->guest_code,
                   mem_size, slot_runtime)) {

where mempages is mem_size, which is declared as:

         uint64_t mem_size = tdata->mem_size ? : MEM_SIZE_PAGES;

but actually a better fix is just to pass a small fixed value (e.g.
1024) to vm_create_default, since all other regions are added by hand

Yes, but the argument that is passed to vm_create_default() (mem_size
in the case of the test) is not passed as phy_pages to vm_create().
Rather, vm_create_with_vcpus() calculates some upper bound of extra
memory that is needed to cover that much guest memory (including for
its page tables).

The biggest possible mem_size from memslot_perf_test is 512 MiB + 1
page, according to my calculations this results in phy_pages of 1029
(~4 MiB) in the x86-64 case and around 1540 (~6 MiB) in the s390x case
(here I am not sure about the exact number, since s390x has some
additional alignment requirements).

Both values are well below 256 MiB (0x10000000UL), so I was wondering
what kind of circumstances can make these allocations collide (maybe I
am missing something in my analysis).

I see now that there has been a patch merged last week called
"selftests: kvm: make allocation of extra memory take effect" by Zhenzhong
that now allocates also the whole memory size passed to
vm_create_default() (instead of just page tables for that much memory).

The commit message of this patch says that "perf_test_util and
kvm_page_table_test use it to alloc extra memory currently", however both
kvm_page_table_test and lib/perf_test_util framework explicitly add the
required memory allocation by doing a vm_userspace_mem_region_add()
call for the same memory size that they pass to vm_create_default().

So now they allocate this memory twice.

@Zhenzhong: did you notice improper operation of either
kvm_page_table_test or perf_test_util-based tests (demand_paging_test,
dirty_log_perf_test,
memslot_modification_stress_test) before your patch?
No


They seem to work fine for me without the patch (and I guess other people
would have noticed earlier, too, if they were broken).

After this patch not only these tests allocate their memory twice but it is
harder to make vm_create_default() allocate the right amount of memory for
the page tables in cases where the test needs to explicitly use
vm_userspace_mem_region_add() for its allocations (because it wants the
allocation placed at a specific GPA or in a specific memslot).

One has to basically open-code the page table size calculations from
vm_create_with_vcpus() in the particular test then, taking also into account
that vm_create_with_vcpus() will not only allocate the passed memory size
(calculated page tables size) but also behave like it was allocating space for
page tables for these page tables (even though the passed memory size itself
is supposed to cover them).
Looks we have different understanding to the parameter extra_mem_pages of vm_create_default().

In your usage, extra_mem_pages is only used for page table calculations, real extra memory allocation
happens in the extra call of vm_userspace_mem_region_add().

Yes, this is the meaning that kvm selftests has always had for
extra_mem_pages of vm_create_default(). If we'd rather have a different
meaning, that's fine, but we need to change all the callers of the
function as well.

If we change the meaning of extra_mem_pages (keep the patch) it would be
good to still have an additional parameter to vm_create_with_vcpus() for
tests that have to allocate their memory on their own via
vm_userspace_mem_region_add() for vm_create_with_vcpus() to just allocate
the page tables for these manual allocations.
Or a helper to calculate the required extra_mem_pages for them.
If we decide to leave vm_create_default() the way it was by reverting this
patch, then maybe we should consider renaming the parameter and/or
documenting the function.

Adding a descriptive comment (and possibly renaming the parameter) seems
like a much simpler solution to me that adapting these tests (and possibly
adding the parameter or helper described above for them).

Thanks,
drew

Thanks,
Maciej


In my understanding, extra_mem_pages is used for a VM who wants a custom memory size in slot0,
rather than the default DEFAULT_GUEST_PHY_PAGES size.

I understood your comments and do agree that my patch bring some trouble to your code, sorry for that.
I'm fine to revert that patch and I think it's better to let the maintainers to decide what extra_mem_pages
Is used for.

Thanks
Zhenzhong

Due to the above, I suspect the previous behavior was, in fact, correct.

Thanks,
Maciej