Re: [patch 0/7] x86/kvmclock: Remove memblock dependency and further cleanups

From: Brijesh Singh
Date: Fri Jul 06 2018 - 19:51:31 EST



Adding Tom and Boris


On 7/6/18 12:47 PM, Paolo Bonzini wrote:
> On 06/07/2018 18:13, Thomas Gleixner wrote:
>> To allow early utilization of kvmclock it is required to remove the
>> memblock dependency. memblock is currently used to allocate the per
>> cpu data for kvmclock.
>>
>> The first patch replaces the memblock with a static array sized 64bytes *
>> NR_CPUS and was posted by Pavel. That patch allocates everything statically
>> which is a waste when kvmclock is not used.
>>
>> The rest of the series cleans up the code and converts it to per cpu
>> variables but does not put the kvmclock data into the per cpu area as that
>> has an issue vs. mapping the boot cpu data into the VDSO (leaks arbitrary
>> data, unless page sized).
>>
>> The per cpu data consists of pointers to the actual data. For the boot cpu
>> a page sized array is statically allocated which can be mapped into the
>> VDSO. That array is used for initializing the first 64 CPU pointers. If
>> there are more CPUs the pvclock data is allocated during CPU bringup.
>>
>> So this still will have some overhead when kvmclock is not in use, but
>> bringing it down to zero would be a massive trainwreck and even more
>> indirections.
>>
>> Thanks,
>>
>> tglx
>>
>> 8<--------------
>> a/arch/x86/include/asm/kvm_guest.h | 7
>> arch/x86/include/asm/kvm_para.h | 1
>> arch/x86/kernel/kvm.c | 14 -
>> arch/x86/kernel/kvmclock.c | 262 ++++++++++++++-----------------------
>> arch/x86/kernel/setup.c | 4
>> 5 files changed, 105 insertions(+), 183 deletions(-)
>>
>>
>>
>>
> Thanks, this is really nice. With the small changes from my review,
>
> Acked-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

Hi Paolo and Thomas,


This series breaks SEV guest support. The physical addresses of both
wall_clock and hv_clock are shared with the hypervisor for updates. In the
case of SEV, these addresses must be mapped as 'decrypted' (i.e. C=0) so
that both the guest and the HV can access the data correctly. The following
patch should map the pages as decrypted.


diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 890e9e5..640c796 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -251,6 +251,20 @@ static void kvm_shutdown(void)
        native_machine_shutdown();
 }
 
+static void sev_map_clocks_decrypted(void)
+{
+       if (!sev_active())
+               return;
+
+       /*
+        * wall_clock and hv_clock addresses are shared with hypervisor.
+        * When SEV is enabled, any addresses shared with hypervisor must be
+        * mapped decrypted.
+        */
+       early_set_memory_decrypted((unsigned long) wall_clock, WALL_CLOCK_SIZE);
+       early_set_memory_decrypted((unsigned long) hv_clock, HV_CLOCK_SIZE);
+}
+
 void __init kvmclock_init(void)
 {
        struct pvclock_vcpu_time_info *vcpu_time;
@@ -269,6 +283,8 @@ void __init kvmclock_init(void)
        wall_clock = (struct pvclock_wall_clock *)wall_clock_mem;
        hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
 
+       sev_map_clocks_decrypted();
+
        if (kvm_register_clock("primary cpu clock")) {
                hv_clock = NULL;
                wall_clock = NULL;


However, this patch triggers the kernel crash below.
early_set_memory_decrypted() uses kernel_physical_mapping_init() to
split the large pages and clear the C-bit. It seems this function still
depends on memblock.

[    0.000000] Hypervisor detected: KVM
[    0.000000] Kernel panic - not syncing: alloc_low_pages: ran out of memory
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc3-sev #19
[    0.000000] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    0.000000] Call Trace:
[    0.000000]  ? dump_stack+0x5c/0x80
[    0.000000]  ? panic+0xe7/0x247
[    0.000000]  ? alloc_low_pages+0x130/0x130
[    0.000000]  ? kernel_physical_mapping_init+0xe0/0x204
[    0.000000]  ? early_set_memory_enc_dec+0x10f/0x160
[    0.000000]  ? 0xffffffffb1000000
[    0.000000]  ? kvmclock_init+0x83/0x20a
[    0.000000]  ? setup_arch+0x42c/0xce6
[    0.000000]  ? start_kernel+0x67/0x531
[    0.000000]  ? load_ucode_bsp+0x76/0x12e
[    0.000000]  ? secondary_startup_64+0xa5/0xb0
[    0.000000] ---[ end Kernel panic - not syncing: alloc_low_pages: ran out of memory ]---
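
To illustrate why clearing the C-bit this early can run out of memory, here
is a minimal userspace sketch in C. It is a toy model and not kernel code:
every name in it (toy_pmd, toy_early_pool, toy_set_decrypted, TOY_C_BIT and
the bit position 47) is hypothetical and chosen only for illustration. It
models a single 2M mapping covering 512 4K pages. Changing the C-bit on one
4K page first requires a page-table page to split the large mapping, and
that allocation fails when the early pool has nothing to hand out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_TABLE 512          /* 4K pages covered by one 2M mapping */
#define TOY_C_BIT      (1ULL << 47) /* assumed C-bit position, illustration only */

/* Toy stand-in for the early page-table allocator. */
struct toy_early_pool {
	size_t pages_left;
};

/* Toy 2M mapping: either one large entry or 512 separate 4K entries. */
struct toy_pmd {
	bool is_large;
	unsigned long long large_flags;
	unsigned long long pte_flags[PTES_PER_TABLE];
};

/* Start with a single large mapping carrying the given flags. */
static void toy_pmd_init_large(struct toy_pmd *pmd, unsigned long long flags)
{
	pmd->is_large = true;
	pmd->large_flags = flags;
	for (size_t i = 0; i < PTES_PER_TABLE; i++)
		pmd->pte_flags[i] = 0;
}

/* Hand out one page-table page, or fail when the pool is empty. */
static bool toy_alloc_pt_page(struct toy_early_pool *pool)
{
	if (pool->pages_left == 0)
		return false;
	pool->pages_left--;
	return true;
}

/*
 * Clear the C-bit on one 4K page. If the page is still covered by a
 * large mapping, split it first, which costs one page-table page.
 * Returns 0 on success, -1 if the split could not be allocated.
 */
static int toy_set_decrypted(struct toy_pmd *pmd, size_t idx,
			     struct toy_early_pool *pool)
{
	assert(idx < PTES_PER_TABLE);

	if (pmd->is_large) {
		if (!toy_alloc_pt_page(pool))
			return -1;
		/* Every 4K entry inherits the flags of the large mapping. */
		for (size_t i = 0; i < PTES_PER_TABLE; i++)
			pmd->pte_flags[i] = pmd->large_flags;
		pmd->is_large = false;
	}

	pmd->pte_flags[idx] &= ~TOY_C_BIT;
	return 0;
}
```

In this model, calling toy_set_decrypted() with an empty pool fails before
anything is modified, which is the toy equivalent of the alloc_low_pages
panic above. With one page in the pool the split succeeds, and only the
requested 4K entry loses the C-bit while its neighbours stay encrypted.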

- Brijesh