RE: [PATCH] x86/Hyper-V: Initialize Hyper-V stimer after enabling lapic

From: Michael Kelley
Date: Thu Aug 05 2021 - 13:35:33 EST


From: Tianyu Lan <ltykernel@xxxxxxxxx> Sent: Wednesday, August 4, 2021 11:49 AM
>
> Hyper-V Isolation VM doesn't have PIT/HPET legacy timer and only
> provide stimer. Initialize Hyper-v stimer just after enabling
> lapic to avoid kernel stuck during calibrating TSC due to no
> available timer.

I'm unclear on the core reason this patch is needed. Hyper-V
Generation 2 VMs don't have PIT or HPET either, and they don't get
stuck in TSC calibration. Instead of calibration, Hyper-V provides
guests with a synthetic MSR to directly read the TSC frequency.
Code in ms_hyperv_init_platform() sets x86_platform.calibrate_tsc
to hv_get_tsc_khz(), which reads the synthetic MSR. Is some
part of this mechanism not available in a Hyper-V Isolated VM?

>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
> ---
> arch/x86/hyperv/hv_init.c | 29 -----------------------------
> arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++++++++++
> 2 files changed, 22 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6f247e7e07eb..4a643a85d570 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -271,25 +271,6 @@ static struct syscore_ops hv_syscore_ops = {
> .resume = hv_resume,
> };
>
> -static void (* __initdata old_setup_percpu_clockev)(void);
> -
> -static void __init hv_stimer_setup_percpu_clockev(void)
> -{
> - /*
> - * Ignore any errors in setting up stimer clockevents
> - * as we can run with the LAPIC timer as a fallback.
> - */
> - (void)hv_stimer_alloc(false);
> -
> - /*
> - * Still register the LAPIC timer, because the direct-mode STIMER is
> - * not supported by old versions of Hyper-V. This also allows users
> - * to switch to LAPIC timer via /sys, if they want to.
> - */
> - if (old_setup_percpu_clockev)
> - old_setup_percpu_clockev();
> -}
> -
> static void __init hv_get_partition_id(void)
> {
> struct hv_get_partition_id *output_page;
> @@ -396,16 +377,6 @@ void __init hyperv_init(void)
> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> }
>
> - /*
> - * hyperv_init() is called before LAPIC is initialized: see
> - * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> - * apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
> - * depends on LAPIC, so hv_stimer_alloc() should be called from
> - * x86_init.timers.setup_percpu_clockev.
> - */
> - old_setup_percpu_clockev = x86_init.timers.setup_percpu_clockev;
> - x86_init.timers.setup_percpu_clockev = hv_stimer_setup_percpu_clockev;
> -
> hv_apic_init();
>
> x86_init.pci.arch_init = hv_pci_init;
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 6b5835a087a3..dcfbd2770d7f 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -214,6 +214,20 @@ static void __init hv_smp_prepare_boot_cpu(void)
> #endif
> }
>
> +static void (* __initdata old_setup_initr_mode)(void);
> +
> +static void __init hv_setup_initr_mode(void)
> +{
> + if (old_setup_initr_mode)
> + old_setup_initr_mode();
> +
> + /*
> + * The direct-mode STIMER depends on LAPIC and so allocate
> + * STIMER after calling initr node callback.

I'd love to see this comment have a bit more detail. I think the
point is this: "The direct-mode STIMER interrupt delivery depends
on the LAPIC being enabled". The timer mechanism itself does not
depend on the LAPIC timer. You just copied this comment from the
previous place, so making it better isn't strictly within the scope of
this patch, but if you are going to move it, let's make it a little more
precise.

> + */
> + (void)hv_stimer_alloc(false);

I understood the point that Praveen Kumar was making to be that
we should not just ignore the return value from hv_stimer_alloc()
in the case of an Isolated VM. A failure of hv_stimer_alloc() in
an Isolated VM is fatal because there is no LAPIC timer to fall
back on. In that case, really all that can be done is BUG_ON().

> +}
> +
> static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> {
> #ifdef CONFIG_X86_64
> @@ -424,6 +438,7 @@ static void __init ms_hyperv_init_platform(void)
> /* Register Hyper-V specific clocksource */
> hv_init_clocksource();
> #endif
> +
> /*
> * TSC should be marked as unstable only after Hyper-V
> * clocksource has been initialized. This ensures that the
> @@ -431,6 +446,13 @@ static void __init ms_hyperv_init_platform(void)
> */
> if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
> mark_tsc_unstable("running on Hyper-V");
> +
> + /*
> + * Override initr mode callback in order to allocate STIMER
> + * after initalizing LAPIC.
> + */
> + old_setup_initr_mode = x86_init.irqs.intr_mode_init;
> + x86_init.irqs.intr_mode_init = hv_setup_initr_mode;

Hanging the stimer initialization on the intr_mode_init() function
is an ugly hack. Is your goal to do the stimer initialization
after the interrupt mode has been setup, but before tsc_init() is
called in x86_late_time_init()? Back to my initial comments,
I'm curious as to why the existing mechanism doesn't work.

Michael

> }
>
> static bool __init ms_hyperv_x2apic_available(void)
> --
> 2.25.1