RE: [PATCH 1/1] x86/hyperv: Initialize clockevents earlier in CPU onlining

From: Michael Kelley
Date: Wed Oct 30 2019 - 16:51:07 EST


From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Tuesday, October 29, 2019 1:34 PM
>
> Should we add the 2 lines:
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: fd1fea6834d0 ("clocksource/drivers: Make Hyper-V clocksource ISA agnostic")
>
> fd1fea6834d0() removes the clockevents_unbind_device() call and this patch adds it back.

I thought about this, but the changes in this patch seem more extensive than
we might want to take as a stable fix. The previous removal of
clockevents_unbind_device() does not have any negative effects except on
the in-progress hibernation work. We really want to have this patch
go with the hibernation patches, so backporting these changes to stable
isn't necessary for hibernation support. But whether to backport to
stable is a judgment call, and I'm open to arguments in favor.

>
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -323,8 +323,15 @@ void __init hyperv_init(void)
> >
> > x86_init.pci.arch_init = hv_pci_init;
> >
> > + if (hv_stimer_alloc())
> > + goto remove_hypercall_page;
> > +
>
> The error handling is imperfect here: when I force hv_stimer_alloc() to return
> -ENOMEM, I get a panic in hv_apic_eoi_write(). It looks it is because we have cleared
> the pointer 'hv_vp_assist_page' to NULL, but hv_apic_eoi_write() is still in-use.

This can be fixed by doing hv_stimer_alloc() before the call to hv_apic_init().
>
> In case hv_stimer_alloc() fails, can we set 'direct_mode_enabled' to false
> and go on with the legacy Hyper-V timer or LAPIC timer? If not, maybe
> we can use a BUG_ON() to explicitly panic?

I'll look into what can be done here.

>
> > return;
> >
> > +remove_hypercall_page:
> > + hypercall_msr.as_uint64 = 0;
> > + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> > + hv_hypercall_pg = NULL;
> > remove_cpuhp_state:
> > cpuhp_remove_state(cpuhp);
> > free_vp_assist_page:
>
> > -void hv_stimer_cleanup(unsigned int cpu)
> > +static int hv_stimer_cleanup(unsigned int cpu)
> > {
> > struct clock_event_device *ce;
> >
> > /* Turn off clockevent device */
> > if (ms_hyperv.features & HV_MSR_SYNTIMER_AVAILABLE) {
> > ce = per_cpu_ptr(hv_clock_event, cpu);
> > +
> > + /*
> > + * In the legacy case where Direct Mode is not enabled
> > + * (which can only be on x86/64), stimer cleanup happens
> > + * relatively early in the CPU offlining process. We
> > + * must unbind the stimer-based clockevent device so
> > + * that the LAPIC timer can take over until clockevents
> > + * are no longer needed in the offlining process. The
> > + * unbind should not be done when Direct Mode is enabled
> > + * because we may be on an architecture where there are
> > + * no other clockevents devices to fallback to.
> > + */
> > + if (!direct_mode_enabled)
> > + clockevents_unbind_device(ce, cpu);
> > hv_ce_shutdown(ce);
>
> In the legacy stimer0 mode, IMO this hv_ce_shutdown() is unnecessary,
> because "clockevents_unbind_device(ce, cpu)" automatically calls
> ce->set_state_shutdown(), if ce is active:
> clockevents_unbind
> __clockevents_unbind
> clockevents_replace
> tick_install_replacement
> clockevents_exchange_device
> clockevents_switch_state(old, CLOCK_EVT_STATE_DETACHED)
> __clockevents_switch_state

Agreed. This is a carryover from even before reorganizing the code
into the separate Hyper-V clocksource driver, where this code was
present in hv_synic_cleanup():

/* Turn off clockevent device */
if (ms_hyperv.features & HV_MSR_SYNTIMER_AVAILABLE) {
struct hv_per_cpu_context *hv_cpu
= this_cpu_ptr(hv_context.cpu_context);

clockevents_unbind_device(hv_cpu->clk_evt, cpu);
hv_ce_shutdown(hv_cpu->clk_evt);
}

But I'll fix it.

>
> And, in both modes (legacy mode and direct mode), it looks incorrect to
> call hv_ce_shutdown() if the current processid id != 'cpu', because
> hv_ce_shutdown() -> hv_init_timer() can only access the current CPU's
> MSR. Maybe we should use an IPI to run hv_ce_shutdown() on the target
> CPU in direct mode?

Agreed. I'll figure out a fix.

>
> > -int hv_stimer_alloc(int sint)
> > +int hv_stimer_alloc(void)
> > ...
> > + ret = cpuhp_setup_state(CPUHP_AP_HYPERV_TIMER_STARTING,
> > + "clockevents/hyperv/stimer:starting",
> > + hv_stimer_init, hv_stimer_cleanup);
> > + if (ret < 0)
> > + goto free_stimer0_irq;
> > + stimer0_cpuhp = ret;
> > }
> > + return ret;
>
> stimer0_cpuhp is 0 when the call is successful, so IMO the logic in
> hv_stimer_free() is incorrect. Please see below.
>
> > void hv_stimer_free(void)
> > {
> > - if (direct_mode_enabled && (stimer0_irq != 0)) {
> > - hv_remove_stimer0_irq(stimer0_irq);
> > - stimer0_irq = 0;
> > + if (direct_mode_enabled) {
> > + if (stimer0_cpuhp) {
> > + cpuhp_remove_state(stimer0_cpuhp);
> > + stimer0_cpuhp = 0;
> > + }
> > + if (stimer0_irq) {
> > + hv_remove_stimer0_irq(stimer0_irq);
> > + stimer0_irq = 0;
> > + }
> > }
>
> IMO this should be
> if (direct_mode_enabled) {
> if (stimer0_cpuhp == 0)
> cpuhp_remove_state(CPUHP_AP_HYPERV_TIMER_STARTING);
>
> if (stimer0_irq) {
> hv_remove_stimer0_irq(stimer0_irq);
> stimer0_irq = 0;
> }
> }
>
> BTW, the default value of 'stimer0_cpuhp' is 0, which means success.
> Should we change the default value to a non-zero value, e.g. -1 ?

Indeed, you are correct. I was treating the return value from
cphup_setup_state() like the case when the state is
CPUHP_AP_ONLINE_DYN. I'll fix it.

>
> > free_percpu(hv_clock_event);
> > hv_clock_event = NULL;
> > @@ -190,14 +274,11 @@ void hv_stimer_free(void)
> > void hv_stimer_global_cleanup(void)
> > {
> > int cpu;
> > - struct clock_event_device *ce;
> >
> > - if (ms_hyperv.features & HV_MSR_SYNTIMER_AVAILABLE) {
> > - for_each_present_cpu(cpu) {
> > - ce = per_cpu_ptr(hv_clock_event, cpu);
> > - clockevents_unbind_device(ce, cpu);
> > - }
> > + for_each_present_cpu(cpu) {
> > + hv_stimer_cleanup(cpu);
>
> hv_stimer_cleanup() -> hv_ce_shutdown() -> hv_init_timer() can not
> access a remote CPU's MSR.

Agreed, per above.

>
> > @@ -2310,7 +2305,6 @@ static void hv_crash_handler(struct pt_regs *regs)
> > */
> > vmbus_connection.conn_state = DISCONNECTED;
> > cpu = smp_processor_id();
> > - hv_stimer_cleanup(cpu);
> > hv_synic_cleanup(cpu);
> > hyperv_cleanup();
>
> Why should we remove the line hv_stimer_cleanup() in the crash handler?
> In the crash handler, IMO we'd better disable the timer before we proceed.

With the stimer in legacy mode, the cleanup should happen in
hv_synic_cleanup(). But that actually highlights a previously unrecognized
problem in that hv_synic_cleanup() usually doesn't do anything because it
returns -EBUSY if there is a channel interrupt assigned to "cpu". So even
in the older code before the clock/timer reorg, none of the timers or
other synic functionality actually got cleaned up.

With the stimer in DirectMode, the cleanup should probably happen in
hyperv_cleanup(), to mirror the initialization in hyperv_init(). I'll make
that change. Fixing the legacy path is probably a separate patch
because we'll need fix the synic cleanup problem by distinguishing
between a normal cleanup due to CPU offlining or hibernation, and the
crash path.

Thanks for the careful review!

Michael