Re: [PATCH 0/4] x86/Hyper-V: Panic code path fixes

From: Tianyu Lan
Date: Thu Mar 19 2020 - 22:21:34 EST


On 3/20/2020 12:07 AM, Michael Kelley wrote:
From: Michael Kelley <mikelley@xxxxxxxxxxxxx> Sent: Thursday, March 19, 2020 8:15 AM

From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx> Sent: Thursday, March 19, 2020 7:08 AM

This patchset fixes some issues in the Hyper-V panic code path.
Patch 1 resolves an issue where a panicked system still responds to
network packets.
Patches 2-3 resolve crash enlightenment issues.
Patch 4 sets crash_kexec_post_notifiers to true for Hyper-V
VMs in order to report crash data or kmsg to the host before running
the kdump kernel.

I still see an issue that isn't addressed by these patches. The VMbus
driver registers a "die notifier" and a "panic notifier". But die() will
eventually call panic() if panic_on_oops is set (which I think it typically
is). If the CRASH_NOTIFY_MSG option is *not* enabled, then
hyperv_report_panic() could get called by the die notifier, and then
again by the panic notifier.

Do we even need the "die notifier"? If it was removed, there would
not be any notification to Hyper-V via the die() path unless panic_on_oops
is set, which I think is actually the correct behavior. I'm not
completely clear on what is supposed to happen in general to the
Linux kernel if panic_on_oops is not set. Does it try to continue to run?
If so, then we should not be notifying Hyper-V if panic_on_oops is not
set, and removing the die notifier is the right thing to do.


hyperv_report_panic() has a re-entry check inside, so the kernel only
reports crash register data once during die().

Ah, yes, you are right.
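
For readers following along, the re-entry check discussed above can be
sketched in userspace C roughly as follows. This is only an illustration,
not the actual hyperv_report_panic() code: struct fake_regs,
report_panic_once(), panic_reported and reports_sent are hypothetical
names made up for the sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the registers passed along the die or panic chain. */
struct fake_regs {
    unsigned long ip;
    unsigned long sp;
};

/* Counts how many reports reached the (simulated) host. */
static int reports_sent;

/* Set when the first report goes out; later calls return early. */
static bool panic_reported;

/* Simulated report function: only the first caller gets through. */
static void report_panic_once(const struct fake_regs *regs, const char *chain)
{
    if (panic_reported)
        return;
    panic_reported = true;

    reports_sent++;
    printf("report from %s chain: ip=%#lx sp=%#lx\n",
           chain, regs->ip, regs->sp);
}
```

If the die chain calls report_panic_once() first and the panic chain calls
it again afterwards, only the die-chain register values are reported.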

According to the comment in
hyperv_report_panic(), the register values reported in the die chain are more
exact than those in the panic chain. The register values in the die chain are
passed in by the die() caller, whereas those reported in the panic chain
are collected in hyperv_panic_event().

If panic_on_oops is not set, the task should be killed and the kernel
keeps running. In this case, we may not trigger the crash enlightenment.

I'm not completely clear on your last statement. It seems like there
is still a problem in that die() will call hyperv_report_panic() even if
panic_on_oops is not set. We will have reported a panic to Hyper-V
even though the VM did not stop running.

Yes, the die callback is still necessary, and we should skip the report
if panic_on_oops isn't set.
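
A minimal userspace sketch of that idea, assuming the die callback simply
returns early when panic_on_oops is clear. The names die_callback(),
report_to_host(), panic_on_oops_setting and host_reports are hypothetical
stand-ins, and the real notifier signature and return values are left out.

```c
#include <stdbool.h>

/* Stand-in for the kernel's panic_on_oops setting (assumption for this sketch). */
static int panic_on_oops_setting;

/* Counts reports that would have been sent to the host. */
static int host_reports;

/* Hypothetical placeholder for the real report path. */
static void report_to_host(void)
{
    host_reports++;
}

/* Hypothetical die callback: report only when the oops will become a panic. */
static int die_callback(void)
{
    /* Without panic_on_oops the task is killed and the kernel keeps running. */
    if (!panic_on_oops_setting)
        return 0;

    report_to_host();
    return 0;
}
```

With this guard, Hyper-V only hears about a die() event when the VM is
actually going to stop.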


There's one more issue to consider. hv_kmsg_dump() skips calling
hyperv_report_panic_msg() if sysctl_record_panic_msg has been cleared
by a sysctl command. (This sysctl option gives a customer the ability to
increase privacy by not having the VM's dmesg contents sent to Hyper-V.)
In this case, the earlier hyperv_report_panic() call should be used. Otherwise,
there would not be any notification to Hyper-V about the panic.
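
The fallback described above could be expressed as a small decision
function. This is a sketch under assumptions, not the driver code: the
enum, choose_panic_notification() and its two parameters are hypothetical,
with msg_supported standing for the CRASH_NOTIFY_MSG capability and
record_panic_msg for the sysctl_record_panic_msg setting.

```c
#include <stdbool.h>

/* Hypothetical labels for the two ways of telling the host about a panic. */
enum panic_notification {
    NOTIFY_REGISTERS,   /* register report, as hyperv_report_panic() does */
    NOTIFY_KMSG         /* message report, as hyperv_report_panic_msg() does */
};

/*
 * Pick the notification path. The kmsg path is only usable when the host
 * supports it and the admin has not cleared the sysctl; in every other
 * case fall back to the register report so the host is always notified.
 */
static enum panic_notification
choose_panic_notification(bool msg_supported, bool record_panic_msg)
{
    if (msg_supported && record_panic_msg)
        return NOTIFY_KMSG;

    return NOTIFY_REGISTERS;
}
```

The point of the sketch is that no combination of the two inputs ends with
the host receiving nothing.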


Nice catch. I will fix this in the next version.

Thanks.