Re: [PATCH V4] notifier/panic: Introduce panic_notifier_filter

From: Guilherme G. Piccoli
Date: Tue Feb 08 2022 - 13:51:54 EST


On 28/01/2022 10:38, Petr Mladek wrote:
> [...] On Thu 2022-01-27 14:16:20, Guilherme G. Piccoli wrote:
> First, I am sorry for the very long mail. But the problem is really
> complicated. I did my best to describe it a clean way.
>
> I have discussed these problems with a colleague and he had some good
> points. And my view evolved even further.

Thanks Petr for the very comprehensive and detailed email - this helps a
lot in shaping the future of panic notifier(s)!


> [...]
> I think about the following solution:
>
> + split the notifiers into three lists:
>
> + info: stop watchdogs, provide extra info
> + hypervisor: poke hypervisor
> + reboot: actions needed only when crash dump did not happen
>
> + allow to call hypervisor notifiers before or after kdump
>
> + stop CPUs before kdump when either hypervisor notifiers or
> kmsg_dump is enabled
>
> Note that it still allows to call kdump as the first action when
> hypervisor notifiers are called after kdump and no kmsg dumper
> is registered.
>
>
> void panic(void)
> {
> [...]
>
> if (crash_kexec_post_hypervisor || panic_print || enabled_kmsg_dump()) {
> /*
> * Stop CPUs when some extra action is required before
> * crash dump. We will need architecture dependent extra
> * works in addition to stopping other CPUs.
> */
> crash_smp_send_stop();
> cpus_stopped = true;
> }
>
> if (crash_kexec_post_hypervisor) {
> /* Tell hypervisor about the panic */
> atomic_notifier_call_chain(&panic_hypervisor_notifier_list, 0, buf);
> }
>
> if (enabled_kmsg_dump) {
> /*
> * Print extra info by notifiers.
> * Prevent rumors, for example, by stopping watchdogs.
> */
> atomic_notifier_call_chain(&panic_info_notifier_list, 0, buf);
> }
>
> /* Optional extra info */
> panic_printk_sys_info();
>
> /* No dumper by default */
> kmsg_dump();
>
> /* Used only when crash kernel loaded */
> __crash_kexec(NULL);
>
> if (!cpus_stopped) {
> /*
> * Note smp_send_stop is the usual smp shutdown function, which
> * unfortunately means it may not be hardened to work in a
> * panic situation.
> */
> smp_send_stop();
> }
>
> if (!crash_kexec_post_hypervisor) {
> /* Tell hypervisor about the panic */
> atomic_notifier_call_chain(&panic_hypervisor_notifier_list, 0, buf);
> }
>
> if (!enabled_kmsg_dump) {
> /*
> * Print extra info by notifiers.
> * Prevent rumors, for example, by stopping watchdogs.
> */
> atomic_notifier_call_chain(&panic_info_notifier_list, 0, buf);
> }
>
> /*
> * Help to reboot a safe way.
> */
> atomic_notifier_call_chain(&panic_reboot_notifier_list, 0, buf);
>
> [...]
> }
>
> Any opinion?
> Do the notifier list names make sense?
>

This was exposed very clearly, thanks. I agree with you, it's a good
approach, and we can evolve that during the implementation phase, like
"function A is not good in the hypervisor list because of this and
that", so we move it to the reboot list. Also, name of the lists is not
so relevant, might evolve in the implementation phase - I personally
liked them, specially the "info" and "hypervisor" ones (reboot seems
good but not great heh).

So, what are the opinions from kdump maintainers about this idea?
Baoquan / Vivek / Dave, does it make sense to you? Do you have any
suggestions/concerns to add on top of Petr draft?

I prefer this refactor than the filter, certainly. If nobody else
working on that, I can try implementing that - it's very interesting.
The only thing I'd like to have first is an ACK from the kdump
maintainers about the general idea.

Cheers,


Guilherme