RE: [RFC][PATCH v4 -next 1/4] Move kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop()

From: Seiji Aguchi
Date: Fri Jan 13 2012 - 17:51:57 EST


Tony,

I understand that you are seriously concerned about the reliability of pstore,
and I share that concern.

But I still suggest moving kmsg_dump() below smp_send_stop().

>The 20% of me that isn't buying this
>still has worries that smp_send_stop() might fail in one of several ways:
>1) Fails to actually stop one or more other cpus (this is similar to our
>current situation where other cpus may interfere with us saving kmsg in
>pstore).

I don't understand this case. Have you actually encountered a specific case
where smp_send_stop() failed to stop the other cpus?

On x86, this will never happen unless the cpus themselves are broken.

>2) Causes another fault, thus recursively entering the panic path.
>3) Hangs - causing us to miss saving to pstore.

These concerns are not specific to smp_send_stop().

In panic(), several functions are already called before kmsg_dump(),
e.g. dump_stack(), printk(), crash_kexec(), and so on.
If any of them panics or hangs, the same issues will occur.

So 2) and 3) are not valid reasons to reject moving kmsg_dump() below smp_send_stop().

>
>I don't know what can be done to resolve this - it is hard to make a
>100% convincing argument about the execution of any code in the panic
>path.

One way to gain confidence is more testing.
For kdump, LKDTM is used to check for regressions.

If pstore works with LKDTM, we can show that pstore has at least a minimal
level of reliability. (I'm not sure whether we need additional testing at this point.)

Seiji
