Re: [PATCH] mce: fix missing stack-dumping in mce_panic()

From: Miaohe Lin
Date: Thu Jan 05 2023 - 20:58:07 EST


On 2023/1/4 5:12, Luck, Tony wrote:
>>> I guess the original issue that commit was fixing is to save that
>>> redundant oops message but Tony seems to want to see it now and I'm not
>>> sure how much we care about 80x50 screens nowadays... :-)
>

Many thanks for your thought. :)

> I want a stack dump for the specific case of a recoverable machine check caused by
> poison consumption in kernel code that doesn't have an extable[] entry for a recovery
> path. That's a potential candidate for future kernel change to make that recoverable

Sure, in this case a stack dump will be really helpful. We can gather these stack dumps
and try to make the most frequent machine check scene recoverable.

> (if the code path seems common enough to warrant the churn), and there is some
> plausible way for s/w to "recover").
>
> For most other machine checks the dump is very likely useless. E.g. some CPU core stalled
> so that the system generates a broadcast machine check because instructions are not
> being retired on that CPU core. In this case the machine check "monarch" is almost certainly
> some innocent bystander that was executing normally. Stack dump from that CPU is going
> to tell you nothing about the machine check.

Agree. A stack dump won't be helpful in this case. But I tend to keep the stack dump in case
it would be helpful and also make mce_panic() dumps the stack as expected. What do you think?

Thanks,
Miaohe Lin