Re: [RFC] x86, NMI, Treat unknown NMI as hardware error

From: huang ying
Date: Sat May 14 2011 - 20:06:40 EST


On Sat, May 14, 2011 at 3:51 PM, Cyrill Gorcunov <gorcunov@xxxxxxxxx> wrote:
> On 05/14/2011 04:26 AM, huang ying wrote:
>> On Fri, May 13, 2011 at 11:17 PM, Cyrill Gorcunov <gorcunov@xxxxxxxxx> wrote:
>>> Hi Ying,
>>>
>>> just curious (regardless of the concerns Don and Ingo have) -- if there is still a need
>>> for such semi-unknown NMI handling, maybe it's worth registering a *notifier* for it,
>>> and we panic only when the user *explicitly* specifies how to treat this class of NMIs
>>> (via, say, a "hest-nmi-panic" boot option or something like that). Maybe such a partially
>>> modular scheme would be better? Unless I'm missing something.
>>
>> Hi, Cyrill,
>>
>> IMHO, pushing all policy to the user is not good either. How many users
>> understand unknown NMIs and hardware errors clearly? It is better if we
>> can determine what the right behavior is.
>>
>
> Yes, it is not good. But at least we *must* provide a way to turn this new feature off
> via the command line, I think. One reason for me is perf unknown NMIs (at the moment we seem
> to have captured and cured all parasite NMI sources, but there is no guarantee we won't
> meet them in the future due to some code change or whatever). And bloating traps.c with
> new if()'s is not that good I guess, which is why I asked if there is a way to do all the
> work via notifiers ;)

Yes. We should consider the perf unknown NMI issues. But compared
with pushing all the magic to the user, I think the better way is to have a
better default behavior in the kernel. For example, we can temporarily turn
off the "unknown NMI as hwerr" logic if more than one perf NMI event is
in action. Is that reasonable?
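
A minimal sketch of that default, written as user-space C rather than
kernel code. All names here (struct nmi_policy, perf_nmi_event_start(),
perf_nmi_event_stop(), classify_unknown_nmi()) are hypothetical and only
illustrate the decision; they are not existing kernel interfaces, and a
real version would need proper per-CPU or atomic accounting.

```c
#include <assert.h>

/* Hypothetical bookkeeping; not an existing kernel structure. */
struct nmi_policy {
	int active_perf_nmi_events;	/* perf events that can currently raise NMIs */
};

enum unknown_nmi_action {
	UNKNOWN_NMI_LOG,	/* old behavior: report the unknown NMI and continue */
	UNKNOWN_NMI_HWERR,	/* new behavior: treat the unknown NMI as a hardware error */
};

/* Called when a perf event that uses NMIs becomes active. */
static void perf_nmi_event_start(struct nmi_policy *p)
{
	p->active_perf_nmi_events++;
}

/* Called when such a perf event is torn down. */
static void perf_nmi_event_stop(struct nmi_policy *p)
{
	assert(p->active_perf_nmi_events > 0);
	p->active_perf_nmi_events--;
}

/*
 * Decide what to do with an NMI that no handler claimed.
 * With more than one perf NMI event in action, a stray perf NMI is a
 * plausible source, so the hwerr treatment is switched off temporarily.
 */
static enum unknown_nmi_action classify_unknown_nmi(const struct nmi_policy *p)
{
	if (p->active_perf_nmi_events > 1)
		return UNKNOWN_NMI_LOG;
	return UNKNOWN_NMI_HWERR;
}
```

The point of keeping the check in one small function is that the default
stays in the kernel and needs no knob from the user; the hwerr treatment
comes back by itself once the perf events go away.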

Also, I am not a big fan of notifiers; they make the code hard to
understand. If you have concerns about the size of traps.c, we can
move all the NMI logic to a new file.

Best Regards,
Huang Ying