Re: [boot crash] Re: [tip:x86/mce3] x86, mce: use 64bit machinecheck code on 32bit

From: Ingo Molnar
Date: Wed Sep 23 2009 - 12:19:19 EST



* Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:

> Ingo Molnar wrote:
>
>> Your sloppiness of not fixing mce_rdmsrl() as i requested brought us
>> this new boot crash regression in 2.6.31, in mce_rdmsrl():
>
> Ingo, that's because the MSRs already have capability bits. If the
> capability bits don't work we have to find out why, not hack around
> without understanding it by using rdmsrl_safe(). Most likely
> something more is wrong then and it has to be fixed properly.

It is _entirely_ irrelevant whether, in your opinion, this code
'should not crash' because there are MCE capability bits declaring that
those MSRs should work.

The fact of life is that naked MSR reads are *dangerous*, _especially_
in cases where we use a piece of functionality on a wide category of
x86 CPUs, like in this case. They result in needless crashes when we
have much better options, such as printing a warning message. We have
rdmsrl_safe() for a reason, and we use it in a number of critical places.

This is a very simple concept, and you messed up on multiple levels
here and fail to even admit it. I even warned you about that very
function and you ignored it. Anyway, your opinion doesn't matter much
here: I have already fixed this misfeature of the MCE code. Now we
should get a nice warn-once boot warning (which can be picked up by
kerneloops.org, etc.) instead of a nasty boot crash.
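For illustration, here is a minimal userspace sketch of the warn-once pattern described above. It assumes a hypothetical fake_rdmsrl_safe() that only simulates a faulting MSR read (the real rdmsrl_safe() returns non-zero when the RDMSR faults). mce_rdmsrl_sketch() and warn_count are illustrative names, not the actual MCE code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for rdmsrl_safe(): returns 0 and fills *val on
 * success, non-zero if the read "faults". Only MSR 0x179 (IA32_MCG_CAP)
 * is pretended to exist here, with an arbitrary made-up value. */
static int fake_rdmsrl_safe(uint32_t msr, uint64_t *val)
{
	assert(val != NULL);
	if (msr == 0x179) {
		*val = 0x9; /* arbitrary sample value, not real hardware data */
		return 0;
	}
	return -EIO; /* simulate the fault being caught instead of crashing */
}

/* Number of warnings actually printed (illustrative bookkeeping only). */
static int warn_count;

/* Warn-once read: on a faulting access, print a single warning and
 * return 0 instead of taking the machine down. */
static uint64_t mce_rdmsrl_sketch(uint32_t msr)
{
	static bool warned;
	uint64_t v;

	if (fake_rdmsrl_safe(msr, &v)) {
		if (!warned) {
			warned = true;
			warn_count++;
			fprintf(stderr, "mce: unable to read MSR 0x%x\n",
				(unsigned int)msr);
		}
		v = 0; /* fall back to a harmless value */
	}
	return v;
}
```

Reading two unsupported MSRs through mce_rdmsrl_sketch() returns 0 for both, but the warning is printed only once, so the system keeps booting and the message can still be collected.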

Ingo