Re: [PATCH -tip 1/3] x86, mce: Add mce_threshold option for intelcmci

From: Andi Kleen
Date: Tue Mar 31 2009 - 04:08:07 EST


Hidetoshi Seto wrote:
Andi Kleen wrote:
To turn it off you would need to disable the CMCI enable bit
completely.

mce_threshold=0 suppresses CMCI initialization.
The CMCI enable bits stay off in this case.
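A minimal sketch of what "threshold 0 leaves CMCI off" means at the register level. It assumes the MCi_CTL2 layout (error threshold in bits 14:0, CMCI enable in bit 30). The helper ctl2_value() is hypothetical and is not the code from the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MCi_CTL2 layout: bit 30 enables CMCI for the bank and
 * bits 14:0 hold the corrected-error count threshold. */
#define CTL2_CMCI_EN        (1ULL << 30)
#define CTL2_THRESHOLD_MASK 0x7fffULL

/* Hypothetical helper: compute the new CTL2 value for a bank.
 * A threshold of 0 means "do not initialize CMCI", so the register
 * is returned unchanged and the enable bit stays off. */
static uint64_t ctl2_value(uint64_t old, uint64_t threshold)
{
    if (threshold == 0)
        return old;
    if (threshold > CTL2_THRESHOLD_MASK)
        threshold = CTL2_THRESHOLD_MASK; /* clamp to the field width */
    old &= ~CTL2_THRESHOLD_MASK;         /* drop any previous threshold */
    assert(threshold != 0);
    return old | threshold | CTL2_CMCI_EN;
}
```

For example, ctl2_value(0, 1) sets both the enable bit and a threshold of 1, while ctl2_value(0, 0) leaves the bank untouched.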

True, I missed that earlier. Still, a different option would be better.



However, I expect this will never be a good idea to use, at least on Nehalem-class
systems, because without CMCI the machine check code cannot handle shared banks
correctly and you'll get duplicated events from them. And on non-Nehalem systems
there is no CMCI anyway, so it will always be off.
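A simplified model of why CMCI matters for shared banks. It assumes, as the Linux Intel CMCI code does, that a shared bank is owned by whichever CPU first manages to set its CMCI enable bit, and that only the owner scans it. Without CMCI there is no such ownership, so every CPU sharing the bank scans it. NCPUS and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4 /* hypothetical: logical CPUs sharing one bank */

/* One MCA bank shared by several CPUs. cmci_en stands in for the
 * bank's CMCI enable bit; owner is the CPU that managed to set it. */
struct shared_bank {
    bool cmci_en;
    int owner;
};

/* A CPU claims the bank only if no sibling has enabled CMCI on it yet.
 * A CPU that loses the claim leaves the bank to its owner. */
static bool claim_bank(struct shared_bank *b, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    if (b->cmci_en)
        return false;
    b->cmci_en = true;
    b->owner = cpu;
    return true;
}

/* Count how many CPUs would scan the bank on each poll: only the owner
 * when CMCI is available, every sharing CPU otherwise. */
static int cpus_scanning_bank(bool have_cmci)
{
    struct shared_bank b = { .cmci_en = false, .owner = -1 };
    int scanners = 0;

    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (!have_cmci || claim_bank(&b, cpu))
            scanners++;
    }
    return scanners;
}
```

With CMCI, exactly one CPU scans the shared bank. Without it, all NCPUS do, which is what opens the door to duplicate events.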

One question: if one CPU clears a record in a shared bank, can the other CPUs
sharing the bank still retrieve the same record? Or does duplication of records
only happen when a shared bank is polled by multiple CPUs at the same time?

Only when multiple CPUs poll (or machine check) at the same time.


So an old kernel without CMCI support running on a new Nehalem-class system will
produce duplicated records, right?

Occasionally, when it races, yes.

Doesn't that affect current distros like RHEL5?

Yes, somewhat. The bigger problem there is actually the lack of broadcast handling,
which often leads to incorrect reporting of fatal MCEs.

-Andi