Re: [PATCH 2/7] EDAC/mce_amd: Remove SMCA Extended Error code descriptions

From: M K, Muralidhara
Date: Thu Jul 20 2023 - 11:25:43 EST


Hi Boris,

On 7/20/2023 7:29 PM, Borislav Petkov wrote:

On Thu, Jul 20, 2023 at 12:54:20PM +0000, Muralidhara M K wrote:
From: Muralidhara M K <muralidhara.mk@xxxxxxx>

On AMD systems with Scalable MCA, each machine check error of a SMCA bank
type has an associated bit position in the bank's control (CTL) register.

An error's bit position in the CTL register is used during error decoding
for offsetting into the corresponding bank's error description structure.
As new errors are being added in newer AMD systems for existing SMCA bank
types, the underlying SMCA architecture guarantees that the bit positions
of existing errors are not altered.

However, on some AMD systems, existing bit definitions in the CTL register
of an SMCA bank type have been reassigned without defining a new HWID and
McaType. Consequently, errors whose bit definitions were reassigned in the
CTL register are decoded erroneously.
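
For illustration only, the lookup described above can be sketched in
user-space C roughly as follows. This is not the code from
drivers/edac/mce_amd.c: the table contents and the names
example_bank_descs, smca_xec() and describe_xec() are hypothetical, and the
sketch assumes the Extended Error Code sits in bits [21:16] of MCA_STATUS
and equals the error's bit position in the bank's CTL register.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical description table for a single bank type.
 * The array index is assumed to match the error's CTL bit position.
 */
static const char * const example_bank_descs[] = {
	"example error at CTL bit 0",
	"example error at CTL bit 1",
	"example error at CTL bit 2",
};

/* Assumption: the Extended Error Code is bits [21:16] of MCA_STATUS. */
static unsigned int smca_xec(uint64_t status)
{
	return (unsigned int)((status >> 16) & 0x3f);
}

/*
 * Use the Extended Error Code as an offset into the table.
 * Returns NULL when the table has no entry for that code.
 */
static const char *describe_xec(unsigned int xec)
{
	if (xec < ARRAY_LEN(example_bank_descs))
		return example_bank_descs[xec];

	return NULL;
}
```

Because the table is selected by bank type alone, a system that reassigns
CTL bit 1 to a different error without a new HWID and McaType would still
get "example error at CTL bit 1" back from describe_xec(1), which is the
mis-decoding described above.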

Remove SMCA Extended Error Code descriptions. This avoids decoding issues
for incorrectly reassigned bits, and avoids the related maintenance burden
in the kernel. This decoding can be done in external tools or by referring
to AMD documentation. The bank type and Extended Error Code value for an
error will continue to be printed as a convenience.

Signed-off-by: Muralidhara M K <muralidhara.mk@xxxxxxx>
Reviewed-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
---
drivers/edac/mce_amd.c | 480 -----------------------------------------
1 file changed, 480 deletions(-)

This needs to stay until rasdaemon has support for decoding errors - and
I've told you already.

Lemme tell you again, maybe it'll stick this time.

In any case, NAK.


A pull request adding this decoding support has been created in rasdaemon:
https://github.com/mchehab/rasdaemon/pull/106/commits/09026653864305b7a91dcb3604b91a9c0c0d74f3

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette