Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error traceevent

From: Naveen N. Rao
Date: Tue Aug 13 2013 - 13:18:05 EST


On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
On Tue, 13 Aug 2013 17:11:18 +0530,
"Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx> wrote:

On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
But this only seems to expose the APEI data as a string
and doesn't really seem to make all the fields available to user-space
in a raw manner. Not sure how well this can be utilised by a user-space
tool. Do you have suggestions on how we can do this?

There's already a userspace tool that handles it:
https://git.fedorahosted.org/cgit/rasdaemon.git/

What is missing in the current version are the bits that would translate
the APEI way of reporting an error (memory node, card, module,
bank, device) into a DIMM label[1].

If I'm reading this right, all APEI data seems to be squashed into a
string in mc_event.

Yes. We had lots of discussion about how to map memory errors over the
last couple of years. Basically, it was decided that the information that
can be decoded into a DIMM would be mapped as integers, and all other
driver-specific data would be added as strings.
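
As a rough illustration of that split (this is not the actual mc_event
layout; the class, the function, and all field names below are
hypothetical), a record could carry integer location fields next to one
free-form string:

```python
from dataclasses import dataclass


@dataclass
class MemErrorRecord:
    # Hypothetical integer fields: location data that can be decoded to a DIMM.
    # A negative value is used here to mean "not reported".
    top_layer: int
    mid_layer: int
    low_layer: int
    # Hypothetical string field: driver-specific data kept as opaque text.
    driver_detail: str = ""


def describe(rec: MemErrorRecord) -> str:
    """Render the integer location and the opaque driver string on one line."""
    layers = [rec.top_layer, rec.mid_layer, rec.low_layer]
    # Keep only the layers that were actually reported.
    loc = ":".join(str(v) for v in layers if v >= 0) or "unknown"
    return f"location={loc} detail={rec.driver_detail!r}"
```

A consumer can filter or aggregate on the integer fields directly, while
anything packed into the string has to be parsed per driver, which is the
limitation raised above for the APEI data.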

In the tests I did, different machines/vendors fill in the APEI data in
different ways, which makes it harder to associate the data with a DIMM.

Ok, so it looks like ghes_edac isn't quite useful yet.

In the meantime, like Boris suggests, I think we can have a different trace event for raw APEI reports - userspace can use it as it pleases.

Once ghes_edac gets better, users can decide whether they want raw APEI reports or the EDAC-processed version and choose one or the other trace event.

Regards,
Naveen
