Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor specific HW errors

From: Borislav Petkov
Date: Tue Mar 31 2020 - 05:21:40 EST


On Mon, Mar 30, 2020 at 03:44:29PM +0000, Shiju Jose wrote:
> 1. rasdaemon need not to print the vendor error data reported by the firmware if the
> kernel driver already print those information. In this case rasdaemon will only need to store
> the decoded vendor error data to the SQL database.

Well, there's a problem with this:

rasdaemon printing != kernel driver printing

Because printing in dmesg would need people to go grep dmesg.

Printing through rasdaemon or any userspace agent, OTOH, is a lot more
flexible wrt analyzing and collecting those error records. Especially
if you are a data center admin and you want to collect all your error
records: grepping dmesg simply doesn't scale versus all the rasdaemon
agents reporting to a centrallized location.

> 2. If the vendor kernel driver want to report extra error information through
> the vendor specific data (though presently we do not have any such use case) for the rasdamon to log.
> I think the error handled status useful to indicate that the kernel driver has filled the extra information and
> rasdaemon to decode and log them after extra data specific validity check.

The kernel driver can report that extra information without the kernel
saying that the error was handled.

So I still see no sense for the kernel to tell userspace explicitly that
it handled the error. There might be a valid reason, though, of which I
cannot think of right now.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette