Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor specific HW errors

From: Borislav Petkov
Date: Fri Mar 27 2020 - 14:22:23 EST


On Wed, Mar 25, 2020 at 04:42:22PM +0000, Shiju Jose wrote:
> Presently APEI does not support reporting the vendor specific
> HW errors, received in the vendor defined table entries, to the
> vendor drivers for any recovery.
>
> This patch adds the support to register and unregister the

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> error handling function for the vendor specific HW errors and
> notify the registered kernel driver.
>
> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
> ---
> drivers/acpi/apei/ghes.c | 35 ++++++++++++++++++++++++++++++++++-
> drivers/ras/ras.c | 5 +++--
> include/acpi/ghes.h | 28 ++++++++++++++++++++++++++++
> include/linux/ras.h | 6 ++++--
> include/ras/ras_event.h | 7 +++++--
> 5 files changed, 74 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 24c9642e8fc7..d83f0b1aad0d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -490,6 +490,32 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
> #endif
> }
>
> +static ATOMIC_NOTIFIER_HEAD(ghes_event_notify_list);
> +
> +/**
> + * ghes_register_event_notifier - register an event notifier
> + * for the non-fatal HW errors.
> + * @nb: pointer to the notifier_block structure of the event handler.
> + *
> + * return 0 : SUCCESS, non-zero : FAIL
> + */
> +int ghes_register_event_notifier(struct notifier_block *nb)
> +{
> + return atomic_notifier_chain_register(&ghes_event_notify_list, nb);
> +}
> +EXPORT_SYMBOL_GPL(ghes_register_event_notifier);
> +
> +/**
> + * ghes_unregister_event_notifier - unregister the previously
> + * registered event notifier.
> + * @nb: pointer to the notifier_block structure of the event handler.
> + */
> +void ghes_unregister_event_notifier(struct notifier_block *nb)
> +{
> + atomic_notifier_chain_unregister(&ghes_event_notify_list, nb);
> +}
> +EXPORT_SYMBOL_GPL(ghes_unregister_event_notifier);
> +
> static void ghes_do_proc(struct ghes *ghes,
> const struct acpi_hest_generic_status *estatus)
> {
> @@ -526,10 +552,17 @@ static void ghes_do_proc(struct ghes *ghes,
> log_arm_hw_error(err);
> } else {
> void *err = acpi_hest_get_payload(gdata);
> + u8 error_handled = false;
> + int ret;
> +
> + ret = atomic_notifier_call_chain(&ghes_event_notify_list, 0, gdata);

Well, this is a notifier with a standard name for a non-standard event.
Not optimal.

Why does only this event need a notifier? Because your driver is
interested in only those events?

> + if (ret & NOTIFY_OK)
> + error_handled = true;
>
> log_non_standard_event(sec_type, fru_id, fru_text,
> sec_sev, err,
> - gdata->error_data_length);
> + gdata->error_data_length,
> + error_handled);

What's that error_handled thing for? That's just silly.

Your notifier returns NOTIFY_STOP when it has queued the error. If you
don't want to log it, just test == NOTIFY_STOP and do not log it then.
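
To spell out that check, here is a minimal userspace sketch. The return
codes are the ones from include/linux/notifier.h; the helper name
ghes_should_log() is made up for illustration and is not part of the
patch. Note that "ret & NOTIFY_OK" is true for both NOTIFY_OK and
NOTIFY_STOP, so it cannot tell "seen" apart from "claimed".

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000	/* don't care */
#define NOTIFY_OK        0x0001	/* suits me */
#define NOTIFY_STOP_MASK 0x8000	/* don't call further */
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/*
 * Hypothetical helper: log the record unless a registered handler
 * claimed it by returning NOTIFY_STOP.
 */
bool ghes_should_log(int ret)
{
	return ret != NOTIFY_STOP;
}
```

With this, ghes_should_log(NOTIFY_DONE) and ghes_should_log(NOTIFY_OK)
are true, ghes_should_log(NOTIFY_STOP) is false, and no extra
error_handled argument is needed.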

Then your notifier callback is queuing the error into a kfifo for
whatever reason and then scheduling work to handle it in process
context...

So I'm thinking that it would be better if you:

* make that kfifo generic and part of ghes.c and queue all types of
error records into it in ghes_do_proc() - not just the non-standard
ones.

* then, when you're done queuing, you kick a workqueue.

* that workqueue runs a normal, blocking notifier to which drivers
register.

Your driver can register to that notifier too and do the normal handling
then and not have this ad-hoc, semi-generic, semi-vendor-specific thing.
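
The flow above can be modelled roughly like this. It is a userspace
sketch only, not kernel code: in ghes.c the ring would be a kfifo, the
drain step a work item kicked with schedule_work(), and the chain a
BLOCKING_NOTIFIER_HEAD run via blocking_notifier_call_chain(). All
names below (err_record, queue_record(), drain_queue(),
vendor_handler(), VENDOR_SEC_TYPE) are assumptions made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (0x0001 | NOTIFY_STOP_MASK)

#define QUEUE_SIZE       8	/* power of two, like a kfifo */
#define VENDOR_SEC_TYPE  1	/* hypothetical section type */

struct err_record {		/* stand-in for a queued error record */
	int sec_type;
	int severity;
};

struct notifier {
	int (*call)(struct notifier *nb, const struct err_record *rec);
	struct notifier *next;
};

static struct err_record queue[QUEUE_SIZE];
static unsigned int head, tail;	/* head: producer, tail: consumer */
static struct notifier *chain;
static int vendor_seen;

/* Producer: models queuing every record type in ghes_do_proc(). */
int queue_record(const struct err_record *rec)
{
	if (head - tail == QUEUE_SIZE)
		return -1;	/* queue full, record dropped */
	queue[head++ % QUEUE_SIZE] = *rec;
	return 0;
}

/* Drivers append themselves to the chain. */
void register_notifier(struct notifier *nb)
{
	struct notifier **p = &chain;

	while (*p)
		p = &(*p)->next;
	nb->next = NULL;
	*p = nb;
}

/* Consumer: models the work function; returns records drained. */
int drain_queue(void)
{
	int n = 0;

	while (tail != head) {
		const struct err_record *rec = &queue[tail++ % QUEUE_SIZE];
		struct notifier *nb;

		for (nb = chain; nb; nb = nb->next)
			if (nb->call(nb, rec) & NOTIFY_STOP_MASK)
				break;
		n++;
	}
	return n;
}

/* Example driver callback: claims only its own section type. */
int vendor_handler(struct notifier *nb, const struct err_record *rec)
{
	(void)nb;
	if (rec->sec_type != VENDOR_SEC_TYPE)
		return NOTIFY_DONE;
	vendor_seen++;
	return NOTIFY_STOP;
}
```

The point of the split is that queue_record() does nothing but copy
the record, so it stays cheap on the producer side, while the
callbacks only run from drain_queue(), where they are allowed to
block.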

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette