[RFC PATCH 0/2] ACPI: APEI: handle synchronous exceptions in task work

From: Shuai Xue
Date: Tue Dec 06 2022 - 10:34:09 EST


Currently, both synchronous and asynchronous errors are queued and handled by a
dedicated kthread in a workqueue. Memory failure handling for synchronous errors
is only synchronized with the faulting task by a trick.

Although the task may be killed by the page fault, memory failure is handled in
kthread context, so hwpoison-aware mechanisms such as PF_MCE_EARLY (early kill)
do not work as expected.
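For context, PF_MCE_EARLY is a per-thread flag that user space sets through prctl(2) with PR_MCE_KILL. Early kill means the thread receives SIGBUS as soon as hardware memory corruption is detected inside its address space, rather than only when it accesses the corrupted page. The minimal sketch below shows only that user-space opt-in and does not inject a memory error; `enable_early_kill()` is a hypothetical helper name, and it returns -1 if the calls fail (for example, in a sandbox that may not implement PR_MCE_KILL).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks for old headers; values as in <linux/prctl.h>. */
#ifndef PR_MCE_KILL
#define PR_MCE_KILL 33
#endif
#ifndef PR_MCE_KILL_SET
#define PR_MCE_KILL_SET 1
#endif
#ifndef PR_MCE_KILL_EARLY
#define PR_MCE_KILL_EARLY 1
#endif
#ifndef PR_MCE_KILL_GET
#define PR_MCE_KILL_GET 34
#endif

/*
 * Opt the calling thread into early kill (the kernel sets PF_MCE_PROCESS and
 * PF_MCE_EARLY on it), then read the policy back with PR_MCE_KILL_GET.
 * Returns the reported policy, or -1 if either prctl() call fails.
 * Arguments are passed as unsigned long because prctl() reads them that way.
 */
static int enable_early_kill(void)
{
	if (prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
		  (unsigned long)PR_MCE_KILL_EARLY, 0UL, 0UL) != 0) {
		fprintf(stderr, "PR_MCE_KILL_SET: %s\n", strerror(errno));
		return -1;
	}

	int policy = prctl(PR_MCE_KILL_GET, 0UL, 0UL, 0UL, 0UL);
	if (policy < 0)
		fprintf(stderr, "PR_MCE_KILL_GET: %s\n", strerror(errno));
	return policy;
}
```

After a successful call, `enable_early_kill()` returns `PR_MCE_KILL_EARLY`; the policy applies to the calling thread only.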

To address this, separate synchronous and asynchronous error handling into
different paths, as x86 does:

- task work for synchronous errors;
- a workqueue for asynchronous errors.
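The split into two paths can be illustrated with a small user-space model. This is a simplified sketch, not kernel code: every name in it is hypothetical, `has_mm` stands in for "the interrupted context is a user task", and falling back to the workqueue when there is no user task is an assumption of the model rather than something this cover letter states. The point it shows is which context ends up running the memory failure handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space model; all names are hypothetical. */
#define QLEN 8

struct task {
	const char *name;
	bool has_mm;              /* models "current is a user task" */
	unsigned long work[QLEN]; /* pending task work, as page frame numbers */
	size_t nwork;
};

static struct task kworker = { .name = "kworker", .has_mm = false };
static unsigned long wq[QLEN];     /* pending workqueue items */
static size_t nwq;

static struct task *current_task;  /* models `current` */
static const char *last_ctx;       /* context that handled the last PFN */
static unsigned long last_pfn;

/* Stand-in for memory_failure(): records which context handled the PFN. */
static void handle_pfn(unsigned long pfn)
{
	assert(current_task != NULL);
	last_pfn = pfn;
	last_ctx = current_task->name;
}

/* Exception path: synchronous errors on a user task go to its task work,
 * everything else to the workqueue (dropped if both queues are full). */
static void defer_error(unsigned long pfn, bool sync)
{
	if (sync && current_task->has_mm && current_task->nwork < QLEN)
		current_task->work[current_task->nwork++] = pfn;
	else if (nwq < QLEN)
		wq[nwq++] = pfn;
}

/* Task work: drained as @t itself, just before it returns to user space. */
static void return_to_user(struct task *t)
{
	current_task = t;
	for (size_t i = 0; i < t->nwork; i++)
		handle_pfn(t->work[i]);
	t->nwork = 0;
}

/* Workqueue: drained later, in kworker context. */
static void run_kworker(void)
{
	current_task = &kworker;
	for (size_t i = 0; i < nwq; i++)
		handle_pfn(wq[i]);
	nwq = 0;
}
```

In this model a synchronous error raised while `app` runs is handled as `app` before it resumes, while an asynchronous error stays queued until `run_kworker()` handles it as `kworker`.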

This patch set is based on a new UEFI proposal submitted by our colleague Yingwen.[1]

> Background:
>
> In the ARM world, two types of events (sync/async) from hardware IP require the OS/VMM to take different actions.
> The current CPER memory error record cannot distinguish sync from async events, so the
> OS/VMM has to take extra actions beyond CPER to identify the two types of events, which is a
> heavy burden.
>
> Sync event (e.g. a CPU consumes poisoned data) --> Firmware --> CPER error log --> OS/VMM takes recovery action.
> Async event (e.g. the memory controller detects a UE) --> Firmware --> CPER error log --> OS takes page action.
>
>
> Proposal:
>
> - In the section descriptor Flags field (UEFI spec section N.2), add a sync flag as below. The OS/VMM
> could rely on this flag to distinguish sync/async events.
> - Bit 8 – sync flag: if set, this flag indicates that the event record is synchronous (e.g. a
> CPU core consumes poisoned data and takes an instruction/data abort); if not set, the event record is asynchronous.
>
> Best regards,
> Yingwen Chen
>
> [ Shuai Xue: The thread is only open to members of the UEFI Workgroup.
> Pasted here for discussion.]

[1] https://members.uefi.org/wg/uswg/mail/thread/9453
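The proposed flag is a single bit in the 32-bit Flags field of the CPER section descriptor. The minimal C sketch below shows how an OS might test it. Bits 0-7 follow the existing UEFI Appendix N definitions, bit 8 is the proposal quoted above (reserved in the spec at the time of the proposal), and the macro and function names (`SEC_SYNC_PROPOSED`, `cper_sec_is_sync()`) are hypothetical, not necessarily the ones this series adds to include/linux/cper.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Section descriptor Flags bits 0-7 (UEFI spec Appendix N). */
#define SEC_PRIMARY                  (1u << 0)
#define SEC_CONTAINMENT_WARNING      (1u << 1)
#define SEC_RESET                    (1u << 2)
#define SEC_ERROR_THRESHOLD_EXCEEDED (1u << 3)
#define SEC_RESOURCE_NOT_ACCESSIBLE  (1u << 4)
#define SEC_LATENT_ERROR             (1u << 5)
#define SEC_PROPAGATED               (1u << 6)
#define SEC_OVERFLOW                 (1u << 7)
/* Proposed sync flag; hypothetical name. */
#define SEC_SYNC_PROPOSED            (1u << 8)

static_assert(SEC_SYNC_PROPOSED == 0x100u, "proposed sync flag is bit 8");

/* True if the section descriptor marks the event as synchronous. */
static bool cper_sec_is_sync(uint32_t sec_flags)
{
	return (sec_flags & SEC_SYNC_PROPOSED) != 0;
}
```

With such a flag, the choice between task work and the workqueue could be driven directly by the CPER record instead of being inferred from how the error was delivered.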

Shuai Xue (2):
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on
synchronous events
ACPI: APEI: separate synchronous error handling into task work

drivers/acpi/apei/ghes.c | 120 ++++++++++++++++++++++-----------------
include/linux/cper.h | 22 +++++++
2 files changed, 89 insertions(+), 53 deletions(-)

--
2.20.1.12.g72788fdb