[PATCH 0/2] ARM Error Source Table V1 Support

From: Ruidong Tian
Date: Mon Mar 04 2024 - 06:15:37 EST


This series adds support for the ARM Error Source Table (AEST) based on
the 1.1 version of ACPI for the Armv8 RAS Extensions [0].

The Arm Error Source Table (AEST) enable kernel-first handling of errors
in a system that supports the Armv8 RAS extensions. Hardware errors will
trigger a RAS interrupt to kernel, kernel scan all AEST node to fine
error node which occur error in irq context and use a workqueue to log
this hardware errors.

I have tested this series on PTG Yitian710 SOC. Both corrected and
uncorrected errors were tested to verify the non-fatal vs fatal
scenarios.

Future work:
1. UE trigger memory_failure other than panic.
2. Add CE storm mitigation.
3. Support AEST V2.

This series is based on Tyler Baicar's patches [1], which do not have v2
sended to mail list yet. Change from origin patch:
1. Add a genpool to collect all AEST error, and log them in a workqueue
other than in irq context.
2. Just use the same one aest_proc function for system register interface
and MMIO interface.
3. Reconstruct some structures and functions to make it more clear.
4. Accept all comments in Tyler Baicar's mail list.

[0]: https://developer.arm.com/documentation/den0085/0101/
[1]: https://lore.kernel.org/all/20211124170708.3874-1-baicar@xxxxxxxxxxxxxxxxxxxxxx/

Tyler Baicar (2):
ACPI/AEST: Initial AEST driver
trace, ras: add ARM RAS extension trace event

MAINTAINERS | 11 +
arch/arm64/include/asm/ras.h | 38 ++
drivers/acpi/arm64/Kconfig | 10 +
drivers/acpi/arm64/Makefile | 1 +
drivers/acpi/arm64/aest.c | 728 +++++++++++++++++++++++++++++++++++
include/linux/acpi_aest.h | 91 +++++
include/linux/cpuhotplug.h | 1 +
include/ras/ras_event.h | 55 +++
8 files changed, 935 insertions(+)
create mode 100644 arch/arm64/include/asm/ras.h
create mode 100644 drivers/acpi/arm64/aest.c
create mode 100644 include/linux/acpi_aest.h

--
2.33.1