[PATCH 0/2] FRU Memory Poison Manager

From: Yazen Ghannam
Date: Tue Feb 13 2024 - 22:35:47 EST


Hi all,

This set adds a new module to manage error records on persistent
storage.

Patch 1 moves a function from AMD64 EDAC to the AMD Address Translation
Library. This is needed for patch 2.

Patch 2 adds the new module. This is a near-total rewrite based on patch
2 from the following set:
https://lore.kernel.org/r/20231129075034.2159223-1-muralimk@xxxxxxx

I included questions in code comments where I think more attention is
needed.

I'd like to add Murali and Naveen as Co-developers, since this is based
on their work. Also, I kept Naveen as a maintainer in case he's still
interested.

Regarding the old set:
* Patch 1 exports a new function from the ERST driver. This is not
  necessary.

* Patch 3 adds a new sysfs interface. This needs more work.

* Patch 4 adds documentation. This needs updating.

I did some basic testing on a 2P server system without ERST support.
Mostly I checked the memory layout of the structures. I also injected
memory errors to exercise the record updating flow. I did some fixups
after testing, so I apologize if I missed anything.

Thanks,
Yazen

Yazen Ghannam (2):
  RAS/AMD/ATL, EDAC/amd64: Move MI300 Row Retirement to ATL
  RAS: Introduce the FRU Memory Poison Manager

 MAINTAINERS                 |   7 +
 drivers/edac/Kconfig        |   1 -
 drivers/edac/amd64_edac.c   |  48 ---
 drivers/ras/Kconfig         |  13 +
 drivers/ras/Makefile        |   1 +
 drivers/ras/amd/atl/Kconfig |   1 +
 drivers/ras/amd/atl/umc.c   |  51 +++
 drivers/ras/amd/fmpm.c      | 776 ++++++++++++++++++++++++++++++++++++
 include/linux/ras.h         |   2 +
 9 files changed, 851 insertions(+), 49 deletions(-)
 create mode 100644 drivers/ras/amd/fmpm.c


base-commit: c2064388aa8765abd7c2c5785e7bfe266a2f6cd3
--
2.34.1