Re: [PATCH 2/2] RAS: Introduce the FRU Memory Poison Manager

From: Borislav Petkov
Date: Wed Feb 14 2024 - 12:52:09 EST


On Wed, Feb 14, 2024 at 09:28:54AM -0500, Yazen Ghannam wrote:
> > That's a good thing to have here.

Up to here. __packed still needs clarification.
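
For background on the __packed question, here is a minimal userspace sketch
of what packing changes. It assumes a hypothetical two-field record; the
demo_* names are made up for illustration and are not the FMPM structures.
In the kernel, __packed expands to the same compiler attribute spelled out
below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, NOT the actual FMPM descriptor. */
struct demo_natural {
	uint8_t  type;
	uint64_t addr;	/* compiler may insert padding before this field */
};

/* Same fields, but with the attribute that the kernel's __packed uses. */
struct demo_packed {
	uint8_t  type;
	uint64_t addr;	/* placed directly after 'type', possibly unaligned */
} __attribute__((packed));

/* Bytes of compiler-inserted padding that packing removes. */
static inline size_t demo_padding_removed(void)
{
	return sizeof(struct demo_natural) - sizeof(struct demo_packed);
}

/* The packed layout is fixed regardless of ABI: 1 + 8 bytes. */
_Static_assert(sizeof(struct demo_packed) == 9, "packed size");
_Static_assert(offsetof(struct demo_packed, addr) == 1, "packed offset");
```

On a typical x86-64 build the unpacked variant is 16 bytes, so the helper
returns 7; other ABIs may pad differently. Whether a given on-disk or
firmware-defined layout actually needs the attribute depends on whether its
fields are already naturally aligned.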

diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index 782951aa302f..f5dde88a3188 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -37,14 +37,13 @@ source "drivers/ras/amd/atl/Kconfig"
 config RAS_FMPM
 	tristate "FRU Memory Poison Manager"
 	default m
-	depends on X86_MCE
-	imply AMD_ATL
+	depends on AMD_ATL
 	help
 	  Support saving and restoring memory error information across reboot
-	  cycles using ACPI ERST as persistent storage. Error information is
-	  saved with the UEFI CPER "FRU Memory Poison" section format.
+	  using ACPI ERST as persistent storage. Error information is saved with
+	  the UEFI CPER "FRU Memory Poison" section format.

-	  Memory may be retired during boot time and run time depending on
+	  Memory will be retired during boot time and run time depending on
 	  platform-specific policies.

 endif
diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
index d6a963aca093..901a1f0018fc 100644
--- a/drivers/ras/amd/fmpm.c
+++ b/drivers/ras/amd/fmpm.c
@@ -12,7 +12,7 @@
  *
  * Implementation notes, assumptions, and limitations:
  *
- * - FRU Memory Poison Section and Memory Poison Descriptor definitions are not yet
+ * - FRU memory poison section and memory poison descriptor definitions are not yet
  *   included in the UEFI specification. So they are defined here. Afterwards, they
  *   may be moved to linux/cper.h, if appropriate.
  *
@@ -23,16 +23,13 @@
  * AMD MI300-based platform(s) assumptions:
  * - Memory errors are reported through x86 MCA.
  * - The entire DRAM row containing a memory error should be retired.
- * - There will be (1) FRU Memory Poison Section per CPER.
- * - The FRU will be the CPU Package (Processor Socket).
- * - The default number of Memory Poison Descriptor entries should be (8).
- * - The Platform will use ACPI ERST for persistent storage.
+ * - There will be (1) FRU memory poison section per CPER.
+ * - The FRU will be the CPU package (processor socket).
+ * - The default number of memory poison descriptor entries should be (8).
+ * - The platform will use ACPI ERST for persistent storage.
  * - All FRU records should be saved to persistent storage. Module init will
  *   fail if any FRU record is not successfully written.
  *
- * - Source code will be under 'drivers/ras/amd/' unless and until there is interest
- *   to use this module for other vendors.
- *
  * - Boot time memory retirement may occur later than ideal due to dependencies
  *   on other libraries and drivers. This leaves a gap where bad memory may be
  *   accessed during early boot stages.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette