Re: [PATCH v2 2/3] RAS/AMD/FMPM: Save SPA values

From: Borislav Petkov
Date: Fri Mar 01 2024 - 10:51:14 EST


On Fri, Mar 01, 2024 at 08:37:47AM -0600, Yazen Ghannam wrote:
> The system physical address (SPA) of an error is not a stable value. It
> will change depending on the location of the memory: parts can be
> swapped. And it will change depending on memory topology: NUMA nodes
> and/or interleaving can be adjusted.
>
> Therefore, the SPA value is not part of the "FRU Memory Poison" record
> format. And it will not be saved to persistent storage.
>
> However, the SPA values can be helpful during debug and for system
> admins during run time.
>
> Save the SPA values in a separate structure. This is updated when
> records are restored and when new errors are saved.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> ---
> Link:
> https://lore.kernel.org/r/20240226152941.2615007-3-yazen.ghannam@xxxxxxx
>
> v1->v2:
> * Changed variable names to remove "sys_" prefix. (Boris)
> * Used "spa_" prefix to highlight that these are for SPA values. (Yazen)
> * Added warning to "index out-of-bound" condition. (Boris)
> * Reworked save_spa() flow to get a valid array position before saving
> SPA value (Yazen).
>
> drivers/ras/amd/fmpm.c | 68 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 68 insertions(+)

Fixups on top:

---

diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
index a7bb36eb60cb..8c3188488673 100644
--- a/drivers/ras/amd/fmpm.c
+++ b/drivers/ras/amd/fmpm.c
@@ -125,7 +125,7 @@ static u64 *spa_entries;
0x12, 0x0a, 0x44, 0x58)

 /**
- * DOC: fru_poison_entries (byte)
+ * DOC: max_nr_entries (byte)
  * Maximum number of descriptor entries possible for each FRU.
  *
  * Values between '1' and '255' are valid.
@@ -285,10 +285,12 @@ static void save_spa(struct fru_rec *rec, unsigned int entry,
 	unsigned long spa;

 	if (entry >= max_nr_entries) {
-		pr_warn_once("entry out-of-bounds\n");
+		pr_warn_once("FRU descriptor entry %d out-of-bounds (max: %d)\n",
+			     entry, max_nr_entries);
 		return;
 	}

+	/* spa_nr_entries is always multiple of max_nr_entries */
 	for (i = 0; i < spa_nr_entries; i += max_nr_entries) {
 		fru_idx = i / max_nr_entries;
 		if (fru_records[fru_idx] == rec)
@@ -296,7 +298,7 @@ static void save_spa(struct fru_rec *rec, unsigned int entry,
 	}

 	if (i >= spa_nr_entries) {
-		pr_warn_once("record not found");
+		pr_warn_once("FRU record %d not found\n", i);
 		return;
 	}

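For readers following the index math in the hunks above, here is a minimal
userspace sketch of the lookup as I read it: the SPA array is flat, each FRU
owns a block of max_nr_entries slots, and the slot for a given record is
fru_idx * max_nr_entries + entry. The fixed sizes (MAX_NR_ENTRIES, NR_FRUS),
the trimmed struct fru_rec, and the helpers spa_slot() and save_spa_sketch()
are illustrative assumptions, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's module-level state. */
#define MAX_NR_ENTRIES	8u
#define NR_FRUS		2u

struct fru_rec { int id; };	/* trimmed placeholder, not the real layout */

static struct fru_rec recs[NR_FRUS];
static struct fru_rec *fru_records[NR_FRUS] = { &recs[0], &recs[1] };

/* One flat array: NR_FRUS blocks of MAX_NR_ENTRIES slots each. */
static unsigned long long spa_entries[NR_FRUS * MAX_NR_ENTRIES];

/* Return the flat slot for (rec, entry), or -1 if either is invalid. */
static long spa_slot(const struct fru_rec *rec, unsigned int entry)
{
	unsigned int fru_idx;

	/* Bounds-check the descriptor entry before touching the array. */
	if (entry >= MAX_NR_ENTRIES)
		return -1;

	/* Find which FRU block this record owns. */
	for (fru_idx = 0; fru_idx < NR_FRUS; fru_idx++) {
		if (fru_records[fru_idx] == rec)
			return (long)(fru_idx * MAX_NR_ENTRIES + entry);
	}

	return -1;	/* record not found */
}

/* Store an SPA value only after a valid slot has been found. */
static int save_spa_sketch(const struct fru_rec *rec, unsigned int entry,
			   unsigned long long spa)
{
	long slot = spa_slot(rec, entry);

	if (slot < 0)
		return -1;

	assert((size_t)slot < sizeof(spa_entries) / sizeof(spa_entries[0]));
	spa_entries[slot] = spa;
	return 0;
}
```

The point of resolving the slot first is the same as in the v2 changelog:
nothing is written unless both the entry and the record are valid.
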
--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette