Re: [PATCH 2/2] RAS: Introduce the FRU Memory Poison Manager

From: Yazen Ghannam
Date: Wed Feb 14 2024 - 09:45:43 EST


On 2/14/2024 5:29 AM, Borislav Petkov wrote:
On Tue, Feb 13, 2024 at 09:35:16PM -0600, Yazen Ghannam wrote:
+/* FRU Memory Poison Section, UEFI vX.Y sec N.X.Z */

Whack those:

diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
index 328e0a962c23..0246b13b5ba1 100644
--- a/drivers/ras/amd/fmpm.c
+++ b/drivers/ras/amd/fmpm.c
@@ -72,7 +72,7 @@
/* FRU ID Types */
#define FMP_ID_TYPE_X86_PPIN 0
-/* FRU Memory Poison Section, UEFI vX.Y sec N.X.Z */
+/* FRU Memory Poison Section */
struct cper_sec_fru_mem_poison {
u32 checksum;
u64 validation_bits;
@@ -89,7 +89,7 @@ struct cper_sec_fru_mem_poison {
/* FRU Descriptor Address Types */
#define FPD_ADDR_TYPE_MCA_ADDR 0
-/* Memory Poison Descriptor, UEFI vX.Y sec N.X.Y */
+/* Memory Poison Descriptor */
struct cper_fru_poison_desc {
u64 timestamp;
u32 hw_id_type;



Ack.
+/**
+ * DOC: fru_poison_entries (byte)
+ * Maximum number of descriptor entries possible for each FRU.
+ *
+ * Values between '1' and '255' are valid.
+ * No input or '0' will default to FMPM_DEFAULT_MAX_NR_ENTRIES.
+ */
+static u8 max_nr_entries;
+module_param(max_nr_entries, byte, 0644);
+MODULE_PARM_DESC(max_nr_entries,
+ "Maximum number of memory poison descriptor entries per FRU");

Why is there a module parameter?


I didn't think too much on this one. I kept the idea from the old set.

Murali, Naveen, any comments?
So that people can brick their BIOSes if it can't handle some size?


The ERST operations should fail and return an error status if there's not enough
space.
Can we read out the max size of the area destined for FRU records from
somewhere and go with it?


This idea was done in the old set. But it's not correct, IMO.

The 'size' we can see from ERST isn't necessarily the available storage size.
For the !NVRAM cases, it's the size of the temporary bounce buffer that BIOS
can use to pass records to and from the OS.

So it would be big enough to old the largest record BIOS expects. It doesn't
have to match the record size used in this module. ERST is used by other code
with different records.
+#define FMPM_DEFAULT_MAX_NR_ENTRIES 8
+
+/* Maximum number of FRUs in the system. */
+static unsigned int max_nr_fru;
+
+/* Total length of record including headers and list of descriptor entries. */
+static size_t max_rec_len;
+
+/*
+ * Protect the local cache and prevent concurrent writes to storage.

"local cache"?


Yes, we keep a local copy of the records within the module. That way we just need
to update the local copy and write it down to the platform. This saves time and
avoids interrupting the platform to do an extra read.

Thanks,
Yazen