Re: [PATCH v4 1/3] RAS: Introduce AMD Address Translation Library

From: Yazen Ghannam
Date: Mon Dec 18 2023 - 16:53:41 EST


On 12/18/2023 2:21 PM, Christophe JAILLET wrote:
On 18/12/2023 at 20:04, Yazen Ghannam wrote:
AMD Zen-based systems report memory errors through Machine Check banks
representing Unified Memory Controllers (UMCs). The address value
reported for DRAM ECC errors is a "normalized address" that is relative
to the UMC. This normalized address must be converted to a system
physical address to be usable by the OS.

Support for this address translation was introduced to the MCA subsystem
with Zen1 systems. The code was later moved to the AMD64 EDAC module,
since this was the only user of the code at the time.

However, there are uses for this translation outside of EDAC. The system
physical address can be used in MCA for preemptive page offlining as done
in some MCA notifier functions. Also, this translation is needed as the
basis of similar functionality needed for some CXL configurations on AMD
systems.

Introduce a common address translation library that can be used for
multiple subsystems including MCA, EDAC, and CXL.

Include support for UMC normalized to system physical address
translation for current CPU systems.

The Data Fabric Indirect register access offsets and one of the register
fields were changed. Default to the current offsets and register field
definition, and fall back to the older values if running on a "legacy"
system.

Provide built-in code to facilitate the loading and unloading of the
library module without affecting other modules or built-in code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
---

...

+int get_address_map(struct addr_ctx *ctx)
+{
+    int ret = 0;

Nit: unneeded init; ret is assigned before it is first read.

+
+    ret = get_address_map_common(ctx);
+    if (ret)
+        goto out;
+
+    ret = get_global_map_data(ctx);
+    if (ret)
+        goto out;
+
+    dump_address_map(&ctx->map);
+
+out:
+    return ret;
+}
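
To illustrate the nit above, here is a minimal sketch of the function with the initializer dropped. The struct layouts, the helper bodies, the dump_calls counter, and the -22 error value are hypothetical stand-ins so the snippet compiles on its own; the real definitions are in the parts of the patch not quoted here.

```c
/* Hypothetical stand-ins; the real types and helpers are elsewhere in the patch. */
struct addr_map { int placeholder; };
struct addr_ctx { struct addr_map map; int fail_common; int fail_global; };

static int dump_calls;	/* counts calls so the behaviour can be observed */

static int get_address_map_common(struct addr_ctx *ctx)
{
	return ctx->fail_common ? -22 : 0;	/* -22 stands in for an error code */
}

static int get_global_map_data(struct addr_ctx *ctx)
{
	return ctx->fail_global ? -22 : 0;
}

static void dump_address_map(struct addr_map *map)
{
	(void)map;
	dump_calls++;
}

int get_address_map(struct addr_ctx *ctx)
{
	int ret;	/* no initializer: the first statement assigns it */

	ret = get_address_map_common(ctx);
	if (ret)
		goto out;

	ret = get_global_map_data(ctx);
	if (ret)
		goto out;

	dump_address_map(&ctx->map);

out:
	return ret;
}
```

Every path to the out label passes through an assignment to ret, so the behaviour is unchanged.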
diff --git a/drivers/ras/amd/atl/reg_fields.h b/drivers/ras/amd/atl/reg_fields.h
new file mode 100644
index 000000000000..6aaa5093f42c
--- /dev/null
+++ b/drivers/ras/amd/atl/reg_fields.h
@@ -0,0 +1,603 @@

...

+static void get_num_maps(void)
+{
+    switch (df_cfg.rev) {
+    case DF2:
+    case DF3:
+    case DF3p5:
+        df_cfg.num_coh_st_maps    = 2;
+        break;
+    case DF4:
+        df_cfg.num_coh_st_maps    = 4;
+        break;

If 4 is the correct value in both cases, DF4 and DF4p5 cases could be merged.

CJ

+    case DF4p5:
+        df_cfg.num_coh_st_maps    = 4;
+        break;
+    default:
+        atl_debug_on_bad_df_rev();
+    }
+}
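
For reference, the merged form suggested above could look like the following sketch. It assumes 4 really is the correct value for both DF4 and DF4p5. The enum, the df_cfg layout, and the body of atl_debug_on_bad_df_rev() are hypothetical stand-ins so the snippet compiles on its own; only the switch mirrors the quoted code.

```c
/* Hypothetical stand-ins; the real definitions are elsewhere in the patch. */
enum df_revisions { UNKNOWN, DF2, DF3, DF3p5, DF4, DF4p5 };

struct df_config {
	enum df_revisions rev;
	unsigned int num_coh_st_maps;
};

static struct df_config df_cfg;
static int bad_rev_calls;	/* counts calls to the stand-in debug hook */

static void atl_debug_on_bad_df_rev(void)
{
	bad_rev_calls++;
}

static void get_num_maps(void)
{
	switch (df_cfg.rev) {
	case DF2:
	case DF3:
	case DF3p5:
		df_cfg.num_coh_st_maps = 2;
		break;
	case DF4:
	case DF4p5:	/* merged: same value as DF4 */
		df_cfg.num_coh_st_maps = 4;
		break;
	default:
		atl_debug_on_bad_df_rev();
	}
}
```

Sharing one case body keeps the two revisions in sync if the value ever changes, at the cost of splitting them again should they diverge.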

...


Yep, good points. Thanks for your feedback!

-Yazen