[GIT PULL] EDAC updates for v6.9

From: Borislav Petkov
Date: Mon Mar 11 2024 - 11:57:43 EST


Hi Linus,

please pull EDAC updates for 6.9.

Due to the topology changes from tip, a one-liner needs to be applied
as part of the merge commit:

diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
index 08c6dbd44c62..59b6169093f7 100644
--- a/drivers/ras/amd/atl/umc.c
+++ b/drivers/ras/amd/atl/umc.c
@@ -315,7 +315,7 @@ static u8 get_die_id(struct atl_err *err)
 	 * For CPUs, this is the AMD Node ID modulo the number
 	 * of AMD Nodes per socket.
 	 */
-	return topology_die_id(err->cpu) % amd_get_nodes_per_socket();
+	return topology_amd_node_id(err->cpu) % topology_amd_nodes_per_pkg();
 }

#define UMC_CHANNEL_NUM GENMASK(31, 20)
---

Linux-next tested with a similar diff carried forward:

https://lore.kernel.org/r/20240227134352.6deda860@xxxxxxxxxxxxxxxx

but we very recently realized that
s/topology_die_id/topology_amd_node_id/ needs to happen too.

That's not a big deal, though, as these are all new drivers for new
hardware that pretty much no one has yet, so there's no risk of breaking
any existing machines out there.

Thx.

---

The following changes since commit 6613476e225e090cc9aad49be7fa504e290dd33d:

Linux 6.8-rc1 (2024-01-21 14:11:32 -0800)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/edac_updates_for_v6.9

for you to fetch changes up to af65545a0f82d7336f62e34f69d3c644806f5f95:

Merge remote-tracking branches 'ras/edac-drivers', 'ras/edac-misc' and 'ras/edac-amd-atl' into edac-updates-for-v6.9 (2024-03-11 16:24:20 +0100)

----------------------------------------------------------------
- Add a FRU (Field Replaceable Unit) memory poison manager which
collects and manages previously encountered hw errors in order to
save them to persistent storage across reboots. Previously recorded
errors are "replayed" upon reboot in order to poison memory which has
caused said errors in the past.

The main use case is stacked, on-chip memory which cannot simply be
replaced, so poisoning faulty areas of it, and thus making them
inaccessible, is the only strategy to prolong its lifetime.

- Add an AMD address translation library glue which converts the
addresses reported with hw errors into system physical addresses so
that other subsystems, like memory failure, can use them. Add support
for MI300 accelerators to that library.

- igen6: Add support for Alder Lake-N SoC

- i10nm: Add Grand Ridge support

- The usual fixlets and cleanups

----------------------------------------------------------------
Borislav Petkov (AMD) (3):
Documentation: Move RAS section to admin-guide
RAS: Export helper to get ras_debugfs_dir
Merge remote-tracking branches 'ras/edac-drivers', 'ras/edac-misc' and 'ras/edac-amd-atl' into edac-updates-for-v6.9

Dan Carpenter (2):
RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
RAS/AMD/FMPM: Fix off by one when unwinding on error

Lili Li (1):
EDAC/igen6: Add one more Intel Alder Lake-N SoC support

Muralidhara M K (1):
RAS/AMD/ATL: Add MI300 support

Qiuxu Zhuo (1):
EDAC/i10nm: Add Intel Grand Ridge micro-server support

Shubhrajyoti Datta (1):
EDAC/versal: Make the bit position of injected errors configurable

Uwe Kleine-König (1):
EDAC/versal: Convert to platform remove callback returning void

Yangtao Li (1):
EDAC/synopsys: Convert to devm_platform_ioremap_resource()

Yazen Ghannam (9):
RAS: Introduce AMD Address Translation Library
EDAC/amd64: Use new AMD Address Translation Library
Documentation: RAS: Add index and address translation section
RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
RAS/AMD/ATL: Add MI300 row retirement support
RAS: Introduce a FRU memory poison manager
RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
RAS/AMD/FMPM: Save SPA values
RAS/AMD/FMPM: Add debugfs interface to print record entries

.../admin-guide/RAS/address-translation.rst | 24 +
.../ras.rst => admin-guide/RAS/error-decoding.rst} | 11 +-
Documentation/admin-guide/RAS/index.rst | 7 +
.../admin-guide/{ras.rst => RAS/main.rst} | 10 +-
Documentation/admin-guide/index.rst | 2 +-
Documentation/index.rst | 1 -
MAINTAINERS | 15 +-
drivers/edac/Kconfig | 1 +
drivers/edac/amd64_edac.c | 286 +-----
drivers/edac/i10nm_base.c | 1 +
drivers/edac/igen6_edac.c | 2 +
drivers/edac/synopsys_edac.c | 4 +-
drivers/edac/versal_edac.c | 199 +++-
drivers/ras/Kconfig | 13 +
drivers/ras/Makefile | 3 +
drivers/ras/amd/atl/Kconfig | 21 +
drivers/ras/amd/atl/Makefile | 18 +
drivers/ras/amd/atl/access.c | 133 +++
drivers/ras/amd/atl/core.c | 225 +++++
drivers/ras/amd/atl/dehash.c | 500 ++++++++++
drivers/ras/amd/atl/denormalize.c | 718 ++++++++++++++
drivers/ras/amd/atl/internal.h | 306 ++++++
drivers/ras/amd/atl/map.c | 682 +++++++++++++
drivers/ras/amd/atl/reg_fields.h | 606 ++++++++++++
drivers/ras/amd/atl/system.c | 288 ++++++
drivers/ras/amd/atl/umc.c | 341 +++++++
drivers/ras/amd/fmpm.c | 1013 ++++++++++++++++++++
drivers/ras/cec.c | 10 +-
drivers/ras/debugfs.c | 8 +-
drivers/ras/debugfs.h | 2 +-
drivers/ras/ras.c | 31 +
include/linux/ras.h | 18 +
32 files changed, 5164 insertions(+), 335 deletions(-)
create mode 100644 Documentation/admin-guide/RAS/address-translation.rst
rename Documentation/{RAS/ras.rst => admin-guide/RAS/error-decoding.rst} (73%)
create mode 100644 Documentation/admin-guide/RAS/index.rst
rename Documentation/admin-guide/{ras.rst => RAS/main.rst} (99%)
create mode 100644 drivers/ras/amd/atl/Kconfig
create mode 100644 drivers/ras/amd/atl/Makefile
create mode 100644 drivers/ras/amd/atl/access.c
create mode 100644 drivers/ras/amd/atl/core.c
create mode 100644 drivers/ras/amd/atl/dehash.c
create mode 100644 drivers/ras/amd/atl/denormalize.c
create mode 100644 drivers/ras/amd/atl/internal.h
create mode 100644 drivers/ras/amd/atl/map.c
create mode 100644 drivers/ras/amd/atl/reg_fields.h
create mode 100644 drivers/ras/amd/atl/system.c
create mode 100644 drivers/ras/amd/atl/umc.c
create mode 100644 drivers/ras/amd/fmpm.c


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette