[PATCH 00/20] MCA Updates

From: Yazen Ghannam
Date: Sat Nov 18 2023 - 14:33:14 EST


Hi all,

This set is a collection of logically independent updates that touch
common code. I've collected them into one set to resolve conflicts and
ordering dependencies. This is the first half of a larger set. The
second half focuses on refactoring the AMD MCA Thresholding feature
support, so I've left it out for now. It will also include AMD CMCI
Storm handling support on top of the refactored code.

Patch 1 is a small, standalone fix for an issue I noticed during testing
of this set.

Patches 2-3 are a redo of a previous set dealing with BERT MCA decode
and preemption.
https://lore.kernel.org/r/20230622131841.3153672-1-yazen.ghannam@xxxxxxx

Patches 4-12 are general refactoring in preparation for later patches in
this set and the second planned set. The overall theme is to simplify
the AMD MCA init flow and to remove unnecessary data caching in per-CPU
variables. The init flow refactor will be completed in the second patch
set, since much of the cached data is used to set up MCA Thresholding.

Patches 13-14 unify the AMD THR and DFR interrupt handlers with MCA
polling.
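
As a rough illustration of the unification idea (not the code from
these patches), the sketch below routes a polling timer and two
interrupt entry points through one shared bank scan. It is plain
userspace C: bank_status[], NUM_BANKS, poll_banks() and the three
entry points are hypothetical names, and the array stands in for real
MCA_STATUS MSR reads.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS    4              /* hypothetical bank count */
#define STATUS_VALID (1ULL << 63)   /* MCA_STATUS[Val] */

/* Simulated MCA_STATUS registers, one per bank (stand-in for MSR reads). */
static uint64_t bank_status[NUM_BANKS];

/*
 * Shared worker: scan every bank, "log" each valid error and clear it.
 * Returns the number of errors found.
 */
static int poll_banks(void)
{
	int logged = 0;

	for (int bank = 0; bank < NUM_BANKS; bank++) {
		if (!(bank_status[bank] & STATUS_VALID))
			continue;

		/* A real handler would build and queue an error record here. */
		bank_status[bank] = 0;
		logged++;
	}

	assert(logged <= NUM_BANKS);
	return logged;
}

/* Each entry point is a thin wrapper, so all paths behave the same way. */
static int timer_poll(void)          { return poll_banks(); }
static int threshold_interrupt(void) { return poll_banks(); }
static int deferred_interrupt(void)  { return poll_banks(); }
```

With one worker, a bank that raises an interrupt and a bank found by
the periodic poll are logged and cleared by the same code path.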

Patch 15 is a small fix for the MCA Thresholding init path.

Patch 16 adds support for a new Corrected Error Interrupt on Scalable
MCA systems.

Patches 17-20 add support for new Scalable MCA registers and the FRU
Text decoding feature. This is a follow-up to a previous set.
https://lore.kernel.org/r/20220418174440.334336-1-yazen.ghannam@xxxxxxx
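
For readers unfamiliar with FRU Text, here is a minimal userspace
sketch of what decoding could look like. It assumes (this is not stated
above) that the two new syndrome registers together carry a 16-byte
ASCII string, with MCA_SYND1 holding characters 0-7 and MCA_SYND2
holding characters 8-15, lowest byte first. decode_fru_text() and
FRU_TEXT_LEN are hypothetical names, not the helpers added by the
patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRU_TEXT_LEN 16   /* assumed: 8 characters per 64-bit register */

/*
 * Unpack two 64-bit register values into a NUL-terminated string.
 * Assumed layout: byte i of synd1 is character i, byte i of synd2 is
 * character i + 8.
 */
static void decode_fru_text(uint64_t synd1, uint64_t synd2,
			    char out[FRU_TEXT_LEN + 1])
{
	assert(out != NULL);
	memset(out, 0, FRU_TEXT_LEN + 1);

	for (int i = 0; i < 8; i++) {
		out[i]     = (char)((synd1 >> (8 * i)) & 0xff);
		out[i + 8] = (char)((synd2 >> (8 * i)) & 0xff);
	}

	out[FRU_TEXT_LEN] = '\0';
}
```

Extracting bytes with shifts, rather than copying the register memory
directly, keeps the sketch independent of host endianness.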

Thanks,
Yazen

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Yazen Ghannam (18):
  x86/mce/inject: Clear test status value
  x86/mce: Define mce_setup() helpers for global and per-CPU fields
  x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/amd: Use helper for UMC bank type check
  x86/mce/amd: Use helper for GPU UMC bank type checks
  x86/mce/amd: Use fixed bank number for quirks
  x86/mce/amd: Look up bank type by IPID
  x86/mce/amd: Clean up SMCA configuration
  x86/mce/amd: Prep DFR handler before enabling banks
  x86/mce/amd: Simplify DFR handler setup
  x86/mce/amd: Clean up enable_deferred_error_interrupt()
  x86/mce: Unify AMD THR handler with MCA Polling
  x86/mce/amd: Unify AMD DFR handler with MCA Polling
  x86/mce: Skip AMD threshold init if no threshold banks found
  x86/mce/amd: Support SMCA Corrected Error Interrupt
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h              |  30 +-
 arch/x86/kernel/cpu/mce/amd.c           | 534 +++++++++++++-----------
 arch/x86/kernel/cpu/mce/apei.c          | 125 ++++--
 arch/x86/kernel/cpu/mce/core.c          | 243 +++++++----
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +-
 arch/x86/kernel/cpu/mce/inject.c        |   5 +-
 arch/x86/kernel/cpu/mce/internal.h      |  11 +-
 drivers/edac/amd64_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |  70 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   9 +-
 include/trace/events/mce.h              |  47 ++-
 11 files changed, 671 insertions(+), 425 deletions(-)


base-commit: 35f30e2dfdccfba60c413248e03782b8793f92e6
--
2.34.1