RE: [PATCH] EDAC/Intel: Fix shift-out-of-bounds when DIMM/NVDIMM is absent

From: Luck, Tony
Date: Tue May 16 2023 - 13:13:29 EST


>> [ 13.875282] Hardware name: HP HP Z4 G5 Workstation Desktop PC/8962,
>> BIOS U61 Ver. 01.01.15 04/19/2023


>> When a DIMM slot is empty, the read value of mtr can be 0xffffffff, therefore

> Looked like a buggy BIOS/hw that didn't set the mtr register.
>
> 1. Did you print the mtr register whose value was 0xffffffff?
> 2. Can you take a dmesg log with kernel "CONFIG_EDAC_DEBUG=y" enabled?
> 3. What was the CPU? Please take the output of "lscpu".
> 4. Did you verify that your patch fixed the issue on your systems?

I wonder if the BIOS is "hiding" some devices from the OS? The 0xffffffff
return is the standard PCI response for reading a non-existent register. But
that doesn't quite make sense with having a "dimm present" bit in the MTR
register. If the register only exists when the DIMM is present, then there is
no need for a "dimm present" bit.
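
To make the ambiguity concrete, here is a rough sketch (not the driver's code)
of why an all-ones read is a problem: every bit is set, so a naive test of a
"present" bit would pass even though nothing answered the read. The helper
names are hypothetical, and bit 15 is only an assumed position for the
"dimm present" bit, used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones is what a PCI config read returns when nothing responds. */
#define PCI_ALL_ONES 0xffffffffu

/* ASSUMPTION: illustrative bit position for "dimm present" in MTR. */
#define MTR_DIMM_PRESENT_BIT 15

/* Hypothetical helper: true if the value looks like a read of a
 * non-existent register rather than real MTR contents. */
static bool mtr_read_is_bogus(uint32_t mtr)
{
	return mtr == PCI_ALL_ONES;
}

/* Hypothetical helper: trust the "present" bit only when the read
 * itself looks valid. */
static bool mtr_dimm_present(uint32_t mtr)
{
	if (mtr_read_is_bogus(mtr))
		return false;
	return (mtr >> MTR_DIMM_PRESENT_BIT) & 1u;
}
```

With a check like this, 0xffffffff is treated as "no device" instead of being
decoded as a populated DIMM with out-of-range field values.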

Some "lspci" output may also be useful.

-Tony