Re: [PATCH] EDAC/Intel: Fix shift-out-of-bounds when DIMM/NVDIMM is absent

From: Kai-Heng Feng
Date: Tue Jul 04 2023 - 00:28:00 EST


On Wed, Jun 14, 2023 at 3:58 PM Kai-Heng Feng
<kai.heng.feng@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 17, 2023 at 3:49 PM Kai-Heng Feng
> <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, May 17, 2023 at 1:13 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> > >
> > > >> [ 13.875282] Hardware name: HP HP Z4 G5 Workstation Desktop PC/8962, BIOS U61 Ver. 01.01.15 04/19/2023
> > >
> > >
> > > >> When a DIMM slot is empty, the read value of mtr can be 0xffffffff, therefore
> > >
> > > > Looks like a buggy BIOS/hw that didn't set the mtr register.
> > > >
> > > > 1. Did you print the mtr register whose value was 0xffffffff?
> > > > 2. Can you capture a dmesg log from a kernel built with "CONFIG_EDAC_DEBUG=y"?
> > > > 3. What was the CPU? Please provide the output of "lscpu".
> > > > 4. Did you verify that your patch fixes the issue on your systems?
> > >
> > > I wonder whether the BIOS is "hiding" some devices from the OS. A 0xffffffff
> > > return is the standard PCI response when reading a non-existent register. But
> > > that doesn't quite make sense given that the MTR register has a "dimm present"
> > > bit: if the register only existed when the DIMM is present, there would be no
> > > need for a "dimm present" bit.
> >
> > I wonder whether the read of a non-existent register is intended.
> >
> > >
> > > Some "lspci" output may also be useful.
> >
> > lspci can be found in [1]:
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=217453
>
> A gentle ping...

Another gentle ping...

>
> >
> > Kai-Heng
> >
> > >
> > > -Tony