Re: [PATCH v4] x86/mce: retrieve poison range from hardware

From: Dan Williams
Date: Thu Jul 28 2022 - 14:46:32 EST


Jane Chu wrote:
> On 7/27/2022 1:01 PM, Dan Williams wrote:
> > Jane Chu wrote:
> >> On 7/27/2022 12:30 PM, Jane Chu wrote:
> >>> On 7/27/2022 12:24 PM, Jane Chu wrote:
> >>>> On 7/27/2022 11:56 AM, Dan Williams wrote:
> >>>>> Jane Chu wrote:
> >>>>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
> >>>>>> poison granularity") that changed nfit_handle_mce() callback to report
> >>>>>> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
> >>>>>> discovered that the mce->misc LSB field is 0x1000 bytes, hence
> >>>>>> injecting
> >>>>>> 2 back-to-back poisons and the driver ends up logging 8 badblocks,
> >>>>>> because 0x1000 bytes is 8 512-byte.
> >>>>>>
> >>>>>> Dan Williams noticed that apei_mce_report_mem_error() hardcode
> >>>>>> the LSB field to PAGE_SHIFT instead of consulting the input
> >>>>>> struct cper_sec_mem_err record.  So change to rely on hardware whenever
> >>>>>> support is available.
> >>>>>>
> >>>>>> Link:
> >>>>>> https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@xxxxxxxxxx
> >>>>>>
> >>>>>>
> >>>>>> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >>>>>> Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
> >>>>>> ---
> >>>>>>   arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++-
> >>>>>>   1 file changed, 13 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> b/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> index 717192915f28..26d63818b2de 100644
> >>>>>> --- a/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> +++ b/arch/x86/kernel/cpu/mce/apei.c
> >>>>>> @@ -29,15 +29,27 @@
> >>>>>>   void apei_mce_report_mem_error(int severity, struct
> >>>>>> cper_sec_mem_err *mem_err)
> >>>>>>   {
> >>>>>>       struct mce m;
> >>>>>> +    int grain = PAGE_SHIFT;
> >>>>>>       if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> >>>>>>           return;
> >>>>>> +    /*
> >>>>>> +     * Even if the ->validation_bits are set for address mask,
> >>>>>> +     * to be extra safe, check and reject an error radius '0',
> >>>>>> +     * and fallback to the default page size.
> >>>>>> +     */
> >>>>>> +    if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) {
> >>>>>> +        grain = ~mem_err->physical_addr_mask + 1;
> >>>>>> +        if (grain == 1)
> >>>>>> +            grain = PAGE_SHIFT;
> >>>>>
> >>>>> Wait, if @grain is the number of bits to mask off the address, shouldn't
> >>>>> this be something like:
> >>>>>
> >>>>>      grain = min_not_zero(PAGE_SHIFT,
> >>>>> hweight64(~mem_err->physical_addr_mask));
> >>>>
> >>>> I see. I guess what you meant is
> >>>>     grain = min(PAGE_SHIFT, (1 +
> >>>> hweight64(~mem_err->physical_addr_mask)));
> >>>
> >>> Sorry, take that back, it won't work either.
> >>
> >> This will work,
> >> grain = min_not_zero(PAGE_SHIFT - 1,
> >> hweight64(~mem_err->physical_addr_mask));
> >> grain++;
> >> but too sophisticated? I guess I prefer the simple "if" expression.
> >
> > An "if" is fine, I was more pointing out that:
> >
> > hweight64(~mem_err->physical_addr_mask) + 1
> >
> > ...and:
> >
> > ~mem_err->physical_addr_mask + 1;
> >
> > ...give different results.
>
> They are different indeed. hweight64 returns the count of set bit while
> ~mem_err->physical_addr_mask returns a negated value.
>
> According to the definition of "Physical Address Mask" -
>
> https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
>
> Table N-31 Memory Error Record
>
> Physical Address Mask 24 8 Defines the valid address bits in the
> Physical Address field. The mask specifies the granularity of the
> physical address which is dependent on the hw/ implementation factors
> such as interleaving.
>
> It appears that "Physical Address Mask" is defined more like PAGE_MASK
> rather than in bitops hweight64() ofter used to count the set bits as
> an indication of (e.g.) how many registers are in use.
>
> Ans similar to PAGE_MASK, a valid "Physical Address Mask" should
> consist of a contiguous low 0 bits, not 1's and 0's mixed up.
>
> So far, as far as I can see, the v4 patch still looks correct to me.
> Please let me know if I'm missing anything.

The v4 patch looks broken to me. If the address mask is
0xffffffffffffffc0 to indicate a cacheline error then:

~mem_err->physical_addr_mask + 1;

...results in a grain of 64 when it should be 6.