Re: [PATCH v4] x86/mce: retrieve poison range from hardware

From: Jane Chu
Date: Thu Jul 28 2022 - 16:31:17 EST


On 7/28/2022 11:46 AM, Dan Williams wrote:
> Jane Chu wrote:
>> On 7/27/2022 1:01 PM, Dan Williams wrote:
>>> Jane Chu wrote:
>>>> On 7/27/2022 12:30 PM, Jane Chu wrote:
>>>>> On 7/27/2022 12:24 PM, Jane Chu wrote:
>>>>>> On 7/27/2022 11:56 AM, Dan Williams wrote:
>>>>>>> Jane Chu wrote:
>>>>>>>> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine
>>>>>>>> poison granularity") that changed nfit_handle_mce() callback to report
>>>>>>>> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been
>>>>>>>> discovered that the mce->misc LSB field is 0x1000 bytes, hence
>>>>>>>> injecting
>>>>>>>> 2 back-to-back poisons and the driver ends up logging 8 badblocks,
>>>>>>>> because 0x1000 bytes is 8 512-byte.
>>>>>>>>
>>>>>>>> Dan Williams noticed that apei_mce_report_mem_error() hardcode
>>>>>>>> the LSB field to PAGE_SHIFT instead of consulting the input
>>>>>>>> struct cper_sec_mem_err record.  So change to rely on hardware whenever
>>>>>>>> support is available.
>>>>>>>>
>>>>>>>> Link:
>>>>>>>> https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@xxxxxxxxxx
>>>>>>>>
>>>>>>>>
>>>>>>>> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>>>>>>>> Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>   arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++-
>>>>>>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> b/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> index 717192915f28..26d63818b2de 100644
>>>>>>>> --- a/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> +++ b/arch/x86/kernel/cpu/mce/apei.c
>>>>>>>> @@ -29,15 +29,27 @@
>>>>>>>>   void apei_mce_report_mem_error(int severity, struct
>>>>>>>> cper_sec_mem_err *mem_err)
>>>>>>>>   {
>>>>>>>>       struct mce m;
>>>>>>>> +    int grain = PAGE_SHIFT;
>>>>>>>>       if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
>>>>>>>>           return;
>>>>>>>> +    /*
>>>>>>>> +     * Even if the ->validation_bits are set for address mask,
>>>>>>>> +     * to be extra safe, check and reject an error radius '0',
>>>>>>>> +     * and fallback to the default page size.
>>>>>>>> +     */
>>>>>>>> +    if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) {
>>>>>>>> +        grain = ~mem_err->physical_addr_mask + 1;
>>>>>>>> +        if (grain == 1)
>>>>>>>> +            grain = PAGE_SHIFT;
>>>>>>>
>>>>>>> Wait, if @grain is the number of bits to mask off the address, shouldn't
>>>>>>> this be something like:
>>>>>>>
>>>>>>>      grain = min_not_zero(PAGE_SHIFT,
>>>>>>> hweight64(~mem_err->physical_addr_mask));
>>>>>>
>>>>>> I see. I guess what you meant is
>>>>>>     grain = min(PAGE_SHIFT, (1 +
>>>>>> hweight64(~mem_err->physical_addr_mask)));
>>>>>
>>>>> Sorry, take that back, it won't work either.
>>>>
>>>> This will work,
>>>> grain = min_not_zero(PAGE_SHIFT - 1,
>>>> hweight64(~mem_err->physical_addr_mask));
>>>> grain++;
>>>> but too sophisticated? I guess I prefer the simple "if" expression.
>>>
>>> An "if" is fine, I was more pointing out that:
>>>
>>> hweight64(~mem_err->physical_addr_mask) + 1
>>>
>>> ...and:
>>>
>>> ~mem_err->physical_addr_mask + 1;
>>>
>>> ...give different results.
>>
>> They are different indeed. hweight64 returns the count of set bit while
>> ~mem_err->physical_addr_mask returns a negated value.
>>
>> According to the definition of "Physical Address Mask" -
>>
>> https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
>>
>> Table N-31 Memory Error Record
>>
>> Physical Address Mask 24 8 Defines the valid address bits in the
>> Physical Address field. The mask specifies the granularity of the
>> physical address which is dependent on the hw/ implementation factors
>> such as interleaving.
>>
>> It appears that "Physical Address Mask" is defined more like PAGE_MASK
>> rather than in bitops hweight64() ofter used to count the set bits as
>> an indication of (e.g.) how many registers are in use.
>>
>> Ans similar to PAGE_MASK, a valid "Physical Address Mask" should
>> consist of a contiguous low 0 bits, not 1's and 0's mixed up.
>>
>> So far, as far as I can see, the v4 patch still looks correct to me.
>> Please let me know if I'm missing anything.
>
> The v4 patch looks broken to me. If the address mask is
> 0xffffffffffffffc0 to indicate a cacheline error then:
>
> ~mem_err->physical_addr_mask + 1;
>
> ...results in a grain of 64 when it should be 6.

Right, it's the exponent that's needed, so back to __ffs64().
Sorry for the detour. v5 is coming next.

thanks!
-jane