Re: [PATCH] cxl/pci: Change CXL AER support check to use native AER

From: Terry Bowman
Date: Thu Nov 02 2023 - 19:24:52 EST


Hi Dan and Alison,

On 11/2/23 16:31, Dan Williams wrote:
> Dan Williams wrote:
>> Alison Schofield wrote:
>>> On Thu, Nov 02, 2023 at 10:52:32AM -0500, Terry Bowman wrote:
>>>> Native CXL protocol errors are delivered to the OS through AER
>>>> reporting. The owner of AER owns CXL Protocol error management with
>>>> respect to _OSC negotiation.[1] CXL device errors are handled by a
>>>> separate interrupt with native control gated by _OSC control field
>>>> 'CXL Memory Error Reporting Control'.
>>>>
>>>> The CXL driver incorrectly checks for 'CXL Memory Error Reporting
>>>> Control' before accessing AER registers and caching RCH downport
>>>> AER registers. Replace the current check in these 2 cases with
>>>> native AER checks.
>>>
>>> Hi Terry, Does this have a user visible impact?
>>
>> Saw this after I applied it. It is good feedback in general.
>>
>> The reason I did not ask for this clarification was that this is fixing
>> brand new code and was just using the wrong flag, so I had the context.
>> A backporter will never need to make a judgement call about this patch.
>>
>> The end user impact is that CXL protocol errors that could be handled by
>> AER will not be handled if Linux failed to negotiate memory error
>> handling. Memory errors are strictly related to memory-error-record
>> events, not protocol errors.
>
Right. Without this fix, the end user impact is that RCH protocol errors
are only logged when native memory error/event _OSC control has also
been granted.
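
To illustrate the difference, here is a minimal standalone sketch. It is
not the kernel code: the struct and function names are made up for this
example, and the two flags only model the _OSC-negotiated native AER and
'CXL Memory Error Reporting Control' grants.

```c
#include <stdbool.h>

/* Hypothetical model of the two _OSC-negotiated controls. */
struct osc_grants {
	bool native_aer;       /* OS owns AER, and with it CXL protocol errors */
	bool native_cxl_error; /* OS owns 'CXL Memory Error Reporting Control' */
};

/* Old check: RCH protocol error handling gated on the memory error flag. */
bool rch_aer_allowed_old(const struct osc_grants *g)
{
	return g->native_cxl_error;
}

/* New check: RCH protocol error handling gated on native AER. */
bool rch_aer_allowed_new(const struct osc_grants *g)
{
	return g->native_aer;
}
```

With native AER granted but memory error reporting control not granted,
the old check skips RCH AER handling and the new check allows it.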

> However, to that point the "Fixes:" tag looks wrong, it should be:
>
> f05fd10d138d cxl/pci: Add RCH downstream port AER register discovery

Correct, it is f05fd10d138d.

Regards,
Terry