Re: [PATCH] vfio pci: kernel support of error recovery only for non fatal error

From: Cao jin
Date: Tue Mar 21 2017 - 04:02:06 EST




On 03/20/2017 10:30 PM, Alex Williamson wrote:
> On Mon, 20 Mar 2017 20:50:39 +0800
> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
>
>> Sorry for the late reply.
>>
>> On 03/14/2017 06:06 AM, Alex Williamson wrote:
>>> On Mon, 27 Feb 2017 15:28:43 +0800
>>> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
>>>
>>>> 0. What happens now (PCIE AER only)
>>>> Fatal errors cause a link reset.
>>>> Non fatal errors don't.
>>>> All errors stop the VM eventually, but not immediately
>>>> because it's detected and reported asynchronously.
>>>> Interrupts are forwarded as usual.
>>>> Correctable errors are not reported to guest at all.
>>>> Note: PPC EEH is different. This focuses on AER.
>>>
>>> Perhaps you're only focusing on AER, but don't the error handlers we're
>>> using support both AER and EEH generically? I don't think we can
>>> completely disregard how this affects EEH behavior, if at all.
>>>
>>
>> After taking a rough look at EEH, I find that EEH always calls
>> error_detected with pci_channel_io_frozen, so from the perspective of
>> error_detected, EEH is not affected.
>>
>> One question I am not sure about: when assigning devices on an spapr
>> host, should all functions/devices in a PE be bound to vfio? I am a bit
>> confused about the relationship between a PE and a TCE IOMMU group.
>
> AIUI, yes all devices within the PE are part of the same IOMMU group
> and therefore all endpoints must be bound to vfio or pci-stub.
>

Thanks. Then I think this approach won't affect EEH. I was concerned that
the slot_reset issue you mentioned might also affect EEH, but if all
devices in the PE must be bound to vfio, it seems the issue cannot occur
with EEH.
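
To make the error_detected reasoning concrete, here is a minimal userspace
sketch. It is not the patch itself: the enum and function names are made up
for illustration and only mirror the kernel's pci_channel_io_* states, and it
assumes the new non-fatal handling is keyed on the channel state being
"normal". Under that assumption, a reporter that always passes the frozen
state (as EEH appears to) can never reach the new path.

```c
/* Local stand-ins for the kernel's pci_channel_io_* states (illustrative). */
enum channel_state {
    CHANNEL_IO_NORMAL,       /* AER non-fatal error: link is still usable */
    CHANNEL_IO_FROZEN,       /* AER fatal error; also what EEH appears to pass */
    CHANNEL_IO_PERM_FAILURE  /* device is considered permanently failed */
};

/* Hypothetical outcomes of the handler; these names are not kernel API. */
enum action {
    ACTION_NONFATAL_RECOVERY, /* assumed new path: let the user recover */
    ACTION_LEGACY_STOP        /* existing path: report the error, VM stops */
};

/*
 * Model of the decision inside an error_detected callback: only the
 * "normal" channel state selects the non-fatal recovery path, every
 * other state falls through to the existing behaviour.
 */
enum action model_error_detected(enum channel_state state)
{
    if (state == CHANNEL_IO_NORMAL)
        return ACTION_NONFATAL_RECOVERY;
    return ACTION_LEGACY_STOP;
}
```

Calling model_error_detected(CHANNEL_IO_FROZEN) selects the legacy path, which
is the sense in which EEH would be unaffected.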

--
Sincerely,
Cao jin