Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

From: Austin.Bolen
Date: Wed Feb 27 2019 - 15:11:46 EST


On 2/27/2019 11:56 AM, Bolen, Austin wrote:
>
> BTW, this patch in particular is complaining about an error for a
> removed device. The Dell servers referenced in this chain will check if
> the device is removed and if so it will suppress the error so I don't
> think they are susceptible to this particular issue and I agree it is
> broken if they do. If that is the case we can and will fix it in firmware.
>

Confirmed this issue does not apply to the referenced Dell servers, so I
do not have a stake in how this should be handled for those systems.
It may be they just don't support surprise removal. I know in our case
all the Linux distributions we qualify (RHEL, SLES, Ubuntu Server) have
told us they do not support surprise removal. So I'm guessing that any
issues found with surprise removal could potentially fall under the
category of "unsupported".

Still, the larger issue of recovering from other types of PCIe errors
that are not due to device removal remains important. I would expect
many systems from many platform makers to be unable to recover from
PCIe errors in general, and hopefully the new DPC CER model will help
address this and provide added protection for cases like the above as
well.

Thanks,
Austin