Re: [RFC 3/6] x86, NMI, Rename memory parity error to PCI SERR error

From: Borislav Petkov
Date: Tue Sep 21 2010 - 02:38:19 EST


From: Huang Ying <ying.huang@xxxxxxxxx>
Date: Mon, Sep 20, 2010 at 08:22:28PM -0400

(Forgot to add edac-devel to Cc)

> > What is more, there are a bunch of edac drivers using the PCI SERR nmi
> > as a means to check for PCI errors so we shouldn't be removing it now,
> > should we?
>
> After checking the source, I found that in mem_parity_error() (which will
> be renamed to pci_serr_error()), edac_atomic_assert_error() is called,
> which increments edac_err_assert. edac_err_assert is used in
> edac_mc_assert_error_check_and_clear(), which is used in
> edac_mc_workq_function() for memory errors only, not for PCI errors.

Yes, I suppose the edac part in mem_parity_error() was originally
meant for memory parity errors. Now, I understand your motivation for
changing that to handle PCI SERR errors, but by axing the edac part
you're effectively disabling the mci->edac_check() call for edac
drivers that use NMIs for error reporting (I don't know how many do that,
btw...), and almost every edac driver points that function pointer at a
driver-specific error checking function.

So if there are no more IBM PC-AT machines running Linux out there, I
think we can rip out the whole code around edac_err_assert and thus
remove the edac_mc_assert_error_check_and_clear() part from
edac_mc_workq_function(), which would make all edac drivers rely solely
on polling for memory errors.
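
To make the dependency concrete, here is a rough userspace model of the
gate being discussed. It is only a sketch, not the kernel code: every
model_* name is made up for illustration, the op-state enum is an
assumption about how polling vs. NMI mode is selected, and the flag is
read and cleared with a single C11 atomic exchange for simplicity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects named in this thread. */
enum model_op_state { MODEL_OPSTATE_POLL, MODEL_OPSTATE_NMI };

struct model_mci {
	void (*edac_check)(struct model_mci *mci); /* driver-specific check */
	int checks_run;                            /* bookkeeping for the model */
};

static atomic_int model_err_assert;            /* models edac_err_assert */
static enum model_op_state model_op_state = MODEL_OPSTATE_NMI;

/* Models edac_atomic_assert_error(): called from the NMI path. */
static void model_atomic_assert_error(void)
{
	atomic_fetch_add(&model_err_assert, 1);
}

/* Models edac_mc_assert_error_check_and_clear(). */
static int model_check_and_clear(void)
{
	if (model_op_state == MODEL_OPSTATE_POLL)
		return 1; /* polling mode: always run the driver check */

	/* NMI mode: nonzero only if the NMI path bumped the counter. */
	return atomic_exchange(&model_err_assert, 0);
}

/* Models the relevant part of edac_mc_workq_function(). */
static void model_workq_function(struct model_mci *mci)
{
	assert(mci != NULL);
	if (model_check_and_clear() && mci->edac_check != NULL)
		mci->edac_check(mci);
}

/* Example driver callback: just counts how often it was invoked. */
static void model_driver_check(struct model_mci *mci)
{
	mci->checks_run++;
}
```

In this model, a driver in NMI mode never gets its check callback run
unless something calls model_atomic_assert_error() first, which is the
breakage described above. Dropping the gate altogether corresponds to
model_check_and_clear() always returning 1, i.e. pure polling.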

What do the others think, Doug?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
