Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac

From: Kani, Toshimitsu
Date: Tue Jul 18 2017 - 15:59:04 EST


On Tue, 2017-07-18 at 08:00 +0200, Borislav Petkov wrote:
> On Mon, Jul 17, 2017 at 03:59:12PM -0600, Toshi Kani wrote:
> > The ghes_edac driver was introduced in 2013 [1], but it has not
> > been enabled by any distro yet. This driver obtains error info
> > from firmware interfaces, which are not properly implemented on
> > many platforms; the driver itself acknowledges this in the
> > message it always emits:
> >
> >   This EDAC driver relies on BIOS to enumerate memory and get error
> >   reports. Unfortunately, not all BIOSes reflect the memory layout
> >   correctly. So, the end result of using this driver varies from
> >   vendor to vendor. If you find incorrect reports, please contact
> >   your hardware vendor to correct its BIOS.
> >
> > To get out of this situation, add a platform type check to
> > selectively enable the driver on the platforms that are known to
> > have a proper firmware implementation. Platform vendors can add
> > their platforms to the list when they support ghes_edac.
>
> So maintaining whitelists for things has always been a PITA and we
> should try to avoid it, if possible. (We can always do it if nothing
> saner comes along.)

Agreed.
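For readers who have not seen the patch, the platform check described in the quoted commit message boils down to matching firmware identification strings against a table of known-good platforms. The sketch below is a minimal userspace illustration of that idea only, not the patch's actual code: the `platform_entry` struct, the `ghes_whitelist` table, its entries, and the `platform_is_listed()` helper are all hypothetical. It assumes the platform is identified by an ACPI-style OEM ID (6 characters) and OEM table ID (8 characters).

```c
#include <string.h>

/* Hypothetical whitelist entry: firmware identification strings for a
 * platform assumed to implement GHES error reporting correctly.
 * ACPI table headers carry a 6-char OEM ID and an 8-char OEM table ID. */
struct platform_entry {
	const char *oem_id;
	const char *oem_table_id;
};

/* Hypothetical entries, for illustration only. */
static const struct platform_entry ghes_whitelist[] = {
	{ "VENDOR", "PLATFRM1" },
	{ "VENDOR", "PLATFRM2" },
};

/* Return the index of the matching entry, or -1 if the platform is not
 * listed (in which case the driver would decline to load).
 * strcmp() is safe here because the sketch uses NUL-terminated strings;
 * raw ACPI header fields are not NUL-terminated and would need strncmp(). */
static int platform_is_listed(const char *oem_id, const char *oem_table_id)
{
	size_t n = sizeof(ghes_whitelist) / sizeof(ghes_whitelist[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(ghes_whitelist[i].oem_id, oem_id) == 0 &&
		    strcmp(ghes_whitelist[i].oem_table_id, oem_table_id) == 0)
			return (int)i;
	}
	return -1;
}
```

The maintenance cost Borislav objects to is visible here: every newly supported platform means another table entry and a kernel update.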

> Now, below is a dirty patch converting ghes_edac to a normal module.
> On systems where we have GHES, the firmware generally disables the
> detection of the presence of ECC hardware, thus preventing the
> platform EDAC driver from loading.

I have HPE Haswell and Skylake test systems with GHES, but they do not
hide IMCs from the OS. So, the sb_edac and skx_edac drivers get
attached on these systems when ghes_edac is disabled.

> Let me clarify: I have an AMD HP box which, when GHES is enabled in
> the BIOS, says that ECC is disabled in the memory controller and the
> amd64_edac driver doesn't load for that memory controller.

Hmm... what's the platform name of this box? I can look into this case
if you'd like.

> And I think we should try this first: have the firmware disable
> detection methods so that the platform drivers don't load.

I do not think we can rely on this method.

> Then, ghes_edac can be a simple module and no other driver would
> attempt loading.

I like the use of a notifier chain, which is much cleaner.
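A notifier chain decouples an event source from its consumers. Presumably the error source (GHES) publishes each error record to a list of registered callbacks, and ghes_edac subscribes as an ordinary module instead of being called into directly. The sketch below shows only that general pattern, in userspace C. The names (`err_notifier`, `chain_register`, `chain_unregister`, `chain_call`, `edac_consumer`) are hypothetical stand-ins rather than the kernel's notifier API, and Borislav's patch itself is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical error record passed along the chain. */
struct mem_err {
	int severity;
	unsigned long long phys_addr;
};

/* Hypothetical notifier: one subscriber's callback plus a list link. */
struct err_notifier {
	int (*call)(struct err_notifier *nb, const struct mem_err *err);
	struct err_notifier *next;
};

/* Head of the chain that the error source publishes to. */
static struct err_notifier *report_chain;

/* Subscribe: push the notifier onto the front of the chain. */
static void chain_register(struct err_notifier *nb)
{
	nb->next = report_chain;
	report_chain = nb;
}

/* Unsubscribe: unlink the notifier if it is on the chain. */
static void chain_unregister(struct err_notifier *nb)
{
	struct err_notifier **p = &report_chain;

	while (*p) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
		p = &(*p)->next;
	}
}

/* Publish: invoke every subscriber; return how many were called. */
static int chain_call(const struct mem_err *err)
{
	int called = 0;

	for (struct err_notifier *nb = report_chain; nb; nb = nb->next) {
		nb->call(nb, err);
		called++;
	}
	return called;
}

/* Example subscriber standing in for ghes_edac: counts reports seen. */
static int reports_seen;

static int edac_consumer(struct err_notifier *nb, const struct mem_err *err)
{
	(void)nb;
	(void)err;
	reports_seen++;
	return 0;
}

static struct err_notifier edac_nb = { .call = edac_consumer };
```

With this structure, the publisher needs no knowledge of which drivers are loaded. If no module has subscribed, `chain_call()` simply invokes nothing.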

> The question is: does the platform do this disabling now?

Unfortunately, that is not the case today. The IMCs cannot be hidden
with the Device Hide registers for Skylake at least.

Thanks,
-Toshi