Re: [PATCH] Add support of NVDIMM memory error notification in ACPI 6.2

From: Kani, Toshimitsu
Date: Wed Jun 07 2017 - 17:34:14 EST


On Wed, 2017-06-07 at 14:06 -0700, Dan Williams wrote:
> On Wed, Jun 7, 2017 at 1:57 PM, Kani, Toshimitsu <toshi.kani@xxxxxxx> wrote:
> > On Wed, 2017-06-07 at 12:09 -0700, Dan Williams wrote:
> > > On Wed, Jun 7, 2017 at 11:49 AM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> >
> > :
> > > > +
> > > > +static void acpi_nfit_uc_error_notify(struct device *dev, acpi_handle handle)
> > > > +{
> > > > +	struct acpi_nfit_desc *acpi_desc = dev_get_drvdata(dev);
> > > > +
> > > > +	acpi_nfit_ars_rescan(acpi_desc);
> > >
> > > I wonder if we should gate re-scanning with a similar:
> > >
> > >     if (acpi_desc->scrub_mode == HW_ERROR_SCRUB_ON)
> > >
> > > ...check that we do in the MCE notification case? Maybe not, since
> > > we don't get an indication of where the error is without a rescan.
> >
> > I think the MCE case is different, since the MCE handler already
> > knows where the new poison location is and can update the badblocks
> > information for it.  Starting ARS is an optional precaution.
> >
> > > However, at a minimum I think we need support for the new Start
> > > ARS flag ("If set to 1 the firmware shall return data from a
> > > previous scrub, if any, without starting a new scrub") and use
> > > that for this case.
> >
> > That's an interesting idea.  But I wonder how users know if it is
> > OK to set this flag, as it relies on BIOS implementation that is not
> > described in ACPI...
>
> Ugh, you're right. We might need a revision-id=2 version of Start ARS
> so software knows that the BIOS is aware of the new flag.

My bad. Looking at ACPI 6.2, it actually defines what you described.
Start ARS now defines bit[1] in Flags, which can be set to avoid
starting a new scrub for this notification.  I will update the patch
to set this flag when scrub_mode is not HW_ERROR_SCRUB_ON.

Thanks,
-Toshi