Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

From: Scott Wood
Date: Tue Sep 30 2014 - 20:43:47 EST


On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:
> > >
> > > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@xxxxxxxxxxxx> wrote:
> > >
> > > > >Those are errors related to PCIe hotplug, and are seen with
> > > > >unexpected PCIe device removals (triggered, for example, by
> > > > >removing power from a PCIe adapter).
> > > > >The behavior we see on E5500 is quite similar to the same behavior
> > > > >on E500: If unhandled, the CPU keeps executing the same instruction
> > > > >over and over again if there is an error on a PCIe access and thus
> > > > >stalls. I don't know if this is considered an erratum or expected
> > > > >behavior, but it is one we have to address since we have to be able
> > > > >to handle that condition.
> >
> > The reason I ask is that the handling for e500 was described as an
> > erratum workaround. If it is an erratum it would be nice to know the
> > erratum number and the full list of affected chips.
> >
> My understanding, which may be wrong, was that this is expected behavior,
> at least for E5500. I actually thought I had seen it somewhere in the
> specification (response to PCIe errors), but I don't recall where exactly.
>
> At least for my part I am not aware of an erratum.

Jia Hongtao, can you comment here?

> > > > >Ultimately, we'll want to implement PCIe error handlers for the
> > > > >affected drivers, but that will be a next step.
> >
> > For now can we at least print a ratelimited error message? I don't like
> > the idea of silently ignoring these errors. I suppose it's a separate
> > issue from extending the workaround to cover e500mc, though.
> >
> I don't really like the idea of printing an error message pretty much each time
> when an unexpected hotplug event occurs.

Unexpected events seem like the sort of thing you'd want to log, but my
concern is that this might not be the only cause of PCI errors.
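
To make "ratelimited" concrete, here is a rough userspace sketch of
interval/burst limiting. It is only an illustration, not the kernel
implementation and not part of the patch: struct ratelimit,
ratelimit_ok() and log_pci_error() are made-up names, and time is
passed in as a plain tick count. In the kernel itself one would
presumably reach for something like pr_err_ratelimited() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of interval/burst rate limiting. */
struct ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* max messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be emitted at tick `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	assert(rl != NULL);

	/* Window expired: report what was dropped, then start a new one. */
	if (now - rl->begin >= rl->interval) {
		if (rl->missed)
			fprintf(stderr, "%d PCI error messages suppressed\n",
				rl->missed);
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over the burst limit: count the message but do not print it. */
	rl->missed++;
	return false;
}

/* Log one (hypothetical) PCI error; returns true if it was printed. */
static bool log_pci_error(struct ratelimit *rl, long now, unsigned long addr)
{
	if (!ratelimit_ok(rl, now))
		return false;
	fprintf(stderr, "PCI error on access to 0x%lx\n", addr);
	return true;
}
```

With interval = 100 and burst = 2, a burst of errors from a single
surprise removal would print two lines and count the rest, so the log
still shows that something happened without being flooded.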

-Scott

