Re: [PATCH 2/2] sifive: edac: Add EDAC driver for Sifive l2 Cache Controller

From: Paul Walmsley
Date: Mon Mar 25 2019 - 17:18:44 EST


On Mon, 25 Mar 2019, Borislav Petkov wrote:

> On Sun, Mar 24, 2019 at 05:16:17PM -0700, Paul Walmsley wrote:
> > Looking at the Synopsys,
>
> Look again at synopsys_edac.
>
> > Highbank,
>
> Yes, that one and octeon.
>
> > PowerPC 4xx, and
>
> also a single ppc4xx_edac driver.
>
> > TI EDAC drivers,
>
> There's TI drivers, plural?
>
> I see only ti_edac.c. Also, per-vendor.

All of these drivers are for single IP blocks, mostly DRAM controllers.
There's no "platform EDAC manager" IP block in these cases.

> > all of those are clearly for IP block error management, rather than
> > platform error management. Has the upstream guidance changed since
> > those drivers were merged?
>
> There are others which are per-platform and work just fine this way:
> xgene_edac, altera_edac, layerscape_edac, qcom_edac, synopsys_edac...

Of your list, only xgene_edac, altera_edac, and qcom_edac have something
that resembles a platform error manager. The others are just for
individual IP blocks.

> > The core issue for us is that we don't have a generalized "ECC management"
> > IP block. And I would just as soon not fake one in the DT data, since the
> > general DT guidance is that the data in DT is meant to describe the actual
> > hardware.
>
> Look at how the others I mentioned above do it.

The Synopsys case is illustrative. Synopsys doesn't have a unified EDAC
platform; they don't sell chips. SoC vendors (like Xilinx) take some
Synopsys IP blocks (like the memory controller), perhaps others from a
different IP vendor like ARM or Cadence, and integrate them into their
SoCs to create their own platforms. They often combine a Synopsys memory
controller with an ARM L2 cache controller. But both of those IP blocks
might be able to detect and report ECC errors.

So as a result of these EDAC limitations, Xilinx hacked their platform
code into the synopsys_edac driver:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/edac/synopsys_edac.c#n901

The problem with this is that it is backwards. The Zynq platform has
other sources of ECC notifications and errors, beyond the Synopsys
DDR controller:

https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

So the EDAC "platform," if there is one, would be Xilinx Zynq, not
Synopsys. Probably this hasn't been a problem so far because:

1. Xilinx hasn't upstreamed any support for the other EDAC sources on the
chip; and

2. no other SoC vendors using the Synopsys memory controller have bothered
to upstream EDAC support for their platform.

> The problem with per IP block is that if those compilation units would
> need to share info or communicate, then that is impossible nowadays and
> you'd need to build something on your own.
>
> Also, the EDAC core supports only one driver.

OK. Would you have a preference between these two options:

1. We could modify the EDAC subsystem to support different EDAC data
sources from different vendors. This would avoid duplicating code for
different platforms that combine EDAC data sources from different IP
blocks. (This seems to me like the better long-term approach.)

2. We could create a platform driver for the "SiFive FU540-C000 EDAC"
reporting platform that wouldn't map to any hardware block, but would call
functions exported by other sources of EDAC data - most likely drivers
living in separate directories. If, for example, we wind up using a
Synopsys memory controller in a future product, we would move the
Synopsys code into a separate library, move the Xilinx Zynq-specific
code into a zynq_edac driver, and so on.
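
To make option 2 more concrete, here is a minimal userspace sketch of the
shape it could take. Every name in it (struct ecc_source, struct
platform_edac, platform_edac_add_source(), platform_edac_poll(), and the
two stand-in poll functions) is hypothetical; none of this is existing
EDAC core API, and the error counts are made-up placeholders rather than
real hardware data. It only illustrates a platform-level aggregator that
calls functions exported by per-IP-block code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface for illustration; NOT the kernel EDAC API. */
struct ecc_counts {
	unsigned int corrected;
	unsigned int uncorrected;
};

/* One source of ECC data, e.g. a library wrapping a single IP block. */
struct ecc_source {
	const char *name;
	/* Fills *out with the errors seen since the last poll. */
	void (*poll)(struct ecc_counts *out);
};

#define MAX_SOURCES 4

/* The "platform" object: maps to no hardware block, only aggregates. */
struct platform_edac {
	const struct ecc_source *sources[MAX_SOURCES];
	size_t nr_sources;
};

/* Register one IP block's source; returns 0 on success, -1 if full. */
int platform_edac_add_source(struct platform_edac *p,
			     const struct ecc_source *src)
{
	assert(p != NULL && src != NULL);
	if (p->nr_sources >= MAX_SOURCES)
		return -1;
	p->sources[p->nr_sources++] = src;
	return 0;
}

/* Poll every registered source and sum the counts. */
struct ecc_counts platform_edac_poll(const struct platform_edac *p)
{
	struct ecc_counts total = { 0, 0 };

	for (size_t i = 0; i < p->nr_sources; i++) {
		struct ecc_counts c = { 0, 0 };

		p->sources[i]->poll(&c);
		total.corrected += c.corrected;
		total.uncorrected += c.uncorrected;
	}
	return total;
}

/* Stand-ins for functions the IP block libraries would export. */
static void l2_cache_poll(struct ecc_counts *out)
{
	out->corrected = 2;	/* placeholder value */
	out->uncorrected = 0;
}

static void ddr_poll(struct ecc_counts *out)
{
	out->corrected = 1;	/* placeholder value */
	out->uncorrected = 1;
}

const struct ecc_source l2_cache_source = { "l2-cache", l2_cache_poll };
const struct ecc_source ddr_source = { "ddr-ctrl", ddr_poll };
```

In this arrangement the platform driver owns the list of sources, so
swapping in a different vendor's memory controller would mean registering
a different ecc_source rather than editing that vendor's driver.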

Or perhaps you have another idea?


- Paul