Re: [PATCH] Raise maximum number of memory controllers

From: Mauro Carvalho Chehab
Date: Thu Sep 27 2018 - 21:11:33 EST


On Fri, 28 Sep 2018 00:03:55 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Thu, Sep 27, 2018 at 02:44:01PM -0700, Luck, Tony wrote:
> > The problem with your patch that gets rid of EDAC_MAX_MCS is making
> > device links under /sys/bus/edac. Which is hinted at in some of the
> > code your patch deleted:
> >
> > - /*
> > - * The memory controller needs its own bus, in order to avoid
> > - * namespace conflicts at /sys/bus/edac.
> > - */
> > - name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
> > - if (!name)
> > - return -ENOMEM;
> > -
> > - mci->bus->name = name;
>
> Yes, and that needed to go because I am using a single bus. Which kinda
> makes sense because you want to have a single bus and multiple devices
> on it. I mean, if we *have* to have a bus.
>
> I think this whole /sys/bus/edac thing is crap and needs to go. We
> have a perfectly fine hierarchy under /sys/devices/system/edac and
> duplicating it under /sys/bus/edac is just bollocks. IMHO. Feel free to
> correct me with, but but, this is useful for...
>
> > which seemed to work.
>
> Right.
>
> > But then I began wondering what are ABI expectations
> > from applications that read the EDAC /sys files?
> >
> > Is this the current source repository? https://github.com/grondo/edac-utils
> >
> > This code doesn't seem to know about the "dimm*" directories below the
> > "mc*" level. It just looks for the csrow* entries.
>
> I guess this is a question for Mauro. I never really needed any special
> edac tool to get info and if you ask me, we probably should try to keep
> it simple and grep sysfs. So that you can always get the info without
> having to install any special tools. Like ftrace works on every system
> with just a shell and basic tools. I think this is very powerful. But
> this is old spartan me only thinking out loud.
>
> In any case, I'm more than fine with dropping the bus hierarchy if
> nothing uses it.

I don't remember any rationale behind /sys/bus/edac. It was already
there before I started working on EDAC about 10 years ago.
I guess it was used in the past by edac-utils (or maybe it is just a
side effect of some past need to create a bus).

Btw, the documented EDAC ABI is /sys/devices/system/edac, as
described in Documentation/ABI/testing/sysfs-devices-edac.

So, I suspect it should be safe to get rid of /sys/bus/edac,
provided that it won't cause side effects at /sys/devices/system/edac.

Why I think it is safe to get rid of /sys/bus/edac
--------------------------------------------------

As far as I can tell, there are only two toolsets: the legacy edac-utils
and the rasdaemon. At least on Fedora 28, both applications are
packaged (meaning that there are probably people using both).

edac-utils uses the old sysfs entries (the ones dated up to 2007).
I don't see any upstream changes to it since 2008:

https://sourceforge.net/projects/edac-utils/

I did a grep on its source code (version 0.16, from 2008). It seems
that it uses only /sys/devices/system/edac.
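
For anyone who wants to repeat that check on another tool, here is a
minimal sketch of the same idea in Python. It is not what I ran (I just
used grep); the function name find_sysfs_refs and the tree layout are
made up for illustration. It walks a source tree and records every line
that mentions either sysfs path:

```python
import os

# The two sysfs locations discussed in this thread.
NEEDLES = ("/sys/bus/edac", "/sys/devices/system/edac")


def find_sysfs_refs(root, needles=NEEDLES):
    """Return {needle: [(file, line_number), ...]} for matches under root.

    Hypothetical helper, equivalent in spirit to a recursive grep.
    """
    hits = {needle: [] for needle in needles}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" keeps binary files from raising.
                with open(path, "r", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        for needle in needles:
                            if needle in line:
                                hits[needle].append((path, lineno))
            except OSError:
                continue  # unreadable file: skip it
    return hits
```

An empty list for "/sys/bus/edac" on an unpacked source tree would
suggest the tool does not depend on the bus hierarchy, although a path
assembled at runtime from pieces would not be caught by a plain
substring search like this one.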

rasdaemon also uses the new sysfs entries (the ones dated 2012 and
2016). I maintain it. rasdaemon not only receives traces, but can
also store them in a database and even generate ABRT events.
It also uses only /sys/devices/system/edac.

For both toolsets, the sysfs entries there are important, not only
to list the memory layout and error counts, but also to store
the DIMM labels.

rasdaemon itself uses perf trace events, although Aris added
support for it to work in non-daemon mode, where it just reads
the counters via sysfs, at /sys/devices/system/edac.
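
To make "reads the counters via sysfs" concrete, here is a rough sketch
in Python. This is not rasdaemon's code. The attribute names (ce_count,
ue_count, dimm_label, dimm_ce_count, dimm_ue_count) are the ones I
expect under mc*/ and mc*/dimm*/ per the ABI document mentioned above,
so treat them as assumptions and check them against your kernel. The
helper names are made up for illustration:

```python
import glob
import os

# Documented EDAC location; the "mc" subdirectory holds the controllers.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def read_edac_counts(root=EDAC_MC_ROOT):
    """Collect error counts per controller, plus label and counts per DIMM.

    Values are returned as the raw strings found in sysfs.
    """
    report = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        dimms = {}
        for dimm_dir in sorted(glob.glob(os.path.join(mc_dir, "dimm[0-9]*"))):
            dimms[os.path.basename(dimm_dir)] = {
                "label": read_attr(os.path.join(dimm_dir, "dimm_label")),
                "ce": read_attr(os.path.join(dimm_dir, "dimm_ce_count")),
                "ue": read_attr(os.path.join(dimm_dir, "dimm_ue_count")),
            }
        report[os.path.basename(mc_dir)] = {
            "ce": read_attr(os.path.join(mc_dir, "ce_count")),
            "ue": read_attr(os.path.join(mc_dir, "ue_count")),
            "dimms": dimms,
        }
    return report
```

Nothing here touches /sys/bus/edac. On a machine without an EDAC
driver loaded, read_edac_counts() simply returns an empty dict.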

Thanks,
Mauro