Re: [PATCH 17/63] edac_mce: Add an interface driver to report mce errors via edac

From: Borislav Petkov
Date: Fri Sep 25 2009 - 09:56:44 EST


Hi,

On Fri, Sep 25, 2009 at 09:11:30AM -0300, Mauro Carvalho Chehab wrote:
> > > entry = rcu_dereference(mcelog.next);
> > > for (;;) {
> > > /*
> > > + * If edac_mce is enabled, it will check the error type
> > > + * and will process it, if it is a known error.
> > > + * Otherwise, the error will be sent through mcelog
> > > + * interface
> > > + */
> > > + if (edac_mce_parse(mce))
> > > + return;
> >
> > for the third time (!): this may run in NMI context and as such does not
> > obey to normal kernel locking rules and you cannot safely use almost any
> > kernel resources involving locking. This way, your hook calls into a
> > module, which is a very bad idea. Please remove that hook and put in the
> > polling routine or somewhere more appropriate.
>
> I had answered you already, but let me give a more complete explanation.
>
> For sure all the code called at this point should be carefully analyzed. So,
> let's see the complete implementation:
>
> 1) edac_mce is not a module (see patch 18). So, just calling a routine on
> edac_mce should be safe, even at NMI;

No, I mean the ->check_error member: it could call into a module if
i7core_edac is built as one.

<snip the obvious non-registered module case>

> 3) i7core_edac will only start handling mce events after being loaded on memory
> and registered on edac_mce. If an error occurs before it, normal mce handling
> will happen;
>
> 4) after registered, edac_mce will call this hook, at i7core_edac:
>
> static int i7core_mce_check_error(void *priv, struct mce *mce)
> {
> struct mem_ctl_info *mci = priv;
> struct i7core_pvt *pvt = mci->pvt_info;
> unsigned long flags;
>
> /*
> * Just let mcelog handle it if the error is
> * outside the memory controller
> */
> if (((mce->status & 0xffff) >> 7) != 1)
> return 0;
>
> /* Bank 8 registers are the only ones that we know how to handle */
> if (mce->bank != 8)
> return 0;
>
> /* Only handle if it is the right mc controller */
> if (cpu_data(mce->cpu).phys_proc_id != pvt->i7core_dev->socket) {
> debugf0("mc%d: ignoring mce log for socket %d. "
> "Another mc should get it.\n",
> pvt->i7core_dev->socket,
> cpu_data(mce->cpu).phys_proc_id);
> return 0;
> }

One problem here is the debugf0() call: it is a printk(), and a printk()
in NMI context may deadlock. That's why MCEs are added to the lockless
buffer in mce_log and decoded later - otherwise you could just as well
printk them here.

Generally, you need to keep the NMI handlers as short as possible and
postpone the parsing of the MCEs for later.
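
To make the "log now, decode later" split concrete, here is a minimal
userspace sketch using C11 atomics. It is not the actual mce_log code:
struct rec, log_add(), log_drain() and decode_count_bank8() are
hypothetical names, the record carries only two fields, and slots are
never reused, which a real buffer would have to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_LEN 32

/* Hypothetical record; a real struct mce carries many more fields. */
struct rec {
	uint64_t status;
	uint8_t bank;
	atomic_int finished;	/* set last, once the record is fully written */
};

static struct rec log_buf[LOG_LEN];
static atomic_uint log_next;	/* next free slot, shared with producers */
static unsigned int log_read;	/* next slot to decode, consumer only */

/*
 * "NMI side": no locks and no printing, only a slot reservation and a copy.
 * Returns 0 on success, -1 if the buffer is full and the record is dropped.
 */
int log_add(uint64_t status, uint8_t bank)
{
	unsigned int slot = atomic_load(&log_next);

	/* Reserve a slot; on failure the CAS reloads the current value. */
	do {
		if (slot >= LOG_LEN)
			return -1;
	} while (!atomic_compare_exchange_weak(&log_next, &slot, slot + 1));

	log_buf[slot].status = status;
	log_buf[slot].bank = bank;
	/* Publish: the release store orders the field writes before the flag. */
	atomic_store_explicit(&log_buf[slot].finished, 1, memory_order_release);
	return 0;
}

/*
 * "Process side": runs later, where locking and printing are allowed.
 * Decodes finished records in order and stops at the first record that is
 * still being written. Returns the number of records handled.
 */
unsigned int log_drain(void (*decode)(const struct rec *))
{
	unsigned int end = atomic_load(&log_next);
	unsigned int handled = 0;

	assert(decode != NULL);
	while (log_read < end &&
	       atomic_load_explicit(&log_buf[log_read].finished,
				    memory_order_acquire)) {
		decode(&log_buf[log_read]);
		log_read++;
		handled++;
	}
	return handled;
}

/* Example decoder: counts records from bank 8, as the driver filter does. */
static unsigned int bank8_seen;

void decode_count_bank8(const struct rec *r)
{
	if (r->bank == 8)
		bank8_seen++;
}
```

The producer side does nothing that can block, so it stays safe to call
from a handler that may interrupt anything; all filtering and any
printing happen in the decoder passed to log_drain().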

--
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632
