Re: [RFC/PATCH, 1/4] readX_check() performance evaluation

From: Andi Kleen
Date: Wed Jan 28 2004 - 15:03:58 EST


On Wed, 28 Jan 2004 11:48:05 -0800
David Mosberger <davidm@xxxxxxxxxxxxxxxxx> wrote:

> >>>>> On Wed, 28 Jan 2004 20:39:15 +0100, Andi Kleen <ak@xxxxxxx> said:
>
> >> Yet they are a good indicator that something is wrong (not performing
> >> properly) or may be failing soon. I don't think putting on blinders
> >> for such problems is a good idea. Though I agree that the question of
>
> Andi> Most server class hardware should log it somewhere and allow
> Andi> to read the event log in the firmware. This even works for
> Andi> unhandleable errors unlike what the OS could do.
>
> And you'd want to reboot your server just so you can check on the soft
> failure rate? ;-)

Yep, I reboot my machines all the time ;-)

Seriously you can count it somewhere and present it in sysfs or /proc.
Or log it somewhere else and supply a special utility to show them
that makes it clear that the events are hardware and not software related.
I suppose if your server vendor is serious they will supply a tool
to read the firmware log from a running system.

But printks enabled by default are a bad idea (and a bug too BTW - printk called from
MCE handlers can randomly deadlock)

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/