Re: [tip:x86/mce] x86, mce: Rename cpu_specific_poll to mce_cpu_specific_poll

From: Borislav Petkov
Date: Mon Feb 22 2010 - 03:28:56 EST


From: Ingo Molnar <mingo@xxxxxxx>
Date: Tue, Feb 16, 2010 at 10:02:15PM +0100
Hi,

> I like it.
>
> You can do it as a 'perf hw' subcommand - or start off a fork as the 'hw'
> utility, if you'd like to maintain it separately. It would have a daemon
> component as well, to receive and log hardware events continuously, to
> trigger policy action, etc.
>
> I'd suggest you start to do it in small steps, always having something that
> works - and extend it gradually.

I had the chance to meditate over the weekend a bit more on the whole
RAS thing after rereading all the discussion points more carefully.
Here are some aspects I think are important which I'd like to drop here
rather sooner than later so that we're in sync and don't waste time
implementing the wrong stuff:

* Critical errors: we need to switch to a console and at least dump the
decoded error there before panicking. Nowadays, almost everyone has a
camera with which that information can be captured from the screen. I'm
afraid we won't be able to send the error over a network, since climbing
up the TCP stack takes relatively long and we cannot risk error
propagation...? We could, though, try to do it on a core which is not
affected by the error, as a last step in the sequence...

I think this is much more user-friendly than the current panic, which is
never seen when running X unless the user has a serial console or
netconsole logging to some other machine.

All other, less critical errors are copied to userspace over an mmapped
buffer, and then the userspace daemon is poked with a uevent to dump the
error or signal it over the network, parse its contents and apply
policy.

* receive commands by syscall, also for hw config: I like the idea
of sending commands to the kernel over a syscall; we can reuse perf
functionality here and make those reused bits generic.

* do not bind to an error format etc: I'm not a big fan of slaving to an
error format - just dump the error info into the buffer and let userspace
format it. We can do the formatting in the kernel if we absolutely have to.

* can also configure hw: The tool can also send commands over the
syscall to configure certain aspects of the hardware, like:

- disable L3 cache indices which are faulty
- enable/disable MCE error sources: toggle MCi_CTL, MCi_CTL_MASK bits
- disable whole DIMMs: F2x[1, 0][5C:40][CSEnable]
- control ECC checking
- enable/disable powering down of DRAM regions for power savings
- set memory clock frequency
- some other relevant aspects of hw/CPU configuration

* keep all info in sysfs so that no tool is needed for accessing it,
similar to ftrace: all knobs needed for user interaction should appear
redundantly as sysfs files/dirs so that configuration/query can be done
"by hand" even when the hw tool is missing.

* gradually move pieces of RAS code into the kernel proper: important
codepaths/aspects of the HW which are queried often (e.g., DIMM
population and config) should gradually be moved into the kernel proper.
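To make the mmapped-buffer idea from the first point a bit more concrete, here is a minimal sketch. It assumes a single producer (the kernel's error handler) and a single consumer (the daemon). The record layout, struct ras_ring and both helpers are hypothetical. Because the sketch runs in userspace, the uevent that would poke the daemon is left out:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical raw error record: registers are copied verbatim and
 * decoding is left to the userspace daemon. */
struct ras_record {
	uint64_t status;	/* MCi_STATUS */
	uint64_t addr;		/* MCi_ADDR */
	uint64_t misc;		/* MCi_MISC */
	uint32_t cpu;
	uint32_t bank;
};

#define RAS_RING_SLOTS 64
static_assert((RAS_RING_SLOTS & (RAS_RING_SLOTS - 1)) == 0,
	      "slot count must be a power of two");

/* Hypothetical layout of the mmapped area. head is only advanced by the
 * producer (kernel), tail only by the consumer (daemon); both are
 * free-running counters, reduced modulo the slot count on access. */
struct ras_ring {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
	struct ras_record slot[RAS_RING_SLOTS];
};

/* Producer: returns false if the ring is full, so the caller can count
 * a lost record instead of overwriting one the daemon has not read. */
static bool ras_ring_push(struct ras_ring *r, const struct ras_record *rec)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RAS_RING_SLOTS)
		return false;
	r->slot[head % RAS_RING_SLOTS] = *rec;
	/* Publish the record before making it visible via head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: called by the daemon after it has been woken up. */
static bool ras_ring_pop(struct ras_ring *r, struct ras_record *out)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*out = r->slot[tail % RAS_RING_SLOTS];
	/* Hand the slot back to the producer only after copying it out. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Keeping head and tail as free-running counters distinguishes "full" from "empty" without wasting a slot. The release/acquire pairs make sure the daemon never sees an advanced head before the record behind it. When the ring is full, the sketch drops the new record rather than overwriting an unread one; the opposite policy would work just as well.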
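For the command syscall, one option is to borrow perf's habit of passing a size-tagged struct (as perf_event_attr does) so that the ABI can grow later. The opcodes, struct ras_cmd and ras_do_cmd() below are all hypothetical, and plain variables stand in for the hardware:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes; nothing like this exists in the kernel. */
enum ras_cmd_op {
	RAS_CMD_BANK_CTL  = 1,	/* write an MCi_CTL value for one bank */
	RAS_CMD_ECC_CHECK = 2,	/* turn ECC checking on or off */
};

/* Hypothetical command block. As with perf_event_attr, the caller
 * states the size it was compiled against, so fields can be appended
 * later without breaking existing binaries. */
struct ras_cmd {
	uint32_t size;	/* sizeof(struct ras_cmd) as seen by the caller */
	uint32_t op;	/* enum ras_cmd_op */
	uint32_t cpu;	/* target CPU (unused in this sketch) */
	uint32_t bank;
	uint64_t value;
};
static_assert(sizeof(struct ras_cmd) == 24, "ABI must not change silently");

#define RAS_NR_BANKS 6

/* Stand-ins for hardware state so the sketch runs in userspace. */
static uint64_t bank_ctl[RAS_NR_BANKS];
static int ecc_enabled = 1;

/* Hypothetical syscall body: copy in, validate, dispatch. */
static long ras_do_cmd(const void *ubuf, uint32_t usize)
{
	struct ras_cmd cmd = { 0 };	/* fields an older caller lacks stay 0 */

	if (usize < 2 * sizeof(uint32_t) || usize > sizeof(cmd))
		return -EINVAL;
	memcpy(&cmd, ubuf, usize);	/* copy_from_user() in the kernel */
	if (cmd.size != usize)
		return -EINVAL;

	switch (cmd.op) {
	case RAS_CMD_BANK_CTL:
		if (cmd.bank >= RAS_NR_BANKS)
			return -EINVAL;
		bank_ctl[cmd.bank] = cmd.value;
		return 0;
	case RAS_CMD_ECC_CHECK:
		ecc_enabled = cmd.value != 0;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

perf's own attr copying goes a step further. It also accepts a larger struct from newer userspace, as long as the unknown tail is zero. This sketch simply rejects a larger struct.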
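As for leaving the error formatting to userspace: the raw MCi_STATUS word already carries the architectural flag bits 63..57 and the MCA error code in bits 15..0, so the kernel only needs to copy it. A hypothetical userspace decoder could start out as simply as this. Real decoders would also interpret the error code and the vendor-specific fields:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits at the top of MCi_STATUS. */
#define MCI_STATUS_VAL   (1ULL << 63)	/* register holds a valid error */
#define MCI_STATUS_OVER  (1ULL << 62)	/* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61)	/* error was not corrected */
#define MCI_STATUS_EN    (1ULL << 60)	/* error reporting was enabled */
#define MCI_STATUS_MISCV (1ULL << 59)	/* MCi_MISC is valid */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* MCi_ADDR is valid */
#define MCI_STATUS_PCC   (1ULL << 57)	/* processor context is corrupt */

/* Hypothetical userspace decoder: turns the raw 64-bit word the kernel
 * copied into the buffer into one readable line. Returns the length
 * snprintf() reports. */
static int ras_format_status(char *buf, size_t len, unsigned int bank,
			     uint64_t status)
{
	assert(buf != NULL && len > 0);

	if (!(status & MCI_STATUS_VAL))
		return snprintf(buf, len, "bank %u: no valid error", bank);

	return snprintf(buf, len, "bank %u: %s%s%s%s%s mca=0x%04x model=0x%04x",
			bank,
			(status & MCI_STATUS_UC)    ? "UC" : "CE",
			(status & MCI_STATUS_OVER)  ? " OVER" : "",
			(status & MCI_STATUS_PCC)   ? " PCC" : "",
			(status & MCI_STATUS_ADDRV) ? " ADDRV" : "",
			(status & MCI_STATUS_MISCV) ? " MISCV" : "",
			/* bits 15..0: MCA error code */
			(unsigned int)(status & 0xffff),
			/* bits 31..16: model-specific, layout varies by vendor */
			(unsigned int)((status >> 16) & 0xffff));
}
```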
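Enabling or disabling an MCE error source from the hw config list boils down to a read-modify-write of the bank's MCi_CTL MSR, which is architecturally at 0x400 + 4*i. The sketch fakes the MSR accesses with an array so it runs in userspace. mce_toggle_source() and the fake accessors are hypothetical stand-ins for rdmsrl()/wrmsrl() executed on the CPU that owns the bank:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural MSR numbering: bank i's MCi_CTL lives at 0x400 + 4*i. */
#define MSR_IA32_MC0_CTL	0x400u
#define MSR_IA32_MCx_CTL(x)	(MSR_IA32_MC0_CTL + 4u * (x))

#define NR_BANKS 8

/* Fake MSR space so the sketch runs in userspace; in the kernel these
 * would be rdmsrl()/wrmsrl() executed on the CPU that owns the bank. */
static uint64_t fake_mci_ctl[NR_BANKS];

static uint64_t fake_rdmsr(uint32_t msr)
{
	return fake_mci_ctl[(msr - MSR_IA32_MC0_CTL) / 4];
}

static void fake_wrmsr(uint32_t msr, uint64_t val)
{
	fake_mci_ctl[(msr - MSR_IA32_MC0_CTL) / 4] = val;
}

/* Hypothetical helper: enable or disable reporting of one error type
 * (one MCi_CTL bit) in one bank. Returns the value written back. */
static uint64_t mce_toggle_source(unsigned int bank, unsigned int bit,
				  bool enable)
{
	uint32_t msr;
	uint64_t ctl;

	assert(bank < NR_BANKS && bit < 64);
	msr = MSR_IA32_MCx_CTL(bank);
	ctl = fake_rdmsr(msr);
	if (enable)
		ctl |= 1ULL << bit;
	else
		ctl &= ~(1ULL << bit);
	fake_wrmsr(msr, ctl);
	return ctl;
}
```

The same read-modify-write pattern would apply to AMD's MCi_CTL_MASK registers, just at different MSR addresses.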
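With every knob exposed as a plain sysfs file, doing things "by hand" is just cat and echo, and the tool needs nothing beyond file I/O. Below are two hypothetical helpers for knobs that hold a single hex value, similar to the existing /sys/devices/system/machinecheck/machinecheck0/bankN files, which show a bank's MCi_CTL value. The caller passes in the path:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read a knob that holds a single hex number ("3f" or "0x3f").
 * Returns 0 on success, -1 on failure. */
static int knob_read_hex(const char *path, uint64_t *val)
{
	FILE *f;
	int ok;

	assert(path != NULL && val != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%" SCNx64, val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a hex value with a 0x prefix, so a parser that auto-detects the
 * base reads it as hex. Returns 0 on success, -1 on failure. */
static int knob_write_hex(const char *path, uint64_t val)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%#" PRIx64 "\n", val) > 0;
	/* stdio buffers the data, so an error from the kernel's store()
	 * handler may only show up when fclose() flushes it. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, knob_read_hex() on /sys/devices/system/machinecheck/machinecheck0/bank4 would return the same value that cat prints for that file.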


Anyway, this is by no means complete and still as alpha as it can
be. However, I'd like to discuss it as early as possible and in small,
incremental steps, avoiding trial and error as much as possible. So,
feel free to throw all your crazy ideas at me and correct (or kill) all
those crappy points above.

Thanks.

--
Regards/Gruss,
Boris.