Re: x86/mce merge, integration hiccup + crash, design thoughts

From: Tim Hockin
Date: Wed Jan 14 2009 - 11:18:41 EST


On Wed, Jan 14, 2009 at 1:29 AM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
>>
>> I'm 100% on board with that and will even help staff the effort.
>
> Well, if you want to change anything in the code, it would be a good idea
> first to establish clearly what is actually broken. I know various areas
> that need improvement (and I have patches that fix most of them), but to my
> knowledge none of them would be fixed by ASCII logging.
>
> Perhaps a good start would be if Ingo could expand on what exactly
> he believes is broken currently. At least his earlier "high level" argument
> seems to be largely based on clear misunderstandings of which kinds
> of MCE events are common and which are not. I don't really blame
> him for that, since MCEs are obscure, difficult and badly
> documented (I had a hard time getting up to speed on them myself
> and it took me quite some time). But I hope he doesn't
> dismiss the advice of people who have more experience with
> them than he does.
>
> I wrote a long email earlier in the thread with all the reasons why
> ASCII logging is difficult (the various atomicity issues, among others).
> I haven't heard anyone refute any of the arguments in there, so I assume
> they are agreed on by everyone.
>
> I would appreciate it if the people who continue to propose ASCII
> logging would explain how they plan to solve these problems.
>
>> This
>> is something that is VERY HIGHLY desired here.
>
> What exactly is desired?

From my point of view: a single, consistent, easy logging interface
for the kernel to send *structured data* about hardware/system events
and errors up to userspace.

I don't care if it is ASCII, but it probably can be done in ASCII.
That's the cart before the horse, IMHO. I just want something more
structured and better suited than printk().
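To make the "structured data" point and the atomicity concern raised above a little more concrete, here is a minimal sketch, assuming a hypothetical record layout (`struct hw_event`, `hw_log_event()`, `hw_log_read()` and `HW_LOG_LEN` are all made-up names, not an existing kernel ABI). It is loosely modeled on the pattern the x86 mcelog buffer is built around (a compare-and-swap on the next-slot index plus a per-entry "finished" flag), but it is not that code: the error path fills in fixed-size binary fields without taking a lock or formatting any text, and the reader gets fields it never has to parse.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-size event payload; field names are illustrative. */
struct hw_event {
    uint32_t cpu;
    uint32_t bank;
    uint64_t status;
    uint64_t addr;
};

#define HW_LOG_LEN 32   /* hypothetical capacity */

struct hw_slot {
    struct hw_event ev;
    atomic_bool finished;   /* set last, once all fields are written */
};

struct hw_log {
    atomic_uint next;       /* index of the next free slot */
    struct hw_slot slot[HW_LOG_LEN];
};

static void hw_log_init(struct hw_log *log)
{
    atomic_init(&log->next, 0);
    for (unsigned int i = 0; i < HW_LOG_LEN; i++)
        atomic_init(&log->slot[i].finished, false);
}

/* Writer: reserve a slot with compare-and-swap instead of a lock, so it
 * never blocks. When the buffer is full the new record is dropped rather
 * than overwriting data nobody has read yet. */
static bool hw_log_event(struct hw_log *log, uint32_t cpu, uint32_t bank,
                         uint64_t status, uint64_t addr)
{
    unsigned int idx = atomic_load(&log->next);
    do {
        if (idx >= HW_LOG_LEN)
            return false;
    } while (!atomic_compare_exchange_weak(&log->next, &idx, idx + 1));

    struct hw_slot *s = &log->slot[idx];
    s->ev.cpu = cpu;
    s->ev.bank = bank;
    s->ev.status = status;
    s->ev.addr = addr;
    /* Publish: a reader that sees finished == true also sees the fields. */
    atomic_store_explicit(&s->finished, true, memory_order_release);
    return true;
}

/* Reader: copy out completed records in order, stopping at the first slot
 * a writer has reserved but not yet finished filling in. */
static unsigned int hw_log_read(struct hw_log *log, struct hw_event *out,
                                unsigned int max)
{
    unsigned int n = 0;
    unsigned int end = atomic_load(&log->next);

    for (unsigned int i = 0; i < end && n < max; i++) {
        if (!atomic_load_explicit(&log->slot[i].finished,
                                  memory_order_acquire))
            break;
        out[n++] = log->slot[i].ev;
    }
    return n;
}
```

The design choice being illustrated is that the expensive, non-atomic work (formatting, deciding what to print) is pushed out of the error path entirely; whether the consumer then renders the records as ASCII is a separate question.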

>> I already have a
>> couple of people looking at this and other HW-error reporting issues.
>
> I have had lots of patches pending for over half a year (including
> tons of bug fixes), and they all get delayed again and again with
> very little justification. So before writing any new code
> it would be good to just get the already pending improvements in.

We'd LOVE your improvements, if they work. MCE is always a sore point
for us, in that we take too many of them. Anything to reduce their
impact is a win.

Tim