Re: [RESEND][PATCH -tip 2/3] x86, mce: Revert "add mce=nopoll option to disable timer polling"

From: Andi Kleen
Date: Mon Apr 20 2009 - 03:26:58 EST


Hidetoshi Seto <seto.hidetoshi@xxxxxxxxxxxxxx> writes:

> Disabling only polling but not CMCI is a pointless setting.
> Instead of "mce=nopoll", which tends to be paired with CMCI disablement,
> it makes more sense to have an "mce=ignore_ce" option that disables
> both polling and CMCI at once. A patch for this new implementation
> will follow this reverting patch.
>
> OTOH, once booted, we can disable polling by setting check_interval
> to 0, but this fact is not mentioned anywhere. Andi will later post
> updated documentation that addresses this issue.

I still think that patch has bad semantics because you leave the
events sitting in the machine check registers and never clear
them. Especially with MCA recovery that has very unfortunate side
effects -- it means the OVER bit will be set and an MCA that is
recoverable in principle will require a panic. Even without MCA
recovery it has similar problems and will lead to confusing log
output for non-CE MCAs.
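
To make the overflow concern concrete, here is a deliberately simplified
user-space model, not kernel code. The bit positions are the architectural
MCi_STATUS ones; hw_record_event() and must_panic() are hypothetical
helpers invented for this illustration, and the real overwrite and
severity rules are more detailed than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL  (1ULL << 63) /* bank holds a valid event */
#define MCI_STATUS_OVER (1ULL << 62) /* an earlier event was overwritten */
#define MCI_STATUS_UC   (1ULL << 61) /* event was uncorrected */

/*
 * Hypothetical helper: model the hardware recording a new event in a
 * bank whose previous contents are 'old'. If the bank was never
 * cleared, the new status carries the OVER bit.
 */
static uint64_t hw_record_event(uint64_t old, uint64_t event)
{
    uint64_t status = event | MCI_STATUS_VAL;

    if (old & MCI_STATUS_VAL)
        status |= MCI_STATUS_OVER;
    return status;
}

/*
 * Hypothetical helper: in this model an uncorrected event with OVER
 * set cannot be recovered from, because state may have been lost.
 */
static bool must_panic(uint64_t status)
{
    return (status & MCI_STATUS_UC) && (status & MCI_STATUS_OVER);
}
```

In this model an uncorrected event recorded into an empty bank does not
force a panic, while the same event recorded on top of a stale corrected
event that nobody cleared does.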

I think a patch to not log corrected errors would be reasonable,
but you still need to clear the events from the machine check
banks at least.

So I would recommend you add an mce=dont_log_ce option or some such
that just guards the mce_log() call in machine_check_poll().
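
A rough user-space sketch of that idea, not the actual kernel code:
the poll loop always clears a valid bank, and the option only gates the
logging call. mce_log() and machine_check_poll() are the names used
above; the dont_log_ce flag, the banks[] array standing in for the
MCi_STATUS MSRs, and the logged counter are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL (1ULL << 63) /* bank holds a valid event */
#define MCI_STATUS_UC  (1ULL << 61) /* event was uncorrected */

#define NBANKS 4

static uint64_t banks[NBANKS]; /* stand-in for the MCi_STATUS MSRs */
static bool dont_log_ce;       /* hypothetical mce=dont_log_ce flag */
static int logged;             /* number of events passed to mce_log() */

/* Stand-in for the real logging call; it only counts events here. */
static void mce_log(uint64_t status)
{
    (void)status;
    logged++;
}

/* Model of the poll loop: the option guards only the log call. */
static void machine_check_poll(void)
{
    for (int i = 0; i < NBANKS; i++) {
        uint64_t status = banks[i];

        if (!(status & MCI_STATUS_VAL))
            continue;
        /* In this model, uncorrected events are left for the
         * exception handler and are not touched by the poller. */
        if (status & MCI_STATUS_UC)
            continue;

        if (!dont_log_ce)
            mce_log(status);

        /* Always clear the bank, so a later event does not set OVER. */
        banks[i] = 0;
    }
}
```

With dont_log_ce set, a corrected event is dropped from the log but its
bank is still cleared; with it unset, the event is logged and cleared.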

Also, for your use case the better approach would really be to have
some way for the firmware to tell the OS that it doesn't want the OS
to log.

Also, BTW, before adding new features like this it would be a good
idea to first apply the bug fixes I posted two weeks ago.

-Andi


--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.