Re: [PATCH 2/2] powerpc/book3s: mce: Use add_taint_no_warn() in machine_check_early().

From: Daniel Axtens
Date: Mon Apr 17 2017 - 06:39:59 EST


Hi Mahesh,

> Fixes: 27ea2c420cad powerpc: Set the correct kernel taint on machine check errors.

I notice this Fixes a commit I introduced. Please could you cc me when
you do this? I am likely to miss it otherwise, especially since I have
now left IBM.

Being cced allows me to provide an Ack or a review. And getting feedback
on my changes is very helpful in becoming a better programmer.

In this case, as per Michael's comment, why don't we just move the
add_taint() call from machine_check_early() to
machine_check_process_queued_event(), on the other side of the work queue?

The work queue system is supposed to provide us with a safe place to do
printing, etc., so it's an appropriate place. Also, we already do
machine_check_print_event_info there, and adding the taint doesn't need
to be done synchronously.
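
To make the ordering concrete, here is a rough user-space model of what I
mean. It is only a sketch and not kernel code: every name ending in _model
is made up for the example, the queue depth is an assumption, and printf()
stands in for printk(). The early handler only queues the event; the taint
and all printing happen when the queue is drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_MC_EVT_MODEL 10	/* assumed queue depth for this model */

/* Hypothetical, heavily reduced stand-in for a queued machine check event. */
struct mce_event_model {
	int error_type;
};

static struct mce_event_model mce_queue[MAX_MC_EVT_MODEL];
static int mce_queue_count;
static bool tainted_machine_check;
static int console_writes;	/* counts every print done by the model */

/* Models add_taint(): the first call also prints a warning. */
static void add_taint_model(void)
{
	if (!tainted_machine_check) {
		printf("Disabling lock debugging due to kernel taint\n");
		console_writes++;
	}
	tainted_machine_check = true;
}

/*
 * Models machine_check_early(): runs in the unsafe context, so it only
 * records the event. No taint, no printing.
 */
static bool machine_check_early_model(int error_type)
{
	if (mce_queue_count >= MAX_MC_EVT_MODEL)
		return false;	/* queue full, event dropped */
	mce_queue[mce_queue_count++].error_type = error_type;
	return true;
}

/*
 * Models machine_check_process_queued_event(): runs later in a safe
 * context, so tainting (with its warning) and printing are fine here.
 * Returns the number of events processed.
 */
static int machine_check_process_queued_event_model(void)
{
	int processed = 0;

	assert(mce_queue_count >= 0 && mce_queue_count <= MAX_MC_EVT_MODEL);

	if (mce_queue_count > 0)
		add_taint_model();

	while (mce_queue_count > 0) {
		struct mce_event_model *evt = &mce_queue[--mce_queue_count];

		printf("Machine check event, type %d\n", evt->error_type);
		console_writes++;
		processed++;
	}
	return processed;
}
```

Calling machine_check_early_model() leaves console_writes at zero and the
taint flag clear; both only change once
machine_check_process_queued_event_model() runs. The cost is that the
taint is set slightly later than the exception itself, which I think is
acceptable here.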

Regards,
Daniel

Mahesh J Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx> writes:

> From: Mahesh Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx>
>
> machine_check_early() gets called in real mode. The very first time
> add_taint() is called, it prints a warning, which ends up making an OPAL
> call (using the OPAL_CALL wrapper) to write it to the console. If we get
> the very first machine check while we are in OPAL, we are doomed. OPAL_CALL
> overwrites the PACASAVEDMSR in r13, and in this case, when we are done with
> MCE handling, the original OPAL call will use this new MSR on its way
> back to opal_return. This usually leads to unexpected behaviour or a kernel
> panic. Instead, use add_taint_no_warn(), which does not call printk.
>
> This is broken with the current FW level. We have been lucky so far not to
> hit the very first MCE while in OPAL, but it is easily reproducible on Mambo.
> This should go to stable as well, along with patch 1/2.
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx>
> ---
> arch/powerpc/kernel/traps.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 62b587f..4a048dc 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -306,7 +306,7 @@ long machine_check_early(struct pt_regs *regs)
>
> __this_cpu_inc(irq_stat.mce_exceptions);
>
> - add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> + add_taint_no_warn(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>
> /*
> * See if platform is capable of handling machine check. (e.g. PowerNV