Re: [PATCH] x86 MCE: shut up lockdep warning

From: Ingo Molnar
Date: Fri May 08 2009 - 05:05:08 EST



* Shaohua Li <shaohua.li@xxxxxxxxx> wrote:

> lockdep reports the warning below when I try to offline one CPU:
> [ 110.835487] =================================
> [ 110.835616] [ INFO: inconsistent lock state ]
> [ 110.835688] 2.6.30-rc4-00336-g8c9ed89 #52
> [ 110.835757] ---------------------------------
> [ 110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 110.835982] (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b
>
> smp_call_function_single() will disable interrupts. Move the MCE
> reenable/disable to a workqueue, so that no IRQ is disabled.
>
> Signed-off-by: Shaohua Li <shaohua.li@xxxxxxxxx>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
> index 6fb0b35..739c824 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
> @@ -1057,30 +1057,32 @@ static __cpuinit void mce_remove_device(unsigned int cpu)
>  }
>
>  /* Make sure there are no machine checks on offlined CPUs. */
> -static void mce_disable_cpu(void *h)
> +static long mce_disable_cpu(void *h)
>  {
>  	int i;
>  	unsigned long action = *(unsigned long *)h;
>
>  	if (!mce_available(&current_cpu_data))
> -		return;
> +		return 0;
>  	if (!(action & CPU_TASKS_FROZEN))
>  		cmci_clear();
>  	for (i = 0; i < banks; i++)
>  		wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
> +	return 0;
>  }
>
> -static void mce_reenable_cpu(void *h)
> +static long mce_reenable_cpu(void *h)
>  {
>  	int i;
>  	unsigned long action = *(unsigned long *)h;
>
>  	if (!mce_available(&current_cpu_data))
> -		return;
> +		return 0;
>  	if (!(action & CPU_TASKS_FROZEN))
>  		cmci_reenable();
>  	for (i = 0; i < banks; i++)
>  		wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);
> +	return 0;
>  }
>
>  /* Get notified when a cpu comes on/off. Be hotplug friendly. */
> @@ -1106,14 +1108,14 @@ static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
>  	case CPU_DOWN_PREPARE:
>  	case CPU_DOWN_PREPARE_FROZEN:
>  		del_timer_sync(t);
> -		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> +		work_on_cpu(cpu, mce_disable_cpu, &action);
>  		break;
>  	case CPU_DOWN_FAILED:
>  	case CPU_DOWN_FAILED_FROZEN:
>  		t->expires = round_jiffies(jiffies +
>  						__get_cpu_var(next_interval));
>  		add_timer_on(t, cpu);
> -		smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
> +		work_on_cpu(cpu, mce_reenable_cpu, &action);
>  		break;

No, this needs a real fix - not a 'shut up lockdep' workaround.

One problem is that cmci_discover_lock is taken irq-unsafe, which is
obviously a bad idea ...
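
To spell out why that is a problem, here is a rough userspace toy model
(not kernel code, and not the fix itself) of the usage-state check behind
the report quoted above. All toy_* names are made up for this sketch. It
only assumes what the report shows: the same lock is first taken with
hardirqs enabled (HARDIRQ-ON-W) and later taken from hardirq context
(IN-HARDIRQ-W), which could deadlock if the interrupt arrived while the
lock was held on that CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Usage bits accumulated per lock, loosely mirroring lockdep's states. */
enum toy_usage {
	TOY_HARDIRQ_ON = 1 << 0,	/* taken while hardirqs were enabled */
	TOY_IN_HARDIRQ = 1 << 1,	/* taken from hardirq context */
};

struct toy_cpu  { bool irqs_enabled; bool in_hardirq; };
struct toy_lock { bool held; unsigned int usage; };

/* Record the acquisition; return false once the usage is inconsistent. */
static bool toy_acquire(struct toy_cpu *cpu, struct toy_lock *l)
{
	unsigned int both = TOY_HARDIRQ_ON | TOY_IN_HARDIRQ;

	assert(!l->held);		/* re-taking a held lock would spin forever */
	if (cpu->in_hardirq)
		l->usage |= TOY_IN_HARDIRQ;
	else if (cpu->irqs_enabled)
		l->usage |= TOY_HARDIRQ_ON;
	l->held = true;
	return (l->usage & both) != both;
}

/* irq-unsafe variant: leaves the interrupt state alone. */
static bool toy_spin_lock(struct toy_cpu *cpu, struct toy_lock *l)
{
	return toy_acquire(cpu, l);
}

/* irq-safe variant: saves the interrupt state and disables irqs first. */
static bool toy_spin_lock_irqsave(struct toy_cpu *cpu, struct toy_lock *l,
				  bool *flags)
{
	*flags = cpu->irqs_enabled;
	cpu->irqs_enabled = false;
	return toy_acquire(cpu, l);
}

static void toy_unlock(struct toy_cpu *cpu, struct toy_lock *l, bool flags)
{
	l->held = false;
	cpu->irqs_enabled = flags;
}

/*
 * Take the lock once from process context with irqs on, then once from
 * hardirq context (as an IPI callback would). Returns true if the
 * recorded usage stays consistent.
 */
bool toy_scenario_consistent(bool irq_safe)
{
	struct toy_cpu cpu = { .irqs_enabled = true, .in_hardirq = false };
	struct toy_lock lock = { .held = false, .usage = 0 };
	bool flags = cpu.irqs_enabled;
	bool ok;

	ok = irq_safe ? toy_spin_lock_irqsave(&cpu, &lock, &flags)
		      : toy_spin_lock(&cpu, &lock);
	toy_unlock(&cpu, &lock, flags);

	/* Now the hardirq path: irqs are off and we are in the handler. */
	cpu.irqs_enabled = false;
	cpu.in_hardirq = true;
	flags = cpu.irqs_enabled;
	ok = (irq_safe ? toy_spin_lock_irqsave(&cpu, &lock, &flags)
		       : toy_spin_lock(&cpu, &lock)) && ok;
	toy_unlock(&cpu, &lock, flags);
	return ok;
}
```

In this model the plain variant ends up with both usage bits set, while
the irqsave variant never records a hardirqs-enabled acquisition, so the
usage stays consistent.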

Could you please try the fix for that from Hidetoshi-san, attached
below (also available in latest -tip)?

Thanks,

Ingo

------------------>