Re: [PATCH] x86/mce/therm_throt: Fix the access of uninitialized therm_work

From: Ingo Molnar
Date: Mon Jan 06 2020 - 02:11:14 EST


* Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Mon, Jan 06, 2020 at 06:41:55AM +0000, Chuansheng Liu wrote:
> > On ICL platforms, it is easy to hit a boot failure with a panic
> > in the thermal interrupt handler during the early boot stage.
> >
> > This issue makes my platform almost unable to boot with the
> > latest kernel code.
> >
> > The call stack is like:
> > kernel BUG at kernel/timer/timer.c:1152!
> >
> > Call Trace:
> > __queue_delayed_work
> > queue_delayed_work_on
> > therm_throt_process
> > intel_thermal_interrupt
> > ...
> >
> > When a CPU comes up, the thermal irq is enabled prior to the CPU UP
> > notification, which is what initializes therm_work.
>
> You mean the unmasking of the thermal vector at the end of
> intel_init_thermal()?
>
> If so, why don't you move that to the end of the notifier and unmask it
> only after all the necessary work like setting up the workqueues etc, is
> done, and save yourself adding yet another silly bool?

A debugging WARN_ON_ONCE() for the case where the workqueue is not
initialized yet would also be useful, I suspect. This would turn any
remaining race-induced boot crash in this area into a warning.
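
Roughly what I have in mind, as a userspace sketch only and not the
actual therm_throt.c code: warn_on_once(), struct throttle_state,
work_ready and throttle_event() are made-up names that stand in for the
kernel's WARN_ON_ONCE() and the per-CPU thermal state. The assumption is
that the handler can tell whether the delayed work has been set up, and
bails out with a one-time warning instead of queueing it:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (at most one). */
static int warn_count;

/*
 * Hypothetical userspace stand-in for the kernel's WARN_ON_ONCE():
 * prints a warning the first time cond is true and returns cond.
 */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical per-CPU state; work_ready is set once init has run. */
struct throttle_state {
	bool work_ready;
	int queued;
};

/* Called from the (simulated) thermal interrupt path. */
static bool throttle_event(struct throttle_state *s)
{
	/* Too early: warn once and skip instead of touching the work. */
	if (warn_on_once(!s->work_ready, "thermal event before work init"))
		return false;

	s->queued++;	/* stands in for queueing the delayed work */
	return true;
}
```

An early event is then dropped with a single warning, and events that
arrive after initialization are queued as usual.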

Thanks,

Ingo