Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

From: Thomas Gleixner
Date: Wed Nov 30 2016 - 04:18:14 EST


On Wed, 30 Nov 2016, Borislav Petkov wrote:
> On Wed, Nov 30, 2016 at 09:54:58AM +0100, Thomas Gleixner wrote:
> > Right, that's the safe bet. But I'm quite sure that the C1E crap only
> > starts to work _after_ ACPI initialization.
>
> Yap, I think it is an ACPI decision whether to enter C1E or not. And all
> those boxes which are unaffected - they actually are but since the FW
> doesn't enter C1E, they don't hit the bug.
>
> > tick_force_broadcast() is irreversible
>
> So if I let the cores in that small window use amd_e400_idle() and
> then when I detect the machine doesn't enter C1E after all, I do
> tick_broadcast_exit() on all cores in amd_e400_c1e_mask, then clear it
> and free it, that would work?
>
> Or do you see a better solution?

Start with the bug cleared and select amd_e400_idle(), which will then use
default_idle() and not do any of the broadcast crap.

After ACPI gets initialized check the C1E misfeature and if it's detected,
then set the CPU BUG and send an IPI to all online CPUs.

I think this is safe because the CPU running the ACPI initialization is not
in idle and if any of the other CPUs would be stuck due to the C1E
enablement, then the IPI will kick them out.

At least worth a try.

Thanks,

tglx