Re: WARNING: CPU: 0 PID: 3031 at ./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130()

From: Ingo Molnar
Date: Wed Feb 17 2016 - 04:35:28 EST



* Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Wed, Feb 17, 2016 at 09:16:46AM +0100, Ingo Molnar wrote:
> > So I'm wondering why this started triggering only now. Is this a pre-existing bug
> > that somehow got triggered via:
> >
> > 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
> >
> > ?
>
> Well, that's an interesting question. See, the thing is, I triggered
> this only *once* by accident and I haven't seen it ever since.
>
> The "reliable" "reproducer" I used to debug this was Andy's suggestion
> to stick a schedule() in __fpu__restore_sig().
>
> So the answer to that question is not easy.
>
> BUT(!), regardless, the bug still needs to be fixed because my tracing
> here

The fix is absolutely needed; I would just like a deeper analysis of why it
wasn't seen before.

> > If yes then we need a plausible theory of how that never triggered on modern
> > Intel CPUs that had eagerfpu enabled for years.
>
> AFAICT, it triggers - and the window is very small at that - only on
> 32-bit. If at all.

So it probably triggers on vanilla v4.4 (or v4.5-rc4) as well, with no recent FPU
bits applied?

> I can certainly try to test all those but I don't have a reliable reproducer.
> The only thing I could do is check out each of those commits and stick a
> schedule() in __fpu__restore_sig() and see what happens.
>
> But if my analysis above is right, none of those would matter because of the
> mechanism of how the warn happens...

So if you stick a schedule() into vanilla and it triggers, then I think we can
declare it a pre-existing bug (and then the fix also needs Cc: stable).

Thanks,

Ingo