RE: Linux 3.2: FPU Issue in execve with Intel E5-2620v3 and E7-4880v2

From: Cai, Jason
Date: Wed Mar 22 2017 - 02:22:16 EST


Hi Greg K.H.,

Thanks for the reply. Yes, you're right.

I finally found the root cause of my FPU issue. It's a bug in one of our
drivers, which registers a timer and uses the FPU in its timer function. That
triggered a DNA (Device Not Available) exception and unexpectedly changed the
FPU state of the current process. At the end of `load_elf_binary()`, the FPU
state is freed by calling `start_thread_common()`, but the incorrect FPU state
remains after `load_elf_binary()`.
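
To illustrate the failure mode, here is a toy user-space model. It is only a
sketch and not kernel code: every struct and function name in it is
hypothetical, and it reduces the lazy-FPU logic to two flags per task plus a
modelled CR0.TS bit. It assumes that kernel code using the FPU in a timer
callback is expected to wrap that use in something like
`kernel_fpu_begin()`/`kernel_fpu_end()`, which the guarded variant imitates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names below are hypothetical, not kernel APIs. */
struct task_model {
    bool used_math;   /* task is marked as owning valid FPU state */
    bool has_xstate;  /* FPU save area is allocated */
};

static bool cpu_ts = true;  /* models CR0.TS: next FPU insn raises DNA */

/* Models the DNA (#NM) handler: lazily hands the FPU to `cur`. */
static void dna_trap(struct task_model *cur)
{
    cur->has_xstate = true;
    cur->used_math = true;
    cpu_ts = false;
}

/* Any FPU instruction: traps first if TS is set. */
static void fpu_insn(struct task_model *cur)
{
    if (cpu_ts)
        dna_trap(cur);
}

/* Buggy timer callback: touches the FPU with no guard. */
static void timer_fn_buggy(struct task_model *cur)
{
    fpu_insn(cur);
}

/* Guarded callback: imitates a kernel_fpu_begin()/kernel_fpu_end() pair. */
static void timer_fn_guarded(struct task_model *cur)
{
    bool saved_ts = cpu_ts;

    cpu_ts = false;      /* "begin": clear TS so no trap is taken */
    fpu_insn(cur);
    cpu_ts = saved_ts;   /* "end": restore lazy-FPU trapping */
}

/* Models the exec window; returns true if the task ends up inconsistent. */
static bool exec_model(void (*timer_fn)(struct task_model *))
{
    struct task_model cur = { false, false };  /* clean after the flush */

    cpu_ts = true;
    assert(!cur.used_math);
    timer_fn(&cur);          /* timer fires in the middle of exec */
    cur.has_xstate = false;  /* start of new thread frees the save area */
    return cur.used_math && !cur.has_xstate;
}
```

In this model, `exec_model(timer_fn_buggy)` reports an inconsistent task (it
is marked as having used the FPU but has no save area), while
`exec_model(timer_fn_guarded)` leaves the task clean.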

Anyway, thanks for your kindness and your information.

Best regards,
Jason Cai


-----Original Message-----
From: Greg KH [mailto:gregkh@xxxxxxxxxxxxxxxxxxx]
Sent: March 22, 2017 14:10
To: Cai, Jason <Jason.Cai@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx; kernelnewbies@xxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: Linux 3.2: FPU Issue in execve with Intel E5-2620v3 and E7-4880v2

On Wed, Mar 22, 2017 at 01:58:58AM +0000, Cai, Jason wrote:
> Dear Kernel Hackers,
>
> I'm Jason Cai, a kernel developer from Dell EMC. I hit the same issue as the
> one Lennart Sorensen reported on Dec 19, 2016.
>
> I have now narrowed down the issue. It seems that an unexpected DNA
> (Device Not Available) exception may be triggered in the `execve` code path.
> Specifically, it occurs between `setup_new_exec()` and `start_thread()` in
> the function `load_elf_binary()`.
>
> I've added a BUG_ON() just before `start_thread()` in `load_elf_binary()` to
> assert that the FPU state of the current process descriptor is clean
> when performing an exec. It gets triggered, and the stack trace is as follows:

As you have a closed kernel module loaded, it's impossible for us to
actually tell what you are doing, or support you at all, sorry. Please
work with the group that gave you that code, as they are the only ones
that can do so.

Also, does this happen with 4.10? 3.2 is _really_ old you know.

thanks,

greg k-h