Re: 4.15-rc6 PTI regression: L1 TLB mismatch MCE on Athlon64

From: Thomas Gleixner
Date: Tue Jan 02 2018 - 18:08:06 EST


On Tue, 2 Jan 2018, Borislav Petkov wrote:
> On Tue, Jan 02, 2018 at 10:49:16PM +0200, Meelis Roos wrote:
> > This is on a socket 939 Athlon64 3500+, with PTI enabled.
>
> LOL.
>
> > [ 316.384669] mce: [Hardware Error]: Machine check events logged
> > [ 316.384698] [Hardware Error]: Corrected error, no action required.
> > [ 316.384719] [Hardware Error]: CPU:0 (f:2f:2) MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000010011
> > [ 316.384742] [Hardware Error]: Error Addr: 0x0000ffff81e000e0
>
> That's the [47:12] slice of the virtual address which it tried to execute.
>
> According to our map in mm.txt:
>
> ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
>
> vs
>
> ffff81e000e0...
>
> which makes me think: WTF now?!
>
> I don't see any hypervisor happening in dmesg...
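The sketch below shows how the quoted MC1_STATUS value 0x9400000000010011 breaks down. It assumes the architectural MCA bit layout: 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC, with bits [15:0] holding the MCA error code. The helper names (split_status, mc_flags, mc_errcode) are hypothetical, and the bracketed output only imitates the layout of the kernel message. AMD-specific bits such as "deferred" are ignored. Under these assumptions, the low 16 bits (0x0011) fit the TLB error-code pattern 0000 0000 0001 TTLL, which would be consistent with the L1 TLB error named in the subject.

```shell
#!/usr/bin/env bash
# Hedged sketch: decode an x86 MCi_STATUS value, assuming the architectural
# MCA layout (63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC;
# bits [15:0] = MCA error code, bits [31:16] treated as extended code).
# AMD-only bits (e.g. "deferred") are ignored. Function names are hypothetical.

# pick <0|1> <label-if-set> <label-if-clear>
pick() { if (( $1 )); then printf '%s' "$2"; else printf '%s' "$3"; fi; }

# Split the 64-bit hex value into two 32-bit halves (globals HI, LO) so
# bash's signed 64-bit arithmetic never overflows on values >= 2^63.
split_status() {
	local h=${1#0[xX]}
	while (( ${#h} < 16 )); do h="0$h"; done
	HI=$(( 16#${h:0:8} ))
	LO=$(( 16#${h:8:8} ))
}

# Print flags in the log's [Over|CE/UE|MiscV|PCC|AddrV] order.
# Status bits 63..57 are bits 31..25 of the high half.
mc_flags() {
	split_status "$1"
	if (( ((HI >> 31) & 1) == 0 )); then echo "[not valid]"; return; fi
	printf '[%s|%s|%s|%s|%s]\n' \
		"$(pick $(( (HI >> 30) & 1 )) Over -)" \
		"$(pick $(( (HI >> 29) & 1 )) UE CE)" \
		"$(pick $(( (HI >> 27) & 1 )) MiscV -)" \
		"$(pick $(( (HI >> 25) & 1 )) PCC -)" \
		"$(pick $(( (HI >> 26) & 1 )) AddrV -)"
}

# Describe the 16-bit MCA error code if it matches the TLB pattern
# 0000 0000 0001 TTLL (TT = transaction type, LL = level).
mc_errcode() {
	split_status "$1"
	local ec=$(( LO & 0xffff )) xec=$(( (LO >> 16) & 0xffff ))
	local tt=(instruction data generic reserved) ll=(L0 L1 L2 LG)
	local t=$(( (ec >> 2) & 3 )) l=$(( ec & 3 ))
	if (( (ec & 0xfff0) == 0x0010 )); then
		printf 'ec=0x%04x xec=0x%x: %s %s TLB error\n' "$ec" "$xec" "${ll[l]}" "${tt[t]}"
	else
		printf 'ec=0x%04x xec=0x%x: not a TLB error code\n' "$ec" "$xec"
	fi
}

# Usage with the value from the log:
#   mc_flags   0x9400000000010011   # prints [-|CE|-|-|AddrV]
#   mc_errcode 0x9400000000010011   # prints ec=0x0011 xec=0x1: L1 instruction TLB error
```

Splitting the value into two 32-bit halves is a design choice that avoids relying on how bash wraps hex literals larger than 2^63 - 1.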

Meelis, can you please enable CONFIG_X86_PTDUMP? If you select M, then
please load the resulting module 'debug_pagetables'.

Then please do the following from a shell:

# cat /sys/kernel/debug/page_tables/* >t.txt

and provide the output.

Thanks,

tglx