Bad PTEs in page tables

From: Matthew Wilcox
Date: Thu Feb 22 2018 - 17:57:59 EST



We've got a few reports now [1] from users who are seeing a NULL pointer
dereference in the radix tree code. I've tracked it down to bad PTEs.
I don't think it's been seen on 64-bit x86, but it's definitely been
seen on 32-bit x86 running under Xen.

Feb 9 14:31:27 cs01 kernel: Bad swp_entry: 2000000
Feb 9 14:31:27 cs01 kernel: mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)

Feb 9 15:35:19 cs01 kernel: Bad swp_entry: 2000000
Feb 9 15:35:19 cs01 kernel: mm/swap_state.c:683: bad pte ef3a3f38(8000000100000000)

(I also have a report from an earlier version of the patch with a Bad
swp_entry of 0e000000, so it's not a simple bit-flip like I had been
hoping.)

At this point, I think it's out of my realm of expertise. Anyone else
got a good idea why bits 63 & 32 would be set in a PTE? I know 63 is
the XD bit, but bit 32 isn't special in a present PTE.
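For anyone who wants to check the bit positions, here is a minimal
userspace sketch that decodes the 64-bit value from the log lines above.
The helper names (pte_bit_is_set, pte_count_set_bits, pte_print_set_bits)
are made up for illustration and are not kernel functions; the only
layout facts assumed are that bit 0 is the x86 present bit and bit 63 is
XD.

```c
#include <stdint.h>
#include <stdio.h>

/* x86 PTE bit positions used below: bit 0 is Present, bit 63 is XD (NX). */
#define PTE_BIT_PRESENT 0
#define PTE_BIT_XD      63

/* Hypothetical helper: return 1 if the given bit is set in the raw PTE value. */
static int pte_bit_is_set(uint64_t pte, unsigned int bit)
{
	return (int)((pte >> bit) & 1);
}

/* Hypothetical helper: count how many bits are set in the raw PTE value. */
static unsigned int pte_count_set_bits(uint64_t pte)
{
	unsigned int count = 0;

	while (pte) {
		count += (unsigned int)(pte & 1);
		pte >>= 1;
	}
	return count;
}

/* Hypothetical helper: print the set bit positions, highest first. */
static void pte_print_set_bits(uint64_t pte)
{
	int bit;

	printf("pte %016llx:", (unsigned long long)pte);
	for (bit = 63; bit >= 0; bit--) {
		if (pte_bit_is_set(pte, (unsigned int)bit))
			printf(" %d", bit);
	}
	printf("\n");
}
```

Calling pte_print_set_bits(0x8000000100000000ULL) prints
"pte 8000000100000000: 63 32". Bit 0 is clear, so the value is a
non-present PTE, which would explain why it reaches the swap entry path
at all.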

[1]
https://bugzilla.redhat.com/show_bug.cgi?id=1531779
https://bugzilla.kernel.org/show_bug.cgi?id=198497