Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs

From: Ingo Molnar
Date: Wed Jan 03 2018 - 10:48:44 EST



* Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Jan 03, 2018 at 12:46:00AM -0800, Benjamin Gilbert wrote:
> > [resending with less web]
>
> (adding lkml and x86 developers)
>
> > Hi all,
> >
> > In our regression tests on kernel 4.14.11, we're occasionally seeing a run
> > of "bad pmd" messages during boot, followed by a "BUG: unable to handle
> > kernel paging request". This happens on no more than a couple percent of
> > boots, but we've seen it on AWS HVM, GCE, Oracle Cloud VMs, and local QEMU
> > instances. It always happens immediately after "Loading compiled-in X.509
> > certificates". I can't reproduce it on 4.14.10, nor, so far, on 4.14.11
> > with pti=off. Here's a sample backtrace:

A few other things to check:

First, please test the latest WIP.x86/pti branch, which has a couple of fixes.

In a -stable kernel tree you should be able to do:

git pull --no-tags git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/pti

In particular, this recent fix from a couple of hours ago might make a difference:

52994c256df3: x86/pti: Make sure the user/kernel PTEs match

Note that this commit:

694d99d40972: x86/cpu, x86/pti: Do not enable PTI on AMD processors

disables PTI on AMD CPUs, so if you'd like to test it more broadly on all CPUs
you'll need to add "pti=on" to your boot command line.
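
When comparing boots with and without PTI, it may help to confirm which "pti="
option a given boot actually received. The following is only a minimal sketch:
the helper name pti_cmdline_setting is hypothetical, and it assumes that when
the option is repeated the last occurrence is the one of interest. It inspects
the command line string only and does not prove that PTI is active.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last "pti=" option found in a
# kernel command line string, or "unset" if the option is absent.
# The body runs in a subshell so that "set -f" (no globbing) does not leak
# into the caller's shell.
pti_cmdline_setting() (
    set -f
    setting=unset
    # Deliberately unquoted: split the command line on whitespace.
    for word in $1; do
        case "$word" in
            pti=*) setting=${word#pti=} ;;
        esac
    done
    printf '%s\n' "$setting"
)

# Report the option for the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    echo "pti option on this boot: $(pti_cmdline_setting "$(cat /proc/cmdline)")"
fi
```

On an AMD test machine booted with "pti=on" this should report "on"; a result
of "unset" means the kernel fell back to its built-in default for that CPU.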

Thanks,

Ingo