Re: XSAVE / RDPKRU on Intel 11th Gen Core CPUs

From: Dave Hansen
Date: Mon Nov 08 2021 - 14:37:36 EST


... adding LKML and x86@

On 11/8/21 9:37 AM, Brian Geffon wrote:
> We (ChromeOS) have run into an issue which we believe is related to
> the following errata on 11th Gen Intel Core CPUs:
>
> "TGL034 A SYSENTER FOLLOWING AN XSAVE OR A VZEROALL MAY LEAD TO
> UNEXPECTED SYSTEM BEHAVIOR" [1]

I'm struggling to figure out what that has to do with PKRU, though. I
don't think that erratum is related at all to the issue you're seeing.

> Essentially we notice that the value returned by a RDPKRU instruction
> will flip after some amount of time when running on kernels earlier
> than 5.14. I have a simple repro that can be used [2].

What does it flip to, btw? Can you dump the whole register state?
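
If it helps when reading the flipped value: PKRU is a 32-bit register with
two bits for each of the 16 protection keys, an access-disable (AD) bit at
position 2*key and a write-disable (WD) bit at position 2*key+1. Below is a
minimal sketch of a decoder. The helper names are made up for illustration
and are not from the kernel or from your repro; feed it whatever RDPKRU
returned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PKRU_NUM_KEYS 16u	/* PKRU holds 2 bits for each of 16 keys */

/* Access-disable (AD) bit for a key lives at bit 2*key. */
static int pkru_access_disabled(uint32_t pkru, unsigned int key)
{
	assert(key < PKRU_NUM_KEYS);
	return (int)((pkru >> (2 * key)) & 1u);
}

/* Write-disable (WD) bit for a key lives at bit 2*key + 1. */
static int pkru_write_disabled(uint32_t pkru, unsigned int key)
{
	assert(key < PKRU_NUM_KEYS);
	return (int)((pkru >> (2 * key + 1)) & 1u);
}

/* Hypothetical helper: print the raw value, then every key with a bit set. */
static void pkru_dump(uint32_t pkru)
{
	printf("PKRU=0x%08x\n", (unsigned int)pkru);
	for (unsigned int key = 0; key < PKRU_NUM_KEYS; key++) {
		int ad = pkru_access_disabled(pkru, key);
		int wd = pkru_write_disabled(pkru, key);

		if (ad || wd)
			printf("  pkey %2u: AD=%d WD=%d\n", key, ad, wd);
	}
}
```

Calling pkru_dump() on the value before and after the flip should make it
obvious which keys changed.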

> After a little digging it appears a lot of work was done to refactor
> that code and I bisected to the following commit which fixes the
> issue:
>
> commit 954436989cc550dd91aab98363240c9c0a4b7e23
> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Wed Jun 23 14:02:21 2021 +0200
>
> x86/fpu: Remove PKRU handling from switch_fpu_finish()
>
> I backported this patch to 5.4 and it does appear to fix the issue
> because it avoids XSAVE. However, I have no idea if it's actually
> fixing anything or if the behavior is working as intended. So we're
> curious, does it make sense to pull back that patch, would that patch
> be enough? Any guidance here would be appreciated because this does
> seem broken (because of how it was previously implemented) for those
> CPUs prior to 5.14, which is why I'm CCing stable@.

I suspect what you're seeing is that the:

- __write_pkru(pkru_val);

in that commit was somehow writing a bad value that had been read out of
the XSAVE buffer. That commit stops reading PKRU out of the XSAVE buffer,
which probably holds bad state. Backporting just this one patch won't do
you any good. You'll also need to backport the changes that stop using the
XSAVE buffer for PKRU in the first place.

The code doesn't bite you until the task context switches. It probably
has to switch to some pkey-using task and then back to your test app.
I'd randomly guess that your test app is getting a "leaked" PKRU from
another app. It's _probably_ not a stale PKRU value (like from reading
a PKRU!=0 value from the XSAVE buffer when XSTATE_BV[PKRU]=0) because
your test app should have PKRU=0 set at all times.
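
To spell out the stale-value case I'm ruling out: XSTATE_BV is the first 8
bytes of the XSAVE header (at byte 512 of the buffer), and PKRU is state
component 9. When XSTATE_BV[9] is clear, the PKRU bytes in the buffer are
not valid and the component is in its init state (PKRU=0). A rough sketch,
not kernel code: the function names are made up, it assumes the standard
(non-compacted) format on a little-endian host, and pkru_offset is something
the caller has to supply (CPUID leaf 0xD, sub-leaf 9, EBX reports it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_HDR_OFFSET	512u		/* header follows the legacy area */
#define XFEATURE_MASK_PKRU	(1ull << 9)	/* PKRU is state component 9 */

/* XSTATE_BV is the first 8 bytes of the XSAVE header. */
static uint64_t xsave_xstate_bv(const uint8_t *buf)
{
	uint64_t bv;

	assert(buf != NULL);
	memcpy(&bv, buf + XSAVE_HDR_OFFSET, sizeof(bv));
	return bv;
}

/* Whatever bytes sit in the PKRU slot, valid or not. */
static uint32_t xsave_pkru_raw(const uint8_t *buf, size_t pkru_offset)
{
	uint32_t pkru;

	assert(buf != NULL);
	memcpy(&pkru, buf + pkru_offset, sizeof(pkru));
	return pkru;
}

/* The value the buffer actually describes: init state (0) if the bit is clear. */
static uint32_t xsave_pkru_effective(const uint8_t *buf, size_t pkru_offset)
{
	if (!(xsave_xstate_bv(buf) & XFEATURE_MASK_PKRU))
		return 0;
	return xsave_pkru_raw(buf, pkru_offset);
}
```

Code that uses the raw read without checking XSTATE_BV is what would pick up
a stale PKRU!=0 value.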

Is KVM active on your test system?