Re: [PATCH, RESEND] x86/pat: Simplifying the PAT programming protocol

From: Kirill A. Shutemov
Date: Wed Jan 31 2024 - 13:57:32 EST


On Wed, Jan 31, 2024 at 06:57:38PM +0100, Borislav Petkov wrote:
> On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> > The programming protocol for the PAT MSR follows the MTRR programming
> > protocol. However, this protocol is cumbersome and requires disabling
> > caching (CR0.CD=1), which is not possible on some platforms.
> >
> > Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> > a #VE exception.
> >
> > Turned out the requirement to follow the MTRR programming protocol for
> > PAT programming is unnecessarily strict. The new Intel Software
> > Developer Manual[1] (December 2023) relaxes this requirement. Please
> > refer to the section titled "Programming the PAT" for more information.
>
> How about you state that requirement here instead of referring to that
> doc which is hard to read and changes constantly?
>
> I'd prefer to have that programming requirement spelled out here to know
> in the future what that requirement was and what "variant" was added to
> the kernel in case someone decides to relax it even more.

I summarized the requirements below: TLB has to flashed. Here's what SDM
says:

The operating system (OS) is responsible for ensuring that changes to a
PAT entry occur in a manner that maintains the consistency of the
processor caches and translation lookaside buffers (TLB). It requires the
OS to invalidate all affected TLB entries (including global entries) and
all entries in all paging-structure caches. It may also require flushing
of the processor caches in certain situations.

...

Example of a sequence to invalidate the processor TLBs and caches (if
necessary):

1. If the PCIDE or PGE flag is set in CR4, flush TLBs by clearing one of
those flags (then restore the flag via a subsequent CR4 write).

Otherwise, flush TLBs by executing a MOV from control register CR3 to
another register and then a MOV from that register back to CR3.

2. In the case that there are changes to memory-type mappings for which
cache self-snooping behavior would be problematic given the existing
mappings (e.g., changing a cache line's memory type from WB to UC to be
used for memory-mapped I/O), then cache flushing is also required. This
can be done by executing CLFLUSH operations for all affected cache lines
or by executing the WBINVD instruction (recommended only if there are a
large number of affected mappings or if it is unknown which mappings are
affected)

The second step is relevant for set_memory code that already does the
flushing on changing memory type.

> > The AMD documentation does not link PAT programming to MTRR.
> >
> > The kernel only needs to flush the TLB after updating the PAT MSR. The
> > set_memory code already takes care of flushing the TLB and cache when
> > changing the memory type of a page.
>
> So far so good. However, what guarantees that this relaxing of the
> protocol doesn't break any existing machines?

Our HW folks confirmed that the new optimized sequence works on all past
processors that support PAT.

> If anything, this patch needs to be tested on everything possible. I can
> do that on AMD hw and some old Intels, just to be sure.

Testing wouldn't hurt, but it cannot possibly prove that the new flow is
safe by testing.

> > @@ -296,13 +298,8 @@ void __init pat_bp_init(void)
> > /*
> > * Xen PV doesn't allow to set PAT MSR, but all cache modes are
> > * supported.
> > - * When running as TDX guest setting the PAT MSR won't work either
> > - * due to the requirement to set CR0.CD when doing so. Rely on
> > - * firmware to have set the PAT MSR correctly.
> > */
> > - if (pat_disabled ||
> > - cpu_feature_enabled(X86_FEATURE_XENPV) ||
> > - cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> > + if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
> > init_cache_modes(pat_msr_val);
> > return;
>
> What does that mean, now, practically?
>
> That TDX guests virtualize the PAT MSR just like with any other guest or
> what is going on there?
>
> This should be explicitly stated in the commit message.

PAT MST was always virtualized, but we was not able to program it as we
followed MTRR protocol that sets CR0.CD. And I covered it in the commit
message:

Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
a #VE exception.

--
Kiryl Shutsemau / Kirill A. Shutemov