Re: [PATCH v5 13/16] x86: decouple PAT and MTRR handling

From: Juergen Gross
Date: Fri Dec 02 2022 - 00:56:56 EST


On 02.12.22 00:57, Kirill A. Shutemov wrote:
On Thu, Dec 01, 2022 at 05:33:28PM +0100, Juergen Gross wrote:
On 01.12.22 17:26, Kirill A. Shutemov wrote:
On Wed, Nov 02, 2022 at 08:47:10AM +0100, Juergen Gross wrote:
Today PAT is usable only with MTRR being active, with some nasty tweaks
to make PAT usable when running as Xen PV guest, which doesn't support
MTRR.

The reason for this coupling is, that both, PAT MSR changes and MTRR
changes, require a similar sequence and so full PAT support was added
using the already available MTRR handling.

Xen PV PAT handling can work without MTRR, as it just needs to consume
the PAT MSR setting done by the hypervisor without the ability and need
to change it. This in turn has resulted in a convoluted initialization
sequence and wrong decisions regarding cache mode availability due to
misguiding PAT availability flags.

Fix all of that by allowing to use PAT without MTRR and by reworking
the current PAT initialization sequence to match better with the newly
introduced generic cache initialization.

This removes the need of the recently added pat_force_disabled flag, so
remove the remnants of the patch adding it.

Signed-off-by: Juergen Gross <jgross@xxxxxxxx>

This patch breaks boot for TDX guest.

Kernel now tries to set CR0.CD which is forbidden in TDX guest[1] and
causes #VE:

tdx: Unexpected #VE: 28
VE fault: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-gadfe7512e1d0 #2646
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:native_write_cr0 (arch/x86/kernel/cpu/common.c:427)
Call Trace:
<TASK>
? cache_disable (arch/x86/include/asm/cpufeature.h:173 arch/x86/kernel/cpu/cacheinfo.c:1085)
? cache_cpu_init (arch/x86/kernel/cpu/cacheinfo.c:1132 (discriminator 3))
? setup_arch (arch/x86/kernel/setup.c:1079)
? start_kernel (init/main.c:279 (discriminator 3) init/main.c:477 (discriminator 3) init/main.c:960 (discriminator 3))
? load_ucode_bsp (arch/x86/kernel/cpu/microcode/core.c:155)
? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
</TASK>

Any suggestion how to fix it?

[1] Section 10.6.1. "CR0", https://cdrdv2.intel.com/v1/dl/getContent/733568

What was the solution before?

I guess MTRR was disabled, so there was no PAT either?

Right:

Linus' tree:

[ 0.002589] last_pfn = 0x480000 max_arch_pfn = 0x10000000000
[ 0.003976] Disabled
[ 0.004452] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.005856] CPU MTRRs all blank - virtualized system.
[ 0.006915] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC

tip/master:

[ 0.003443] last_pfn = 0x20b8e max_arch_pfn = 0x10000000000
[ 0.005220] Disabled
[ 0.005818] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.007752] tdx: Unexpected #VE: 28

The dangling "Disabled" comes from mtrr_bp_init().


If this is the case, you can go the same route as Xen PV guests do.

Any reason X86_FEATURE_HYPERVISOR cannot be used instead of
X86_FEATURE_XENPV there?

Do we have any virtualized platform that supports it?

Yes, of course. Any hardware virtualized guest should be able to use it;
TDX guests are apparently the first ones not being able to do so.

And the dmesg snippets above show rather nicely that not disabling
PAT completely should be a benefit for TDX guests, as all caching modes
would be usable (the PAT MSR seems to be initialized quite fine).
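
To illustrate how the "x86/PAT: Configuration [0-7]" lines relate to the
PAT MSR contents, here is a minimal user-space sketch that decodes a
64-bit PAT value into the eight memory types. The type encodings are the
architectural ones; the helper names pat_type_name() and pat_decode()
are made up for this example and are not kernel functions. The MSR
values in the usage note are an assumption: they are simply the values
that would decode to the two lines quoted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural memory type encodings of one PAT entry; 2 and 3 are reserved. */
static const char *pat_type_name(unsigned int type)
{
	switch (type) {
	case 0: return "UC";
	case 1: return "WC";
	case 4: return "WT";
	case 5: return "WP";
	case 6: return "WB";
	case 7: return "UC-";
	default: return "??";
	}
}

/*
 * Hypothetical helper: format the eight PAT entries into buf, entry 0
 * first, separated by single spaces. Each entry occupies the low three
 * bits of one byte of the MSR value.
 */
static void pat_decode(uint64_t pat, char *buf, size_t len)
{
	size_t off = 0;
	int i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < 8 && off < len; i++) {
		unsigned int type = (unsigned int)(pat >> (i * 8)) & 7;
		int n = snprintf(buf + off, len - off, "%s%s",
				 i ? " " : "", pat_type_name(type));

		if (n < 0)
			break;
		off += (size_t)n;
	}
}
```

With a 64-byte buffer, pat_decode(0x0407050600070106ULL, ...) yields
"WB WC UC- UC WB WP UC- WT" (the tip/master line), and
pat_decode(0x0007040600070406ULL, ...) yields "WB WT UC- UC WB WT UC- UC"
(the line from Linus' tree).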

Instead of X86_FEATURE_XENPV we could introduce something like
X86_FEATURE_PAT_READONLY, which could be set for Xen PV guests and for
TDX guests.


Juergen
