Re: [PATCH] x86/mm: Remove "INVPCID single" feature tracking

From: andrew . cooper3
Date: Fri Jul 14 2023 - 16:27:43 EST


On 14/07/2023 7:35 pm, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> tl;dr: Replace a synthetic X86_FEATURE with a hardware X86_FEATURE
> and check of existing per-cpu state.
>
> == Background ==
>
> There are three features in play here:
> 1. Good old Page Table Isolation (PTI)
> 2. Process Context IDentifiers (PCIDs) which allow entries from
> multiple address spaces to be in the TLB at once.
> 3. Support for the "Invalidate PCID" (INVPCID) instruction,
> specifically the "individual address" mode (aka. mode 0).
>
> When all *three* of these are in place, INVPCID can and should be used
> to flush out individual addresses in the PTI user address space.
>
> But there's a wrinkle or two: First, this INVPCID mode is dependent on
> CR4.PCIDE. Even if X86_FEATURE_INVPCID==1, the instruction may #GP
> without setting up CR4.

Can the SDM authors go and reconsider their position on (not) including
this condition in the exception list?

Or give up and just point intel.com/sdm at AMD, because AMD do describe
this coherently.
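For reference, here is the rule spelled out. With CR4.PCIDE=0 only PCID 0 exists, so INVPCID types 0 (individual address) and 1 (single context) #GP for any nonzero PCID, while types 2 and 3 only ever act on PCID 0. What follows is a rough user-space model, not kernel or SDM code. The names are hypothetical, and it covers only this PCIDE-related case, not canonical-address or reserved-bit faults:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four INVPCID types. */
enum invpcid_type {
	INVPCID_ADDR = 0,          /* individual address (mode 0) */
	INVPCID_SINGLE_CTXT = 1,   /* single context */
	INVPCID_ALL_GLOBAL = 2,    /* all contexts, including globals */
	INVPCID_ALL_NONGLOBAL = 3, /* all contexts, non-global only */
};

/*
 * Model only: does this INVPCID #GP purely because CR4.PCIDE is clear?
 * Reserved-bit and non-canonical-address faults are out of scope.
 */
static bool invpcid_would_gp(enum invpcid_type type, bool cr4_pcide,
			     unsigned int pcid)
{
	assert(pcid <= 0xfff);	/* model covers valid 12-bit PCIDs only */

	if (cr4_pcide)
		return false;	/* PCIDs enabled: no PCIDE-related #GP */

	/* Types 2 and 3 don't consult the descriptor's PCID. */
	if (type == INVPCID_ALL_GLOBAL || type == INVPCID_ALL_NONGLOBAL)
		return false;

	/* Types 0 and 1: only PCID 0 is legal while CR4.PCIDE=0. */
	return pcid != 0;
}
```

This is the case "invpcid_flush_one(pcid>0)" hits before CR4.PCIDE is set up: invpcid_would_gp(INVPCID_ADDR, false, pcid) with pcid != 0.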

> diff -puN arch/x86/mm/tlb.c~remove-invpcid-single arch/x86/mm/tlb.c
> --- a/arch/x86/mm/tlb.c~remove-invpcid-single 2023-07-14 08:29:08.665225945 -0700
> +++ b/arch/x86/mm/tlb.c 2023-07-14 08:29:08.673225955 -0700
> @@ -1141,20 +1141,24 @@ void flush_tlb_one_kernel(unsigned long
> STATIC_NOPV void native_flush_tlb_one_user(unsigned long addr)
> {
> u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
> + bool cpu_pcide = this_cpu_read(cpu_tlbstate.cr4) & X86_CR4_PCIDE;
>
> + /* Flush 'addr' from the kernel PCID: */
> asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
>
> + /* If PTI is off there is no user PCID and nothing to flush. */
> if (!static_cpu_has(X86_FEATURE_PTI))
> return;

As a minor observation, the common case is for the function to exit
here, but both this_cpu_read()s sit ahead of a full compiler memory
barrier (the "memory" clobber on the invlpg asm), so the compiler
can't defer them past it.

If you move them here, you'll drop the reads from the common case.  But...

>
> /*
> - * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1.
> - * Just use invalidate_user_asid() in case we are called early.
> + * invpcid_flush_one(pcid>0) will #GP if CR4.PCIDE==0. Check
> + * 'cpu_pcide' to ensure that *this* CPU will not trigger those
> + * #GP's even if called before CR4.PCIDE has been initialized.
> */
> - if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
> - invalidate_user_asid(loaded_mm_asid);
> - else
> + if (boot_cpu_has(X86_FEATURE_INVPCID) && cpu_pcide)

... why can't this just be && loaded_mm_asid ?

There's no plausible way the ASID can be nonzero here without CR4.PCIDE
being set, and checking the ASID avoids reading the cpu_tlbstate.cr4
shadow at all.
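Putting both suggestions together, here is a rough user-space model of the resulting logic. The per-cpu read is moved below the PTI early return, and the INVPCID path is keyed on loaded_mm_asid instead of CR4. All names are hypothetical stand-ins. Plain globals replace the per-cpu and CPU-feature state, and instead of flushing, the function reports which path it took. The two tail paths correspond to invpcid_flush_one() and invalidate_user_asid(). The assert encodes the premise above (nonzero ASID implies CR4.PCIDE=1); it is not a check the kernel makes.

```c
#include <assert.h>
#include <stdbool.h>

enum flush_path {
	FLUSH_KERNEL_ONLY,          /* PTI off: the invlpg alone sufficed */
	FLUSH_INVPCID,              /* would call invpcid_flush_one() */
	FLUSH_INVALIDATE_USER_ASID, /* would call invalidate_user_asid() */
};

/* Hypothetical stand-ins for CPU features and per-cpu TLB state. */
static bool model_pti, model_invpcid, model_cr4_pcide;
static unsigned int model_loaded_mm_asid;
static int model_percpu_reads;	/* counts reads of per-cpu state */

static unsigned int model_read_loaded_mm_asid(void)
{
	model_percpu_reads++;
	return model_loaded_mm_asid;
}

static enum flush_path model_flush_tlb_one_user(unsigned long addr)
{
	unsigned int asid;

	/* The invlpg of 'addr' in the kernel PCID would happen here. */
	(void)addr;

	/* Common case: PTI off, so return with no per-cpu reads at all. */
	if (!model_pti)
		return FLUSH_KERNEL_ONLY;

	asid = model_read_loaded_mm_asid();

	/* Premise from the review: a nonzero ASID implies CR4.PCIDE=1. */
	assert(asid == 0 || model_cr4_pcide);

	if (model_invpcid && asid)
		return FLUSH_INVPCID;
	return FLUSH_INVALIDATE_USER_ASID;
}
```

In this model, ASID 0 always takes the invalidate_user_asid() path, even when INVPCID is available.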

~Andrew