Re: [PATCH] x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()

From: Borislav Petkov
Date: Tue Jun 13 2017 - 05:27:18 EST


On Mon, Jun 12, 2017 at 10:26:14AM -0700, Andy Lutomirski wrote:
> The kernel has several code paths that read CR3. Most of them assume that
> CR3 contains the PGD's physical address, whereas some of them awkwardly
> use PHYSICAL_PAGE_MASK to mask off low bits.
>
> Add explicit mask macros for CR3 and convert all of the CR3 readers.
> This will keep them from breaking when PCID is enabled.

...

> +/*
> + * CR3's layout varies depending on several things.
> + *
> + * If CR4.PCIDE is set (64-bit only), then CR3[11:0] is the address space ID.
> + * If PAE is enabled, then CR3[11:5] is part of the PDPT address
> + * (i.e. it's 32-byte aligned, not page-aligned) and CR3[4:0] is ignored.
> + * Otherwise (non-PAE, non-PCID), CR3[3] is PWT, CR3[4] is PCD, and
> + * CR3[2:0] and CR3[11:5] are ignored.
> + *
> + * In all cases, Linux puts zeros in the low ignored bits and in PWT and PCD.
> + *
> + * CR3[63] is always read as zero. If CR4.PCIDE is set, then CR3[63] may be
> + * written as 1 to prevent the write to CR3 from flushing the TLB.
> + *
> + * On systems with SME, one bit (in a variable position!) is stolen to indicate
> + * that the top-level paging structure is encrypted.
> + *
> + * All of the remaining bits indicate the physical address of the top-level
> + * paging structure.
> + *
> + * CR3_ADDR_MASK is the mask used by read_cr3_pa().
> + */
> +#ifdef CONFIG_X86_64
> +/* Mask off the address space ID bits. */
> +#define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
> +#define CR3_PCID_MASK 0xFFFull
> +#else
> +/*
> + * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
> + * a tiny bit of code size by setting all the bits.
> + */
> +#define CR3_ADDR_MASK 0xFFFFFFFFull
> +#define CR3_PCID_MASK 0ull

All those can do GENMASK_ULL for better readability.

> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 3586996fc50d..bc0a849589bb 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -391,7 +391,7 @@ struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
>
> .read_cr2 = native_read_cr2,
> .write_cr2 = native_write_cr2,
> - .read_cr3 = native_read_cr3,
> + .read_cr3 = __native_read_cr3,
> .write_cr3 = native_write_cr3,
>
> .flush_tlb_user = native_flush_tlb,
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index ffeae818aa7a..c6d6dc5f8bb2 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -92,7 +92,7 @@ void __show_regs(struct pt_regs *regs, int all)
>
> cr0 = read_cr0();
> cr2 = read_cr2();
> - cr3 = read_cr3();
> + cr3 = __read_cr3();
> cr4 = __read_cr4();

This is a good example for my confusion. So we have __read_cr4() - with
the underscores - but not read_cr4().

Now we get __read_cr3 *and* read_cr3_pa(). So __read_cr3() could just as
well lose the "__", right?

Or are the PCID series bringing a read_cr3() without "__" too?

Oh, and to confuse me even more, there's __native_read_cr3() which is
*finally* accessing %cr3 :-) But there's native_write_cr3() without
underscores.

So can we make those names a bit more balanced please?

Thanks.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.