Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

From: Andy Lutomirski
Date: Mon Dec 18 2017 - 17:12:20 EST


On Mon, Dec 18, 2017 at 12:34 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> On 12/18/2017 03:42 AM, Thomas Gleixner wrote:
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -1120,6 +1120,11 @@ static inline void pmdp_set_wrprotect(st
>>  static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
>>  {
>>  	memcpy(dst, src, count * sizeof(pgd_t));
>> +#ifdef CONFIG_PAGE_TABLE_ISOLATION
>> +	/* Clone the user space pgd as well */
>> +	memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
>> +	       count * sizeof(pgd_t));
>> +#endif
>>  }
>
> I was just thinking about this as I re-write the documentation about
> where the overhead of pti comes from.
>
> This obviously *works* for now. But, we certainly have the pti-mapped
> stuff spread much less through the address space than when this was
> thrown in here. It *seems* like we could probably do this with just 4 PGDs:
>
>> pti_clone_user_shared();
>> pti_clone_entry_text();
>> pti_setup_espfix64();
>> pti_setup_vsyscall();
>
> The vsyscall is just one page and the espfix is *sized* to be one PGD,
> so we know each of those only takes one entry.
>
> We surely don't have 512GB of entry_text, and I don't think KASLR can
> ever cause it to span two PGD entries.

This would definitely work and, long-term, I think we should get rid
of the entry text mapping entirely. The tricky bit is that we need to
rearrange the whole memory map fairly radically for this.

We could make it more compact, too: the vsyscall page and the
cpu_entry_area stuff can share a PGD. The LDT could go in there, too.
The only requirement the LDT PGD has is that all of the next-level
entries that will ever be allocated are allocated at boot time.
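
For anyone who wants to check the "one PGD entry each" arithmetic, here is a rough user-space sketch, not kernel code. It assumes 4-level paging (PGDIR_SHIFT of 39, 512 PGD entries, so 512GB per entry). The three addresses are what I believe the x86-64 memory map uses for the vsyscall page, the kernel text mapping and the espfix area; treat them as assumptions. The helper names pgd_entries_spanned() and pgd_index_selftest() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4-level paging layout: each PGD entry maps 2^39 bytes (512GB). */
#define PGDIR_SHIFT	39
#define PTRS_PER_PGD	512

/* Index of the top-level (PGD) entry that maps a virtual address. */
unsigned int pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Hypothetical helper: number of PGD entries touched by the range
 * [start, start + size). Requires size > 0 and no address wraparound.
 */
unsigned int pgd_entries_spanned(uint64_t start, uint64_t size)
{
	uint64_t last = start + size - 1;

	return (unsigned int)((last >> PGDIR_SHIFT) - (start >> PGDIR_SHIFT)) + 1;
}

/* Hypothetical self-check using the assumed addresses. */
void pgd_index_selftest(void)
{
	const uint64_t vsyscall = 0xffffffffff600000ULL;	/* assumed vsyscall page */
	const uint64_t ktext    = 0xffffffff80000000ULL;	/* assumed kernel text base */
	const uint64_t espfix   = 0xffffff0000000000ULL;	/* assumed espfix base */

	/* The vsyscall page and the kernel text land in the same PGD slot. */
	assert(pgd_index(vsyscall) == pgd_index(ktext));

	/* One 4k page can only ever touch a single entry. */
	assert(pgd_entries_spanned(vsyscall, 4096) == 1);

	/* A PGD-aligned region sized to exactly one PGD uses one entry. */
	assert(pgd_entries_spanned(espfix, 1ULL << PGDIR_SHIFT) == 1);

	/* Everything from the text base to the top of the address space: one entry. */
	assert(pgd_entries_spanned(ktext, 1ULL << 31) == 1);
}
```

If those addresses are right, the vsyscall page and the entry text already sit under the same top-level entry, and nothing placed in the top 2GB can straddle a PGD boundary.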