Re: [tip:efi/core] x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings

From: Andy Lutomirski
Date: Tue Feb 23 2016 - 21:43:28 EST


On Tue, Feb 23, 2016 at 4:50 PM, Sai Praneeth Prakhya
<sai.praneeth.prakhya@xxxxxxxxx> wrote:
> On Tue, 2016-02-23 at 09:47 -0800, Andy Lutomirski wrote:
>> On Feb 23, 2016 1:09 AM, <&quot;tip-bot for Sai Praneeth
>> &lt;tipbot@xxxxxxxxx&gt;&quot;@zytor.com> wrote:
>>
>> Something's wrong with tip-bot. This should say:
>>
>>
>> commit 397630150632639b3ca5b4414accd5011c45e276
>> Author: Sai Praneeth <sai.praneeth.prakhya@xxxxxxxxx>
>> Date: Wed Feb 17 12:35:56 2016 +0000
>>
>> x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings
>>
>> Since EFI page tables can be treated as kernel page tables they should
>> be global. All the other page mapping functions in pageattr.c set the
>> _PAGE_GLOBAL bit and we want to avoid inconsistencies when we map a page
>> in the EFI code paths, for example when that page is split in
>> __split_large_page(), etc. It also makes it easier to validate that the
>> EFI region mappings have the correct attributes because there are fewer
>> differences compared with regular kernel mappings.
>>
>> But the actual patch is:
>>
>>
>> @@ -909,6 +909,20 @@ static void populate_pte(struct cpa_data *cpa,
>>
>> pte = pte_offset_kernel(pmd, start);
>>
>> + /*
>> + * Set the GLOBAL flags only if the PRESENT flag is
>> + * set otherwise pte_present will return true even on
>> + * a non present pte. The canon_pgprot will clear
>> + * _PAGE_GLOBAL for the ancient hardware that doesn't
>> + * support it.
>> + */
>> + if (pgprot_val(pgprot) & _PAGE_PRESENT)
>> + pgprot_val(pgprot) |= _PAGE_GLOBAL;
>> + else
>> + pgprot_val(pgprot) &= ~_PAGE_GLOBAL;
>> +
>> + pgprot = canon_pgprot(pgprot);
>> +
>>
>> The comment is confusing. This code is setting GLOBAL if PRESENT is
>> set even if not requested, but the comment is about setting GLOBAL
>> *only* if PRESENT is set.
>
> As you rightly said the code is about making a page GLOBAL if PRESENT is
> set and we do set PRESENT bit before mapping so that page is GLOBAL.
> This code was taken from the other parts of pageattr.c. The point is
> that we don't want differences between whether things were mapped in the
> EFI page tables directly (i.e. using populate_pte()) or later split from
> large pages via the split_large_page() code path. If this is still
> confusing could you please elaborate on it further.

At least the comment should say "Set the GLOBAL flag if and only
if...". But why is this code here in the first place? What is
passing a pgprot with global unset into this code in the first place?

>
>>
>> Can you explain:
>>
>> a) Why this wasn't already broken. (were there no callers who set
>> GLOBAL but not PRESENT? If there weren't any, why is that part
>> needed?)
>>
>
> I don't think previous implementation is broken and this is not a bug
> fix as such. Before this patch some EFI region mappings had GLOBAL bit
> set (which followed split large page path) and some aren't (which used
> populate_pte). This patch just aligns the mappings done via two
> different code paths as mentioned above. As a whole it also maintains
> consistency with kernel mappings.

But your code also *clears* GLOBAL if PRESENT is clear, and your
comment talks about that. Does this ever actually happen in practice?

I'd much rather see WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_PRESENT
| _PAGE_GLOBAL)) == _PAGE_GLOBAL) in here, if that makes sense in the
context of the callers of the function.

>
>> b) Why setting GLOBAL for EFI mappings is useful.
>
> We did this for consistency among EFI mappings. This has some advantages
> as mentioned in the commit message. It also makes it less confusing when
> starting at the PGT_DUMP traces if _PAGE_GLOBAL is used consistently.
>
> We don't actually do anything special with _PAGE_GLOBAL in EFI.

You're making a choice of whether to set _PAGE_GLOBAL, and I think
you've made the wrong choice.

Normally, the only pages with are _PAGE_GLOBAL are those that are in
the normal kernel mappings (swapper_pg_dir and normal mm_struct pgds).
By allowing _PAGE_GLOBAL to be set in EFI mappings, you're breaking
that convention, which forces you to use extra-expensive
__flush_tlb_all calls in efi_call_virt.

I think you should explicitly *clear* _PAGE_GLOBAL in the EFI mappings
instead. This would allow you to use write_cr3 by itself, which would
make the code simpler and faster.

>
>> c) Why setting GLOBAL for EFI mappings is safe. Don't we unmap the
>> EFI mappings when we're not actively using them in new kernels? If
>> so, don't we explicitly want them *not* to be GLOBAL to avoid needing
>> an extra-expensive global flush?
>
> This is a valid point. I know that EFI runtime regions persist during
> and after boot if we have a UEFI firmware and other commits made EFI
> regions have separate page table but I am not clear about the effect of
> global flush. I think Matt/Boris could comment on it.

It's straightfoward on existing kernels. If _PAGE_GLOBAL is set, TLB
entries persist across cr3 writes. If _PAGE_GLOBAL is clear, then TLB
entries are flushed by cr3 writes.

With PCID enabled (which is only in a not-quite-ready patch set I
have), the story is a bit more complicated, but it works essentially
the same way unless you explicitly opt out.

>
>> d) Why this doesn't break any non-EFI code.
>
> We touch this code path only when mapping EFI runtime regions to VA
> space, i.e. we added pgd field in cpa only as a support for mapping efi
> runtime regions.

populate_pgd is called from non-EFI code as well though, isn't it?

--Andy