Re: x86: pgtable / kaslr initialisation (OOB) help

From: Dave Hansen
Date: Wed Jun 14 2023 - 12:02:06 EST


On 6/14/23 08:26, Lee Jones wrote:
> On Wed, 14 Jun 2023, Lee Jones wrote:
>
>> On Wed, 14 Jun 2023, Lee Jones wrote:
>>
>>> Thanks for chiming in Dave. I hoped you would.
>>>
>>> On Wed, 14 Jun 2023, Dave Hansen wrote:
>>>
>>>> On 6/14/23 07:37, Lee Jones wrote:
>>>>> Still unsure how we (the kernel) can/should write to an area of memory
>>>>> that does not belong to it. Should we allocate enough memory
>>>>> (2*PAGE_SIZE? rather than 8-Bytes) for trampoline_pgd_entry to consume
>>>>> in a more sane way?
>>>>
>>>> No.
>>>>
>>>> I think this:
>>>>
>>>> 		set_pgd(&trampoline_pgd_entry,
>>>> 			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
>>>>
>>>> is bogus-ish. set_pgd() wants to operate on a pgd_t inside a pgd
>>>> *PAGE*. But it's just being pointed at a single _entry_. The address
>>>> of 'trampoline_pgd_entry' in your case also just (unfortunately)
>>>> happens to pass the:
>>>>
>>>> __pti_set_user_pgtbl -> pgdp_maps_userspace()
>>>>
>>>> test. I _think_ we want these to just be something like:
>>>>
>>>> 		trampoline_pgd_entry = __pgd(_KERNPG_TABLE |
>>>> 					     __pa(p4d_page_tramp));
>>>>
>>>> That'll keep us away from all of the set_pgd()-induced nastiness.
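The set_pgd() hazard described above can be modeled outside the kernel. This is a sketch, not kernel code: the model_* names are hypothetical, it assumes 4 KiB pages and PTI enabled, and it only mirrors the arithmetic of pgdp_maps_userspace() and kernel_to_user_pgdp(), where the kernel PGD is an 8 KiB (order-1) allocation whose second page holds the user copy. Given a pointer to a pgd_t, it returns the address the extra PTI store would hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((uintptr_t)1 << MODEL_PAGE_SHIFT)

/*
 * Mirrors pgdp_maps_userspace(): entries in the lower half of a PGD page
 * (indices 0..255) cover the user part of the address space.
 */
static bool model_pgdp_maps_userspace(uintptr_t pgdp)
{
	return (pgdp & (MODEL_PAGE_SIZE - 1)) < MODEL_PAGE_SIZE / 2;
}

/*
 * Mirrors kernel_to_user_pgdp(): the user copy of an order-1 PGD sits in
 * the second page, reached by setting bit PAGE_SHIFT in the pointer.
 */
static uintptr_t model_kernel_to_user_pgdp(uintptr_t pgdp)
{
	return pgdp | MODEL_PAGE_SIZE;
}

/*
 * Address of the extra PTI store that set_pgd() would perform for this
 * pgd_t pointer, or 0 when no second store happens.
 */
static uintptr_t model_set_pgd_shadow_target(uintptr_t pgdp)
{
	assert((pgdp & 7) == 0); /* pgd_t is 8 bytes and naturally aligned */
	if (!model_pgdp_maps_userspace(pgdp))
		return 0;
	return model_kernel_to_user_pgdp(pgdp);
}
```

With a hypothetical 8 KiB-aligned PGD at 0x10000, entry 0 shadows to 0x11000, inside the same allocation, while entry 273 (0x10888, kernel half) triggers no second store. A stand-alone 8-byte pgd_t at a hypothetical 0x20100 also passes the check and shadows to 0x21100, 4 KiB past an object that only owns 8 bytes, which is the out-of-bounds write.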
>>>
>>> Okay. Is this what you're suggesting?
>>>
>>> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
>>> index d336bb0cb38b..803595c7dcc8 100644
>>> --- a/arch/x86/mm/kaslr.c
>>> +++ b/arch/x86/mm/kaslr.c
>>> @@ -176,7 +176,7 @@ void __meminit init_trampoline_kaslr(void)
>>>  		set_pgd(&trampoline_pgd_entry,
>>>  			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
>>>  	} else {
>>> -		set_pgd(&trampoline_pgd_entry,
>>> -			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
>>> +		trampoline_pgd_entry =
>>> +			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
>>
>> Note the change of *.page_tramp here.
>>
>> s/pud/p4d/
>>
>> I'm assuming that too was intentional?
>
> Never mind. I can see that p4d_page_tramp is local to the if() segment.
>
> While we're at it, does the if() segment look correct to you:
>
> 	if (pgtable_l5_enabled()) {
> 		p4d_page_tramp = alloc_low_page();
>
> 		p4d_tramp = p4d_page_tramp + p4d_index(paddr);
>
> 		set_p4d(p4d_tramp,
> 			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
>
> 		set_pgd(&trampoline_pgd_entry,
> 			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
> 	} else {
> 		trampoline_pgd_entry =
> 			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
> 	}
>
> - pud_page_tramp is being passed to set_p4d()
> - p4d_page_tramp is being passed to set_pgd()
>
> Should those be the other way around, or am I missing the point?

You're missing the point. :)

PGDs are always set up to point to the physical address of the table one
level below them. With 5-level paging that level is the p4d, so a page is
allocated for it. Without 5-level paging the p4d level is folded into the
PGD, so no p4d page is needed.
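The folding can be seen in the index arithmetic. A minimal sketch, assuming the usual x86-64 layout (9 index bits per level, PGDIR_SHIFT of 48 with 5-level paging and 39 with 4-level paging, P4D_SHIFT of 39); the model_* helpers are hypothetical stand-ins for pgd_index() and p4d_index(). With 4-level paging the p4d level has a single entry, so its index is always 0 and the PGD entry has to reference the pud page directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_P4D_SHIFT      39
#define MODEL_PTRS_PER_TABLE 512 /* 9 index bits per level */

/* PGDIR_SHIFT is 48 with 5-level paging, 39 when the p4d level is folded. */
static unsigned int model_pgdir_shift(bool l5)
{
	return l5 ? 48 : 39;
}

/* Stand-in for pgd_index(): which PGD slot covers virtual address va. */
static unsigned int model_pgd_index(uint64_t va, bool l5)
{
	return (unsigned int)((va >> model_pgdir_shift(l5)) &
			      (MODEL_PTRS_PER_TABLE - 1));
}

/* Stand-in for p4d_index(): a folded p4d level has one entry, index 0. */
static unsigned int model_p4d_index(uint64_t va, bool l5)
{
	if (!l5)
		return 0;
	return (unsigned int)((va >> MODEL_P4D_SHIFT) &
			      (MODEL_PTRS_PER_TABLE - 1));
}
```

For example, 0xffff888000000000 (the non-randomized 4-level direct-map base) lands in PGD slot 273 with a p4d index of 0 under 4-level paging; the same address under the 5-level shifts would use PGD slot 511 and p4d slot 273.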

The pattern is _almost_ always

pgd = ... __pa(p4d);

In other words, point the PGD at the physical address of a p4d. But
things get funky on systems without a real p4d level: there the PGD entry
has to point straight at the pud page, thus the special casing here.
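The corrected logic can be restated as a small userspace model showing which page each entry references. This is a sketch under assumptions, not the kernel's code: model_pgd_t, model_p4d_t and model_trampoline_pgd_entry() are hypothetical, MODEL_KERNPG_TABLE assumes _KERNPG_TABLE is present|rw|accessed|dirty (0x63) with no memory-encryption bit, and the physical addresses used with it are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: _KERNPG_TABLE = present|rw|accessed|dirty, no encryption bit. */
#define MODEL_KERNPG_TABLE     0x63ULL
#define MODEL_PAGE_OFFSET_MASK 0xfffULL /* low 12 bits, 4 KiB pages */

/* Hypothetical stand-ins for the kernel's pgd_t and p4d_t wrappers. */
typedef struct { uint64_t pgd; } model_pgd_t;
typedef struct { uint64_t p4d; } model_p4d_t;

/* Entry that points at the page-aligned table at physical address pa. */
static uint64_t model_table_entry(uint64_t pa)
{
	assert((pa & MODEL_PAGE_OFFSET_MASK) == 0);
	return MODEL_KERNPG_TABLE | pa;
}

/*
 * Mirrors the two branches of init_trampoline_kaslr() after the patch.
 * Assumes paddr == 0, so the p4d slot used is index 0. p4d_page is only
 * touched when l5 is true; *_pa are the tables' physical addresses.
 */
static model_pgd_t model_trampoline_pgd_entry(bool l5, model_p4d_t *p4d_page,
					      uint64_t p4d_page_pa,
					      uint64_t pud_page_pa)
{
	model_pgd_t entry;

	if (l5) {
		/* p4d -> pud page, as set_p4d() does in the real code */
		p4d_page[0].p4d = model_table_entry(pud_page_pa);
		/* PGD -> p4d page, by plain assignment (no set_pgd()) */
		entry.pgd = model_table_entry(p4d_page_pa);
	} else {
		/* p4d folded: the PGD points straight at the pud page */
		entry.pgd = model_table_entry(pud_page_pa);
	}
	return entry;
}
```

With a p4d page at a made-up physical 0x2000 and a pud page at 0x1000, the 5-level case yields a PGD entry of 0x2063 and a p4d entry of 0x1063, while the 4-level case yields a PGD entry of 0x1063. The PGD value is simply returned and assigned, so no helper ever assumes the entry lives inside a full PGD page.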

Does the (completely untested) attached patch fix your problem?

---

b/arch/x86/mm/kaslr.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff -puN arch/x86/mm/kaslr.c~trampoline_pgd_entry arch/x86/mm/kaslr.c
--- a/arch/x86/mm/kaslr.c~trampoline_pgd_entry 2023-06-14 08:54:08.685554094 -0700
+++ b/arch/x86/mm/kaslr.c 2023-06-14 08:55:36.077089793 -0700
@@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(voi
 		set_p4d(p4d_tramp,
 			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
 
-		set_pgd(&trampoline_pgd_entry,
-			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
+		trampoline_pgd_entry =
+			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
 	} else {
-		set_pgd(&trampoline_pgd_entry,
-			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+		trampoline_pgd_entry =
+			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
 	}
 }