Re: [PATCH v2] mm/hugetlb: fix a addressing exception caused by huge_pte_offset()

From: Mike Kravetz
Date: Tue Mar 24 2020 - 12:20:15 EST


On 3/24/20 8:55 AM, Jason Gunthorpe wrote:
> On Tue, Mar 24, 2020 at 08:25:09AM -0700, Mike Kravetz wrote:
>> On 3/24/20 4:55 AM, Jason Gunthorpe wrote:
>>> Also, since CH moved all the get_user_pages_fast code out of the
>>> arch's many/all archs can drop their arch specific version of this
>>> routine. This is really just a specialized version of gup_fast's
>>> algorithm..
>>>
>>> (also the arch versions seem different, why do some return actual
>>> ptes, not null?)
>>
>> Not sure I understand that last question. The return value should be
>> a *pte or null.
>
> I mean the common code ends like this:
>
> 	pmd = pmd_offset(pud, addr);
> 	if (sz != PMD_SIZE && pmd_none(*pmd))
> 		return NULL;
> 	/* hugepage or swap? */
> 	if (pmd_huge(*pmd) || !pmd_present(*pmd))
> 		return (pte_t *)pmd;
>
> 	return NULL;
>
> So it always returns a pointer into a PUD or PMD, while say, ppc
> in __find_linux_pte() ends like:
>
> 	return pte_offset_kernel(&pmd, ea);
>
> Which is pointing to a PTE

Ok, now I understand the question. huge_pte_offset() will/should only be
called for addresses that are in a vma backed by hugetlb pages. So,
pte_offset_kernel() will only return a pointer to an entry of the page
table type (PUD/PMD/etc.) associated with a huge page size supported by
the particular arch.

> So does sparc:
>
> 	pmd = pmd_offset(pud, addr);
> 	if (pmd_none(*pmd))
> 		return NULL;
> 	if (is_hugetlb_pmd(*pmd))
> 		return (pte_t *)pmd;
> 	return pte_offset_map(pmd, addr);
>
> Which is even worse because it is leaking a kmap..
>
> etc
>
>> /*
>>  * huge_pte_offset() - Walk the page table to resolve the hugepage
>>  * entry at address @addr
>>  *
>>  * Return: Pointer to page table or swap entry (PUD or PMD) for
> ^^^^^^^^^^^^^^^^^^^
>
> Ie the above is not followed by the archs
>
> I'm also scratching my head that a function that returns a pte_t *
> always returns a PUD or PMD. Strange bit of type casting..

Yes, the casting is curious. The casting continues in potential subsequent
calls to huge_pte_alloc().
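
To make the cast concrete, here is a minimal userspace sketch of the PMD-level
tail of the common-code walk quoted above. It is not kernel code: the toy_*
helpers, the TOY_* flag bits, and the sz_is_pmd_size parameter are all
hypothetical stand-ins, and pte_t/pmd_t are simplified to bare 64-bit wrappers.
It only illustrates that the returned pte_t * is the address of the PMD slot
itself.

```c
/* Userspace toy model only: simplified stand-ins, not kernel definitions. */
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;

#define TOY_PRESENT 0x1u	/* hypothetical "present" bit */
#define TOY_HUGE    0x2u	/* hypothetical "maps a huge page" bit */

static int toy_pmd_none(pmd_t pmd)    { return pmd.val == 0; }
static int toy_pmd_present(pmd_t pmd) { return (pmd.val & TOY_PRESENT) != 0; }
static int toy_pmd_huge(pmd_t pmd)    { return (pmd.val & TOY_HUGE) != 0; }

/*
 * Mirrors the quoted tail of the walk, PMD level only.
 * sz_is_pmd_size stands in for the (sz == PMD_SIZE) test.
 */
static pte_t *toy_huge_pte_offset(pmd_t *pmd, int sz_is_pmd_size)
{
	if (!sz_is_pmd_size && toy_pmd_none(*pmd))
		return NULL;
	/* hugepage or swap? */
	if (toy_pmd_huge(*pmd) || !toy_pmd_present(*pmd))
		return (pte_t *)pmd;	/* same address, different type */
	return NULL;
}
```

In this model a present, non-huge PMD (one that would point at a PTE table)
yields NULL, while a huge or non-present PMD yields the PMD's own address
typed as pte_t *. That matches the "PUD or PMD" wording in the comment.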
--
Mike Kravetz