Re: [RFC PATCH v2 12/47] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step

From: James Houghton
Date: Wed Jan 04 2023 - 20:23:39 EST


On Thu, Jan 5, 2023 at 12:58 AM Jane Chu <jane.chu@xxxxxxxxxx> wrote:
>
> > + * @stop_at_none determines what we do when we encounter an empty PTE. If true,
> > + * we return that PTE. If false and @sz is less than the current PTE's size,
> > + * we make that PTE point to the next level down, going until @sz is the same
> > + * as our current PTE.
> [..]
> > +int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
> > + struct hugetlb_pte *hpte, unsigned long addr,
> > + unsigned long sz, bool stop_at_none)
> > +{
> [..]
> > + while (hugetlb_pte_size(hpte) > sz && !ret) {
> > + pte = huge_ptep_get(hpte->ptep);
> > + if (!pte_present(pte)) {
> > + if (stop_at_none)
> > + return 0;
> > + if (unlikely(!huge_pte_none(pte)))
> > + return -EEXIST;
>
> If 'stop_at_none' means settling down on the just encountered empty PTE,
> should the above two "if" clauses switch order? I thought Peter had
> raised this question too, but I'm not seeing a response.

A better name for "stop_at_none" would be "dont_allocate"; it will be
renamed in the next version. The idea is that, with "stop_at_none" set,
the function simply does a walk and the caller deals with whatever it
finds. If the walk can't continue for any reason, we just return 0. So
in this case, if we land on a non-present, non-none PTE, we can't
continue the walk, and we return 0.
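To make that control flow concrete, here is a minimal user-space sketch
of the walk loop. It is a model, not the kernel code: model_hgm_walk(),
struct model_level and the MODEL_PTE_* states are hypothetical names,
each page-table level is reduced to a single entry, "allocating" just
marks an entry as a table, and ending the walk at a present leaf is an
assumption of the model. It shows that with stop_at_none set, a
non-present PTE of either kind ends the walk with 0, while an
allocating walk fails only on a non-present, non-none PTE.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry states; not kernel types. */
enum model_pte_kind {
	MODEL_PTE_NONE,       /* empty entry */
	MODEL_PTE_NONPRESENT, /* non-present but not none, e.g. a migration entry */
	MODEL_PTE_TABLE,      /* present, points to the next level down */
	MODEL_PTE_LEAF,       /* present, maps memory at this level */
};

/* One entry per level, ordered from the largest size to the smallest. */
struct model_level {
	unsigned long size; /* bytes mapped by an entry at this level */
	enum model_pte_kind kind;
};

/*
 * Walk from levels[0] toward an entry of size sz. On success, *level
 * holds the index of the entry the walk stopped at.
 */
static int model_hgm_walk(struct model_level *levels, size_t n,
			  size_t *level, unsigned long sz, bool stop_at_none)
{
	size_t i = 0;

	while (i + 1 < n && levels[i].size > sz) {
		enum model_pte_kind kind = levels[i].kind;

		if (kind == MODEL_PTE_NONE || kind == MODEL_PTE_NONPRESENT) {
			/* Checked first, so a non-allocating walk never fails here. */
			if (stop_at_none)
				break;
			/* Only an allocating walk can hit this error. */
			if (kind == MODEL_PTE_NONPRESENT)
				return -EEXIST;
			/* "Allocate" the next level by turning this entry into a table. */
			levels[i].kind = MODEL_PTE_TABLE;
		} else if (kind == MODEL_PTE_LEAF) {
			/* Model assumption: a present leaf ends the walk. */
			break;
		}
		i++;
	}
	*level = i;
	return 0;
}
```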

Another way to justify this order: we want to guarantee that calls to
this function with stop_at_none=1 and sz=PAGE_SIZE never fail, and that
requirement forces the order you see. With the two checks swapped, such
a walk would return -EEXIST whenever it landed on a non-present,
non-none PTE (a migration entry, for example). (The requirement is
documented in the comment above the definition of hugetlb_hgm_walk();
the guarantee makes it easier to write code that uses HGM walks.)
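The ordering argument can also be checked mechanically. In this sketch
(hypothetical names throughout), only the two quoted "if" clauses are
modelled, for a PTE already known to be non-present, and the value 1
stands in for "keep walking / allocate". It tries both orderings
against each kind of non-present entry and reports whether a
stop_at_none call can ever fail.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of a non-present entry the walk landed on. */
enum entry_state { ENTRY_NONE, ENTRY_NONPRESENT_NOT_NONE };

/* Order used in the patch: the stop_at_none check comes first. */
static int check_patch_order(enum entry_state s, bool stop_at_none)
{
	if (stop_at_none)
		return 0;
	if (s == ENTRY_NONPRESENT_NOT_NONE)
		return -EEXIST;
	return 1; /* stand-in for "keep walking / allocate" */
}

/* The swapped order raised in review. */
static int check_swapped_order(enum entry_state s, bool stop_at_none)
{
	if (s == ENTRY_NONPRESENT_NOT_NONE)
		return -EEXIST;
	if (stop_at_none)
		return 0;
	return 1;
}

/* True if no call with stop_at_none set can fail under this ordering. */
static bool never_fails_when_not_allocating(int (*check)(enum entry_state, bool))
{
	enum entry_state states[] = { ENTRY_NONE, ENTRY_NONPRESENT_NOT_NONE };

	for (unsigned i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		if (check(states[i], true) < 0)
			return false;
	return true;
}
```

The two orderings agree whenever stop_at_none is false, so swapping
them would only change the behaviour of non-allocating walks.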

> Also here below, the way 'stop_at_none' is used when HGM isn't enabled
> is puzzling. Could you elaborate please?
>
> > + if (!hugetlb_hgm_enabled(vma)) {
> > + if (stop_at_none)
> > + return 0;
> > + return sz == huge_page_size(hstate_vma(vma)) ? 0 : -EINVAL;
> > + }

This is for the same reason: if "stop_at_none" is true, we need to
guarantee that this function won't fail. If "stop_at_none" is false
and sz != huge_page_size(), then the caller is attempting to use HGM
without having enabled it, hence -EINVAL.
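A minimal model of that early return, assuming hpage_size stands in
for huge_page_size(hstate_vma(vma)); model_non_hgm_walk() is a
hypothetical name, not a kernel function.

```c
#include <errno.h>
#include <stdbool.h>

/* Early-return logic when HGM is not enabled for the VMA (model only). */
static int model_non_hgm_walk(unsigned long sz, unsigned long hpage_size,
			      bool stop_at_none)
{
	/* A non-allocating walk must never fail, whatever sz is. */
	if (stop_at_none)
		return 0;
	/* Any other size means the caller wants HGM without enabling it. */
	return sz == hpage_size ? 0 : -EINVAL;
}
```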

Both of these bits will be cleaned up in the next version of this series. :)

Thanks!

- James