Re: [PATCH] smaps: Fix the abnormal memory statistics obtained through /proc/pid/smaps

From: Peter Xu
Date: Thu Jul 27 2023 - 16:31:24 EST


On Thu, Jul 27, 2023 at 09:17:45PM +0200, David Hildenbrand wrote:
> On 27.07.23 20:59, Peter Xu wrote:
> > On Thu, Jul 27, 2023 at 07:27:02PM +0200, David Hildenbrand wrote:
> > > > >
> > > > > This was wrong from the very start. If we're not in GUP, we shouldn't call
> > > > > GUP functions.
> > > >
> > > > My understanding is !GET && !PIN is also called gup.. otherwise we don't
> > > > need GET and it can just be always implied.
> > >
> > > That's not the point. The point is that _arbitrary_ code shouldn't call into
> > > GUP internal helper functions, where they bypass, for example, any sanity
> > > checks.
> >
> > What's the sanity checks that you're referring to?
> >
>
> For example in follow_page()
>
> if (vma_is_secretmem(vma))
> return NULL;
>
> if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
> return NULL;
>
>
> Maybe you can elaborate why you think we should *not* be using
> vm_normal_page_pmd() and instead some arbitrary GUP internal helper? I don't
> get it.

Because the old code was written like that?

You're proposing to change it here. Again, I'm fine with the change, but
please don't ask me to justify why the original code is fine.. because I
simply don't see anything majorly wrong with either, it's the change that
needs justification, not keeping it as-is (since Kirill wrote it in 2014).

Well.. I feel like this becomes less helpful to discuss. let's try to move
on.

>
> > >
> > > >
> > > > The other proof is try_grab_page() doesn't fail hard on !GET && !PIN. So I
> > > > don't know whether that's "wrong" to be used..
> > > >
> > >
> > > To me, that is arbitrary code using a GUP internal helper and, therefore,
> > > wrong.
> > >
> > > > Back to the topic: I'd say either of the patches look good to solve the
> > > > problem. If p2pdma pages are mapped as PFNMAP/MIXEDMAP (?), I guess
> > > > vm_normal_page_pmd() proposed here will also work on it, so nothing I see
> > > > wrong on 2nd one yet.
> > > >
> > > > It looks nicer indeed to not have FOLL_FORCE here, but it also makes me
> > > > just wonder whether we should document NUMA behavior for FOLL_* somewhere,
> > > > because we have an implication right now on !FOLL_FORCE over NUMA, which is
> > > > not obvious to me..
> > >
> > > Yes, we probably should. For get_use_pages() and friends that behavior was
> > > always like that and it makes sense: usually it represent application
> > > behavior.
> > >
> > > >
> > > > And to look more over that aspect, see follow_page(): previously we can
> > > > follow a page for protnone (as it never applies FOLL_NUMA) but now it won't
> > > > (it never applies FOLL_FORCE, either, so it seems "accidentally" implies
> > > > FOLL_NUMA now). Not sure whether it's intended, though..
> > >
> > > That was certainly an oversight, thanks for spotting that. That patch was
> > > not supposed to change semantics:
> > >
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index 76d222ccc3ff..ac926e19ff72 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -851,6 +851,13 @@ struct page *follow_page(struct vm_area_struct *vma,
> > > unsigned long address,
> > > if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
> > > return NULL;
> > >
> > > + /*
> > > + * In contrast to get_user_pages() and friends, we don't want to
> > > + * fail if the PTE is PROT_NONE: see gup_can_follow_protnone().
> > > + */
> > > + if (!(foll_flags & FOLL_WRITE))
> > > + foll_flags |= FOLL_FORCE;
> > > +
> > > page = follow_page_mask(vma, address, foll_flags, &ctx);
> > > if (ctx.pgmap)
> > > put_dev_pagemap(ctx.pgmap);
> >
> > This seems to be slightly against your other solution though for smaps,
> > where we want to avoid abusing FOLL_FORCE.. isn't it..
>
> This is GUP internal, not some arbitrary code, so to me a *completely*
> different discussion.
>
> >
> > Why read only? That'll always attach FOLL_FORCE to all follow page call
> > sites indeed for now, but just curious - logically "I want to fetch the
> > page even if protnone" is orthogonal to do with write permission here to
> > me.
>
> Historical these were not the semantics, so I won't change them.
>
> FOLL_FORCE | FOLL_WRITE always had a special taste to it (COW ...).
>
> >
> > I still worry about further abuse of FOLL_FORCE, I believe you also worry
> > that so you proposed the other way for the smaps issue.
> >
> > Do you think we can just revive FOLL_NUMA? That'll be very clear to me
> > from that aspect that we do still have valid use cases for it.
>
> FOLL_NUMA naming was nowadays wrong to begin with (not to mention, confusing
> a we learned). There are other reasons why we have PROT_NONE -- mprotect(),
> for example.

It doesn't really violate with the name, IMHO - protnone can be either numa
hint or PROT_NONE for real. As long as we return NULL for a FOLL_NUMA
request we're achieving the goal we want - we guarantee a NUMA balancing to
trigger with when FOLL_NUMA provided. It doesn't need to guarantee
anything else, afaiu. The final check relies in vma_is_accessible() in the
fault paths anyway. So I don't blame the old name that much.

>
> We could have a flag that goes the other way around: FOLL_IGNORE_PROTNONE
> ... which surprisingly then ends up being exactly what FOLL_FORCE means
> without FOLL_WRITE, and what this patch does.
>
> Does that make sense to you?
>
>
> >
> > The very least is if with above we should really document FOLL_FORCE - we
> > should mention NUMA effects. But that's ... really confusing. Thinking
> > about that I personally prefer a revival of FOLL_NUMA, then smaps issue all
> > go away.
>
> smaps needs to be changed in any case IMHO. And I'm absolutely not in favor
> of revicing FOLL_NUMA.

As stated above, to me FOLL_NUMA is all fine and clear. If you think
having a flag back for protnone is worthwhile no matter as-is (FOLL_NUMA)
or with reverted meaning, then that sounds all fine to me. Maybe the old
name at least makes old developers know what's that.

I don't have a strong opinion on names though; mostly never had.

Thanks,

--
Peter Xu