Re: [PATCH v4] mm: Add PM_HUGE_THP_MAPPING to /proc/pid/pagemap

From: Mina Almasry
Date: Mon Nov 15 2021 - 19:16:51 EST


On Thu, Nov 11, 2021 at 11:41 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, Nov 10, 2021 at 09:42:25AM -0800, Mina Almasry wrote:
> > Sorry, yes I should update the commit message with this info. The
> > issues with smaps are:
> > 1. Performance: I've pinged our network service folks to obtain a
> > rough perf comparison but I haven't been able to get one. I can try to
> > get a performance measurement myself but Peter seems to be also seeing
> > this.
>
> No I was not seeing any real issues in my environment, but I remembered people
> complaining about it because smaps needs to walk the whole memory of the
> process, then if one program is only interested in some small portion of the
> whole memory, it'll be slow because smaps will still need to walk all the
> memory anyway.
>
> > 2. smaps output is human readable and a bit convoluted for userspace to parse.
>
> IMHO this is not a major issue. AFAIK lots of programs will still try to parse
> human readable output like smaps to get some solid numbers. It's just that
> it'll be indeed an perf issue if it's only a part of the memory that is of
> interest.
>
> Could we consider exporting a new smaps interface that:
>
> 1. allows to specify a range of memory, and,
> 2. expose information as "struct mem_size_stats" in binary format
> (we may want to replace "unsigned long" with "u64", then also have some
> versioning or having a "size" field for the struct, though; seems doable)
>
> I'm wondering whether this could be helpful in even more scenarios.
>

Sorry could you elaborate more? How do we allow users to specify the
range? Are you envisioning a new added syscall? Or is there some way
for users to pass the range to the existing /proc/pid/smaps that I'm
missing?

On Thu, Nov 11, 2021 at 11:43 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, Nov 10, 2021 at 09:50:13AM -0800, Mina Almasry wrote:
> > On Tue, Nov 9, 2021 at 11:03 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > >
> > > The ending "_MAPPING" seems redundant to me, how about just call it "PM_THP" or
> > > "PM_HUGE" (as THP also means HUGE already)?
> > >
> >
> > So I want to make it clear that the flag is set only when the page is
> > PMD mappend and is a THP (not hugetlbfs or some other PMD device
> > mapping). PM_THP would imply the flag is set only if the underlying
> > page is THP without regard to whether it's actually PMD mapped or not.
>
> I see, that's fine.
>
> However as I mentioned I still think HUGE and THP dup with each other.
> Meanwhile, "MAPPING" does not sound like a boolean status on whether it's thp
> mapped..
>
> If you still prefer this approach, how about PM_THP_MAPPED?

PM_THP_MAPPED sounds good to me.

TBH I think I still prefer this approach because it's a very simple 2
line patch which addresses the concrete use case I have well. I'm not
too familiar with the smaps code to be honest but I think adding a
range-based smaps API will be a sizeable patch to add a syscall,
handle a stable interface, and handle cases where the memory range
doesn't match a VMA boundary. I'm not sure the performance benefit
would justify this patch and I'm not sure the extra info from smaps
would be widely useful. However if you insist and folks believe this
is the better approach I can prototype a range-based smaps and test
its performance to see if it works for us as well, just let me know
what kind of API you're envisioning.

>
> --
> Peter Xu
>