Re: [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps

From: Jerome Marchand
Date: Fri Aug 01 2014 - 11:24:47 EST


On 08/01/2014 07:06 AM, Hugh Dickins wrote:
> On Tue, 22 Jul 2014, Jerome Marchand wrote:
>
>> Adds ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to
>> /proc/<pid>/smaps for shmem mappings.
>>
>> ShmOther: amount of memory that is currently resident in memory, not
>> present in the page table of this process but present in the page
>> table of another process.
>> ShmOrphan: amount of memory that is currently resident in memory but
>> not present in any process page table. This can happen when a process
>> unmaps a shared mapping it has accessed before or exits. Despite being
>> resident, this memory is not currently accounted to any process.
>> ShmSwapCache: amount of memory currently in swap cache.
>> ShmSwap: amount of memory that is paged out on disk.
>>
>> Signed-off-by: Jerome Marchand <jmarchan@xxxxxxxxxx>
>
> You will have to do a much better job of persuading me that these
> numbers are of any interest. Okay, maybe not me, I'm not that keen
> on /proc/<pid>/smaps at the best of times. But you will need to show
> plausible cases where having these numbers available would have made
> a real difference, and drum up support for their inclusion from
> /proc/<pid>/smaps devotees.
>
> Do you have a customer, who has underprovisioned with swap,
> and wants these numbers to work out how much more is needed?

We have a customer who needs to know how much memory a process with big
shared anonymous mappings has in swap.
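
To illustrate how such a number would be consumed, here is a minimal
Python sketch that totals one smaps field across all mappings of a
process. It is only an illustration, not part of the patch: the helper
name sum_smaps_field and the sample text are made up (the sample is
modelled on the proc.txt example in this patch), and ShmSwap only
exists with this series applied.

```python
import re

# Matches smaps value lines such as "Swap:                 36 kB".
# Mapping header lines and the VmFlags line do not match this pattern.
_FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps_field(smaps_text, field):
    """Return the total, in kB, of `field` across all mappings in smaps_text."""
    total = 0
    for line in smaps_text.splitlines():
        match = _FIELD_RE.match(line.strip())
        if match and match.group(1) == field:
            total += int(match.group(2))
    return total


# Hypothetical two-mapping excerpt; addresses and values are invented.
SAMPLE = """\
7f0000000000-7f0000100000 rw-s 00000000 00:04 1234   /dev/zero (deleted)
Rss:                 124 kB
Swap:                  0 kB
ShmSwap:              36 kB
VmFlags: rd wr sh mr mw me ms
7f0000200000-7f0000300000 rw-s 00000000 00:04 1235   /dev/zero (deleted)
Rss:                  16 kB
Swap:                  0 kB
ShmSwap:              64 kB
VmFlags: rd wr sh mr mw me ms
"""

if __name__ == "__main__":
    print("ShmSwap total (kB):", sum_smaps_field(SAMPLE, "ShmSwap"))
    print("Swap total (kB):", sum_smaps_field(SAMPLE, "Swap"))
```

On a live system the same function could be fed the contents of
/proc/<pid>/smaps. In the sample, the existing Swap field stays at zero
for both mappings, which is the gap the customer ran into.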

>
> As it is, they appear to be numbers that you found you could provide,
> and so you're adding them into /proc/<pid>/smaps, but having great
> difficulty in finding good names to describe them - which is itself
> an indicator that they're probably not the most useful statistics
> a sysadmin is wanting.

ShmSwap is obviously the stat I needed for our customer. I also have a use
for the ill-named ShmOrphan (see below). I may have added the other two
because they were low-hanging fruit, or maybe because they were
useful to me for debugging. I will get rid of them.

>
> (Google is a /proc/<pid>/smaps user: let's take a look to see if
> we have been driven to add in stats of this kind: no, not at all.)
>
> The more numbers we add to /proc/<pid>/smaps, the longer it will take to
> print, the longer mmap_sem will be held, and the more it will interfere
> with proper system operation - that's the concern I more often see.
>
>> ---
>> Documentation/filesystems/proc.txt | 11 ++++++++
>> fs/proc/task_mmu.c | 56 +++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 66 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 1a15c56..a65ab59 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -422,6 +422,10 @@ Swap: 0 kB
>> KernelPageSize: 4 kB
>> MMUPageSize: 4 kB
>> Locked: 374 kB
>> +ShmOther: 124 kB
>> +ShmOrphan: 0 kB
>> +ShmSwapCache: 12 kB
>> +ShmSwap: 36 kB
>> VmFlags: rd ex mr mw me de
>>
>> the first of these lines shows the same information as is displayed for the
>> @@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>> and a page is modified, the file page is replaced by a private anonymous copy.
>> "Swap" shows how much would-be-anonymous memory is also used, but out on
>> swap.
>> +The ShmXXX lines only appears for shmem mapping. They show the amount of memory
>> +from the mapping that is currently:
>> + - resident in RAM, not present in the page table of this process but present
>> + in the page table of an other process (ShmOther)
>
> We don't show that for files of any other filesystem, why for shmem?
> Perhaps you are too focussed on SysV SHM, and I am too focussed on tmpfs.

I must admit that I see all this from a SysV SHM / shared anon mappings
point of view.

>
> It is a very specialized statistic, and therefore hard to name: I don't
> think ShmOther is a good name, but doubt any would do. ShmOtherMapped?
>
>> + - resident in RAM but not present in the page table of any process (ShmOrphan)
>
> We don't show that for files of any other filesystem, why for shmem?

Because these pages cannot be discarded or written back to disk. Under
memory pressure, they need space on swap or have to stay in RAM.

>
> Orphan? We do use the word "orphan" to describe pages which have been
> truncated off a file, but somehow not yet removed from pagecache.

I was unaware of that.

> We
> don't use the word "orphan" to describe pagecache pages which are
> not mapped into userspace - they are known as "pagecache pages which
> are not mapped into userspace". ShmNotMapped?

I'm not sure about the terminology here. These pages are not mapped in
the sense that their map_count is zero, but they belong to a userspace
mapping.

>
>> + - in swap cache (ShmSwapCache)
>
> Is this interesting? It's a transitional state: either memory pressure
> has forced the page to swapcache, but not yet freed it from memory; or
> swapin_readahead has brought this page back in when bringing in a nearby
> page of swap.
>
> I can understand that we might want better stats on the behaviour of
> swapin_readahead; better stats on shmem objects and swap; better stats
> on duplication between pagecache and swap; but I'm not convinced that
> /proc/<pid>/smaps is the right place for those.
>
> Against all that, of course, we do have mincore() showing these pages
> as incore, where /proc/<pid>/smaps does not. But I think that is
> justified by mincore()'s mission to show what's incore.
>
>> + - paged out on swap (ShmSwap).
>
> This one has the best case for inclusion: we do show Swap for the anon
> pages which are out on swap, but not for the shmem areas, where swap
> entry does not go into page table. But there is good reason for that:
> this is shared memory, files, objects commonly shared between
> processes, so it's a poor fit then to account them by processes.
>
> (We have "df" and "du" showing the occupancy of mounted tmpfs
> filesystems: it would be nice if we had something like those,
> which showed also the swap occupancy, and for the non-user-mounts.)

I guess that works for tmpfs, but shared anon mappings are invisible to
these tools.
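
To make that concrete, here is a minimal Python sketch of creating a
shared anonymous mapping. It is an illustration only; the helper names
make_shared_anon and touch_pages are hypothetical. No tmpfs mount or
file path is involved, so there is nothing for df or du to look at; the
pages are only reachable through the mapping itself.

```python
import mmap

SIZE = 4 * mmap.PAGESIZE


def make_shared_anon(size=SIZE):
    """Create a shared anonymous mapping (backed by shmem on Linux)."""
    # fileno -1 requests an anonymous mapping; MAP_SHARED is what makes
    # it shared rather than ordinary private anonymous memory.
    return mmap.mmap(-1, size,
                     flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)


def touch_pages(mapping):
    """Write one byte per page so each page is actually allocated."""
    for offset in range(0, len(mapping), mmap.PAGESIZE):
        mapping[offset] = 1
    return len(mapping) // mmap.PAGESIZE


if __name__ == "__main__":
    m = make_shared_anon()
    print("pages touched:", touch_pages(m))
    m.close()
```

Once touched, these pages can only stay in RAM or go to swap, and only
a per-mapping view such as smaps can attribute them to a process.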

Jerome

>
> I need much more convincing on this patch: I expect you will drop
> some of the numbers, and provide an argument for others.
>
> Hugh
>

