Re: [PATCH] A new entry for /proc

From: Hugh Dickins
Date: Sat Jan 08 2005 - 15:23:52 EST


On Thu, 6 Jan 2005, Andrew Morton wrote:
> Mauricio Lin <mauriciolin@xxxxxxxxx> wrote:
> >
> > Here is a new entry developed for /proc that prints, for each process
> > memory area (VMA), the size of its rss. The maps file in the original
> > kernel presents the virtual size of each vma, but not the physical
> > size (rss). This entry can provide additional information for tools
> > that analyze memory consumption. You can see the physical memory
> > size of each library used by a process, and also of the executable file.
> >
> > Take a look at the output:
> > # cat /proc/877/smaps
> > 08048000-08132000 r-xp /usr/bin/xmms
> > Size: 936 kB
> > Rss: 788 kB
>
> This is potentially quite useful. I'd be interested in what others think of
> the idea and implementation.

Regarding the idea.

Well, it goes back to just what wli freed 2.6 from, and what we scorned
clameter for: a costly examination of every pte-slot of every vma of the
process. That doesn't matter _too_ much so long as no standard tool is
liable to do it every second or so, or to every single process, and so
long as it doesn't hold a spinlock or keep preemption disabled too long.

But personally I'd be happier for it to remain an out-of-tree patch,
just to discourage people from writing and running such tools,
and to discourage them from adding other such costly analyses.

Potentially quite useful, perhaps. But I don't have a use for it
myself, and if I do have, I'll be content to search out (or recreate)
the patch. Let's hear from those who actually have a use for it now -
the more useful it is, of course, the stronger the argument for inclusion.

I am a bit sceptical how useful such a lot of little numbers would
really be - usually it's an overall picture we're interested in.

Regarding the implementation.

Unnecessarily inefficient: a pte_offset_map and unmap for each pte.
Better go back to the 2.4.28 or 2.5.36 fs/proc/array.c design for
statm_pgd_range + statm_pmd_range + statm_pte_range - but now you
need a pud level too.
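
To make the range-walk shape concrete, here is a minimal userspace sketch.
It is a toy model, not kernel code and not the old fs/proc/array.c source:
the struct names, the 4-entry tables and the toy_map_calls counter are all
assumptions made for illustration. The point it shows is that each pte
table is "mapped" once per range rather than once per pte, with one
function per level (pgd, pud, pmd, pte).

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names and sizes are hypothetical. */
#define ENTRIES     4UL                    /* entries per table at every level */
#define PMD_SIZE    ENTRIES                /* pages covered by one pmd entry */
#define PUD_SIZE    (PMD_SIZE * ENTRIES)   /* pages covered by one pud entry */
#define PGDIR_SIZE  (PUD_SIZE * ENTRIES)   /* pages covered by one pgd entry */
#define TOTAL_PAGES (PGDIR_SIZE * ENTRIES) /* pages covered by the whole tree */

struct toy_pte_table { unsigned char present[ENTRIES]; };
struct toy_pmd { struct toy_pte_table *pte[ENTRIES]; };
struct toy_pud { struct toy_pmd *pmd[ENTRIES]; };
struct toy_pgd { struct toy_pud *pud[ENTRIES]; };

/* Stands in for the cost of a map/unmap pair on a pte table. */
static unsigned long toy_map_calls;

/* End of the entry containing 'start', clamped to the end of the range. */
static unsigned long span_end(unsigned long start, unsigned long end,
                              unsigned long size)
{
    unsigned long next = (start / size + 1) * size;
    return next < end ? next : end;
}

/* Count present pages in [start, end), all within one pte table. */
static unsigned long toy_pte_range(const struct toy_pte_table *pt,
                                   unsigned long start, unsigned long end)
{
    unsigned long rss = 0;

    toy_map_calls++;                       /* one "map" per table, not per pte */
    for (; start < end; start++)
        rss += pt->present[start % ENTRIES] != 0;
    return rss;
}

static unsigned long toy_pmd_range(const struct toy_pmd *pmd,
                                   unsigned long start, unsigned long end)
{
    unsigned long rss = 0;

    while (start < end) {
        unsigned long next = span_end(start, end, PMD_SIZE);
        const struct toy_pte_table *pt = pmd->pte[(start / PMD_SIZE) % ENTRIES];

        if (pt)                            /* skip empty pmd entries */
            rss += toy_pte_range(pt, start, next);
        start = next;
    }
    return rss;
}

static unsigned long toy_pud_range(const struct toy_pud *pud,
                                   unsigned long start, unsigned long end)
{
    unsigned long rss = 0;

    while (start < end) {
        unsigned long next = span_end(start, end, PUD_SIZE);
        const struct toy_pmd *pmd = pud->pmd[(start / PUD_SIZE) % ENTRIES];

        if (pmd)
            rss += toy_pmd_range(pmd, start, next);
        start = next;
    }
    return rss;
}

/* Entry point: page numbers, half-open range [start, end). */
static unsigned long toy_pgd_range(const struct toy_pgd *pgd,
                                   unsigned long start, unsigned long end)
{
    unsigned long rss = 0;

    assert(start <= end && end <= TOTAL_PAGES);
    while (start < end) {
        unsigned long next = span_end(start, end, PGDIR_SIZE);
        const struct toy_pud *pud = pgd->pud[(start / PGDIR_SIZE) % ENTRIES];

        if (pud)
            rss += toy_pud_range(pud, start, next);
        start = next;
    }
    return rss;
}
```

In this model a walk over eight pages backed by two pte tables bumps
toy_map_calls twice, where a per-pte map and unmap would do it eight times.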

Seems to have no locking: needs to down_read mmap_sem to guard vmas.
Does it need page_table_lock? I think not (and proc_pid_statm didn't).

If there were a use for it, that use might want to distinguish between
the "shared rss" of pagecache pages from a file, and the "anon rss" of
private pages copied from file or originally zero - would need to get
the struct page and check PageAnon. And might want to count swap
entries too. Hard to say without real uses in mind.
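
As a rough illustration of that split, here is a toy userspace sketch of
the accounting. It is not kernel code: the toy_entry enum is a made-up
stand-in for what a real walker would learn from the pte (present or swap
entry) and from PageAnon on the struct page, and the field names are
assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one pte slot in a toy model. */
enum toy_entry {
    TOY_NONE,       /* nothing mapped */
    TOY_FILE_PAGE,  /* present, pagecache page from a file: "shared rss" */
    TOY_ANON_PAGE,  /* present, private copy or zero-fill: "anon rss" */
    TOY_SWAP        /* not present, swap entry */
};

/* Per-vma counters, in pages. Field names are assumptions. */
struct toy_vma_stats {
    unsigned long shared_rss;
    unsigned long anon_rss;
    unsigned long swap;
};

/* Tally n slots into separate shared, anon and swap counters. */
static struct toy_vma_stats toy_account(const enum toy_entry *entries, size_t n)
{
    struct toy_vma_stats st = {0, 0, 0};
    size_t i;

    assert(entries != NULL || n == 0);
    for (i = 0; i < n; i++) {
        switch (entries[i]) {
        case TOY_FILE_PAGE: st.shared_rss++; break;
        case TOY_ANON_PAGE: st.anon_rss++;   break;
        case TOY_SWAP:      st.swap++;       break;
        case TOY_NONE:      break;
        }
    }
    return st;
}
```

The single "Rss" figure in the posted output would correspond to
shared_rss + anon_rss here; swap entries are not resident and would be
reported separately, if at all.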

Andrew mentioned "unsigned long page": similarly, we usually say
"struct vm_area_struct *vma" rather than "*map" (well, some places
say "*mpnt", but that's not a precedent to follow).

Regarding the display.

It's a mixture of two different styles, the /proc/<pid>/maps
many-hex-fields one-vma-per-line style and the /proc/meminfo
one-decimal-kB-per-line style. I think it would be better following
the /proc/<pid>/maps style, but replacing the major,minor,ino fields
by size and rss (anon_rss? swap?) fields (decimal kB? I suppose so).
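
One possible reading of that layout, sketched as a userspace formatter.
This is an assumption about the format, not a settled one: it keeps the
address range, permissions and offset fields of a maps line, then prints
size and rss in decimal kB where major:minor and inode would be. The
function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one vma in a maps-like, one-line-per-vma style.
 * Size is derived from the address range; rss_kb is supplied by the caller.
 * Returns what snprintf returns (length that would have been written).
 */
static int toy_format_vma(char *buf, size_t len,
                          unsigned long start, unsigned long end,
                          const char *perms, unsigned long offset,
                          unsigned long rss_kb, const char *path)
{
    unsigned long size_kb;

    assert(buf != NULL && perms != NULL && path != NULL);
    assert(strlen(perms) == 4);        /* e.g. "r-xp" */
    assert(start <= end);

    size_kb = (end - start) >> 10;     /* bytes to kB */
    return snprintf(buf, len, "%08lx-%08lx %s %08lx %lu %lu %s\n",
                    start, end, perms, offset, size_kb, rss_kb, path);
}
```

Fed the xmms vma from the posted output (with an offset of 0 assumed, as
the original post does not show one), this would print:

08048000-08132000 r-xp 00000000 936 788 /usr/bin/xmms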

Hugh
