Re: 2.2.11: Complicated memory leak...

Phil DeBecker (debecker@NO.SPAM.iglou.com)
Thu, 12 Aug 1999 01:39:05 -0400


>I could reproduce this problem: There are 256 MByte RAM in my PC, 512
>MByte swap, SMP enabled, etc. I have neither VMware, Squid, KDE and so
>on, but there are at least 19 remote disks mounted via NFS2. My PC is
>configured as NIS client. Currently I am running the Potato Debian
>distribution, i.e. the C lib is glibc 2.1, and the kernel has been
>compiled with gcc 2.95.

>After about 5 minutes or so ypbind, getty and other important jobs die
>with 'out of memory'. Only the reset button helps.

To add oil to the fire... I got a similar problem with a Mandrake 5.3
distribution updated to the "Red Hat 5.2 + kernel 2.2 RPMs" level.
I had a lot of Eterms, Netscape and such open. Eventually memory got
tight due to Netscape and there was much thrashing. I tried to kill
Netscape, but the system locked up tight and wouldn't respond to pings,
requiring a reset. Irritatingly, this is a machine that had been up for
38 days on 2.2.10 before that...

I tried to reproduce the problem later, and what I found was that I'd
used up about 100 of 128 MB and couldn't make the "free" amount
increase, even by going to runlevel 1. I wrote a quick program to
malloc and set N bytes, and sure enough I couldn't allocate more than
the stated free amounts (swap + RAM). The consumed memory was not
accounted for: there was a total of about 3 MB of processes plus about
20 MB of cache + buffers, yet I only had 12 MB of RAM free out of 128.
The swapfile was basically free, with about 3 MB of 128 consumed
there.
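
For anyone who wants to repeat this kind of check, a test along the
lines described above might look roughly like the following. This is
only a sketch in Python, not the original program (which is not
included in this message). The names alloc_and_touch and
unaccounted_mb, and the 4096-byte page size, are assumptions made for
illustration.

```python
PAGE = 4096  # assumed page size; the real value depends on the platform


def alloc_and_touch(nbytes):
    """Allocate nbytes and write one byte per page.

    Writing to each page forces the kernel to actually back the
    allocation with RAM or swap instead of just reserving addresses.
    Returns the buffer on success, or None if the allocation fails.
    """
    try:
        buf = bytearray(nbytes)  # zero-filled allocation
    except MemoryError:
        return None
    for offset in range(0, nbytes, PAGE):
        buf[offset] = 0xFF  # touch the page
    return buf


def unaccounted_mb(total, free, cache_buffers, processes):
    """Memory (in MB) that is neither free, cache/buffers, nor processes."""
    return total - free - cache_buffers - processes
```

Calling alloc_and_touch with growing sizes shows roughly where
allocation stops succeeding. On a system that overcommits memory, the
process may be killed while touching pages rather than seeing a failed
allocation, so a None return is not the only way the test can end.
With the approximate figures quoted above, unaccounted_mb(128, 12, 20, 3)
comes to 93, i.e. on the order of 90 MB of RAM that neither the
processes nor the cache explain.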

Frustratingly, my much-updated RH5.1 system here at home doesn't seem to
exhibit the problem. The important stuff is all the same versions, and
both systems are compiled with egcs-2.90.29 and have glibc 2.0.7-29.
Neither machine is running anything funky like VMware or suchlike;
neither runs Apache or anything heavy-duty server-wise, mostly just
Eterms and client apps like Netscape. The one difference between the
two that Harri's message brings to mind is that the machine which
exhibits the problem has a couple of NFS drives mounted from a Solaris
box.

I'll happily try digging some more; I'll gladly apply any suggestions as
to what to try and how to diagnose.

Phil DeBecker

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/