Re: [Bug 9182] Critical memory leak (dirty pages)

From: Krzysztof Oledzki
Date: Sun Dec 16 2007 - 04:34:17 EST




On Sat, 15 Dec 2007, Andrew Morton wrote:

On Sun, 16 Dec 2007 00:08:52 +0100 (CET) Krzysztof Oledzki <olel@xxxxxx> wrote:



On Sat, 15 Dec 2007, bugme-daemon@xxxxxxxxxxxxxxxxxxx wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=9182


------- Comment #33 from protasnb@xxxxxxxxx 2007-12-15 14:19 -------
Krzysztof, I'd hate point you to a hard path (at least time consuming), but
you've done a lot of digging by now anyway. How about git bisecting between
2.6.20-rc2 and rc1? Here is great info on bisecting:
http://www.kernel.org/doc/local/git-quick.html

As I'm smarter than git-bistect I can tell that 2.6.20-rc1-git8 is as bad
as 2.6.20-rc2 but 2.6.20-rc1-git8 with one patch reverted seems to be OK.
So it took me only 2 reboots. ;)

The guilty patch is the one I proposed just an hour ago:
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9

So:
- 2.6.20-rc1: OK
- 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
- 2.6.20-rc1-git8: very BAD
- 2.6.20-rc2: very BAD
- 2.6.20-rc4: very BAD
- >= 2.6.20: BAD (but not *very* BAD!)


well.. We have code which has been used by *everyone* for a year and it's
misbehaving for you alone.

No, not for me alone. Probably only I and Thomas Osterried have systems where it is so easy to reproduce. Please note that the problem exists on my all systems, but only on one it is critical. It is enough to run
"sync; sleep 1; sunc; sleep 1; sync; grep Drirty /proc/meminfo" to be sure. With =>2.6.20-rc1-git8 it *never* falls to 0 an *all* my hosts but only on one it goes to ~200MB in about 2 weeks and then everything dies:
http://bugzilla.kernel.org/attachment.cgi?id=13824
http://bugzilla.kernel.org/attachment.cgi?id=13825
http://bugzilla.kernel.org/attachment.cgi?id=13826
http://bugzilla.kernel.org/attachment.cgi?id=13827

I wonder what you're doing that is different/special.
Me to. :|

Which filesystem, which mount options

- ext3 on RAID1 (MD): / - rootflags=data=journal
- ext3 on LVM on RAID5 (MD)
- nfs

/dev/md0 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
/dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
/dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)


what sort of workload?
Different, depending on a host: mail (postfix + amavisd + spamassasin + clamav + sqlgray), squid, mysql, apache, nfs, rsync, .... But it seems that the biggest problem is on the host running mentioned mail service.

Thanks.

Best regards,

Krzysztof Olędzki