Re: [Bug 9182] Critical memory leak (dirty pages)

From: Andrew Morton
Date: Sun Dec 16 2007 - 04:52:19 EST


On Sun, 16 Dec 2007 10:33:20 +0100 (CET) Krzysztof Oledzki <olel@xxxxxx> wrote:

>
>
> On Sat, 15 Dec 2007, Andrew Morton wrote:
>
> > On Sun, 16 Dec 2007 00:08:52 +0100 (CET) Krzysztof Oledzki <olel@xxxxxx> wrote:
> >
> >>
> >>
> >> On Sat, 15 Dec 2007, bugme-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> >>
> >>> http://bugzilla.kernel.org/show_bug.cgi?id=9182
> >>>
> >>>
> >>> ------- Comment #33 from protasnb@xxxxxxxxx 2007-12-15 14:19 -------
> >>> Krzysztof, I'd hate point you to a hard path (at least time consuming), but
> >>> you've done a lot of digging by now anyway. How about git bisecting between
> >>> 2.6.20-rc2 and rc1? Here is great info on bisecting:
> >>> http://www.kernel.org/doc/local/git-quick.html
> >>
> >> As I'm smarter than git-bistect I can tell that 2.6.20-rc1-git8 is as bad
> >> as 2.6.20-rc2 but 2.6.20-rc1-git8 with one patch reverted seems to be OK.
> >> So it took me only 2 reboots. ;)
> >>
> >> The guilty patch is the one I proposed just an hour ago:
> >> http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fstable%2Flinux-2.6.20.y.git;a=commitdiff_plain;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9
> >>
> >> So:
> >> - 2.6.20-rc1: OK
> >> - 2.6.20-rc1-git8 with fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9 reverted: OK
> >> - 2.6.20-rc1-git8: very BAD
> >> - 2.6.20-rc2: very BAD
> >> - 2.6.20-rc4: very BAD
> >> - >= 2.6.20: BAD (but not *very* BAD!)
> >>
> >
> > well.. We have code which has been used by *everyone* for a year and it's
> > misbehaving for you alone.
>
> No, not for me alone. Probably only I and Thomas Osterried have systems
> where it is so easy to reproduce. Please note that the problem exists on
> my all systems, but only on one it is critical. It is enough to run
> "sync; sleep 1; sunc; sleep 1; sync; grep Drirty /proc/meminfo" to be sure.
> With =>2.6.20-rc1-git8 it *never* falls to 0 an *all* my hosts but only
> on one it goes to ~200MB in about 2 weeks and then everything dies:
> http://bugzilla.kernel.org/attachment.cgi?id=13824
> http://bugzilla.kernel.org/attachment.cgi?id=13825
> http://bugzilla.kernel.org/attachment.cgi?id=13826
> http://bugzilla.kernel.org/attachment.cgi?id=13827
>
> > I wonder what you're doing that is different/special.
> Me to. :|
>
> > Which filesystem, which mount options
>
> - ext3 on RAID1 (MD): / - rootflags=data=journal

It wouldn't surprise me if this is specific to data=journal: that
journalling mode is pretty complex wrt dairty-data handling and isn't well
tested.

Does switching that to data=writeback change things?

THomas, do you have ext3 data=journal on any filesytems?

> - ext3 on LVM on RAID5 (MD)
> - nfs
>
> /dev/md0 on / type ext3 (rw)
> proc on /proc type proc (rw)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
> devpts on /dev/pts type devpts (rw,nosuid,noexec)
> /dev/mapper/VolGrp0-usr on /usr type ext3 (rw,nodev,data=journal)
> /dev/mapper/VolGrp0-var on /var type ext3 (rw,nodev,data=journal)
> /dev/mapper/VolGrp0-squid_spool on /var/cache/squid/cd0 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
> /dev/mapper/VolGrp0-squid_spool2 on /var/cache/squid/cd1 type ext3 (rw,nosuid,nodev,noatime,data=writeback)
> /dev/mapper/VolGrp0-news_spool on /var/spool/news type ext3 (rw,nosuid,nodev,noatime)
> shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
> usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
> owl:/usr/gentoo-nfs on /usr/gentoo-nfs type nfs (ro,nosuid,nodev,noatime,bg,intr,tcp,addr=192.168.129.26)
>
>
> > what sort of workload?
> Different, depending on a host: mail (postfix + amavisd + spamassasin +
> clamav + sqlgray), squid, mysql, apache, nfs, rsync, .... But it seems
> that the biggest problem is on the host running mentioned mail service.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/