Re: OOM Killer killing whole system

From: Andrew Morton
Date: Fri Jan 20 2006 - 17:46:36 EST


Anton Titov <a.titov@xxxxxxx> wrote:
>
> On Fri, 2006-01-20 at 14:04 -0600, Chase Venters wrote:
> > On Fri, 20 Jan 2006, Andrew Morton wrote:
> > >> Jan 15 06:05:09 vip 216477 pages slab
> > >
> > > It's all in slab. 800MB.
> > >
> > > I'd be suspecting a slab memory leak. If it happens again, please take a
> > > copy of /proc/slabinfo, send it.
> > >
> >
> > Andrew & Anton,
> > The culprit was 1.5 million SCSI commands in the scsi command cache.
> >
> > Thanks,
> > Chase
>
> I currently have this:
> scsi_cmd_cache 1458778 1458790 384 10 1 : tunables 54 27
> 8 : slabdata 145879 145879 0
>
> in /proc/slabinfo, which is pretty close to 1.5 million. The system is
> working fine but it should be not very loaded anyway, so a mem leakage
> will not show up early. Just checked, that scsi_cmd_cache on other
> machines of mine is under 100, so it seems like a problem.

That's great, thanks.

This is 2.6.15 and we have a deadly bug in scsi.

Next time you reboot 2.6.15 on that machine can you please send the output
of `dmesg -s 1000000'? You might have to set CONFIG_LOG_BUF_SHIFT=17 to
prevent it from being truncated.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/