Re: Page alloc failures under network/disk IO load

From: Dan NoÃ
Date: Thu Dec 04 2008 - 13:54:55 EST


On Thu, 04 Dec 2008 09:42:08 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Thu, 2008-12-04 at 09:23 +0100, Peter Zijlstra wrote:
> > On Wed, 2008-12-03 at 22:27 -0500, Dan Noà wrote:
> > > This is on Linux 2.6.28-rc7, on a Core 2 Duo. The system has
> > > plenty of memory:
> > >
> > > total used free shared buffers
> > > cached
> > > Mem: 1893 1822 70 0 0
> >
> > filled to the brim with data
> >
> > > 1573
> > > -/+ buffers/cache: 249 1644
> > > Swap: 1906 37 1868
> > >
> > > I am using rsync to transfer data onto this system. The
> > > filesystem is XFS, and the target drive is a 1TB Western Digital
> > > on ata_piix. The system files are on a RAID 1 (Linux md, also on
> > > ata_piix).
> > >
> > > Periodically I get page allocation failures, from
> > > __netdev_alloc_skb. I suppose this causes the driver to drop
> > > packets and thus hurts performance.
> >
> > There isn't much we can do about that, memory is filled and your
> > network card tries to allocate memory in a mode that doesn't allow
> > freeing some.
> >
> > Looking at the timestamps its not very frequent, so it doesn't hurt
> > performance much if anything. If you're really bothered with this,
> > you could quiet it by sticking in a __GFP_NOWARN in
> > __netdev_alloc_skb() or something..
>
> Another thing you can do is increase /proc/sys/vm/min_free_kbytes

I'm a bit confused because on another system (2.6.26.3) I never see
messages like this despite having the same amount of physical RAM in
each. The 2.6.26.3 system is also under more active use, and has more
userspace memory usage. On that system:

total used free shared buffers
cached Mem: 2017 1681 335 0
99 603 -/+ buffers/cache: 979 1037
Swap: 972 137 835

dpn@colobus:~$ cat /proc/sys/vm/min_free_kbytes
3816

Yet on the system where I saw the allocation failures:

dpn@trout:~/kernels/linux-2.6$ cat /proc/sys/vm/min_free_kbytes
5711

If I understand it correctly the issue is that __netdev_alloc_skb must
make a GFP_ATOMIC allocation, which fails because the page cache must
evict pages before there is sufficient memory. And
min_free_kbytes allows tuning of the point where try_to_free_pages is
called and thus the "reserve" memory available. Is that correct?

Wouldn't a higher min_free_kbytes mean less likelihood of GFP_ATOMIC
allocations failing? Or are these allocations failing on my 2.6.26.3
system and I don't know it because of different config options?

Why am I seeing this on the system with the *higher* min_free_kbytes?

Cheers,
Dan

--
Dan NoÃ
Software Engineer
Lime Brokerage LLC
781-370-2518
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/