Re: What's the NFS OOM problem?

From: Roger Heflin
Date: Thu Aug 17 2006 - 09:32:05 EST


Neil Brown wrote:
> On Tuesday August 15, rheflin@xxxxxxxxx wrote:
>> I have noticed on SLES kernels that with the dirty_*ratios turned down it
>> still uses a lot more memory than it should for writeback buffers. That makes
>> me think that with the default setting of 40%, it may for some reason
>> be using all of memory and deadlocking. It does not seem like an
>> NFS-only issue, as I believe I have duplicated it with a fast lock
>> setup.
>
> We seem to have a little patch in SuSE kernels that might be making
> the problem worse .... though I presume it was introduced for a
> reason. I haven't managed to track what that reason was yet.
>
> What is "a fast lock setup"?? I don't understand.
>
> NeilBrown


I am not sure what I meant. I may have meant a fast disk setup and
thought or typed the wrong thing. The machine I duplicated it on
had disks that would sustain 175MB/second (3 striped), 4 CPUs, and
32GB of local RAM. The 2-CPU/4GB/100MB/second machine does not seem
to have the issue. Both machines are Opterons. I believe I duplicated
it under SP2, and I know I duplicated it under SP3 and one of the
post-SP3 kernels. It did not occur under SP1.

Turning down the dirty_*ratios seems to make it go away. When I
get a chance I will retest on SP2 and see if it happens there.
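A rough back-of-the-envelope sketch of why the bigger box might be the one that trips over this. It assumes, as a simplification, that the dirty threshold is just dirty_ratio percent of total RAM (real kernels compute it against a smaller "dirtyable" pool) and that writeback drains at the sustained disk rates quoted above. The function names and the "turned down" value of 10% are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope estimate of how much dirty page cache vm.dirty_ratio
# permits and how long draining it at a sustained disk rate would take.
# ASSUMPTION: threshold = dirty_ratio% of total RAM. Real kernels use a
# smaller "dirtyable memory" base, so treat these numbers as upper bounds.

def dirty_limit_mb(total_ram_mb: float, dirty_ratio_pct: float) -> float:
    """Dirty page-cache ceiling in MB under the simplified model."""
    return total_ram_mb * dirty_ratio_pct / 100.0


def drain_seconds(dirty_mb: float, disk_mb_per_s: float) -> float:
    """Seconds needed to write back dirty_mb at a sustained disk rate."""
    return dirty_mb / disk_mb_per_s


if __name__ == "__main__":
    machines = [
        # (label, RAM in MB, sustained write rate in MB/s) from the report
        ("4 CPU / 32GB / 175MB/s", 32 * 1024, 175),
        ("2 CPU / 4GB / 100MB/s", 4 * 1024, 100),
    ]
    # 40 is the default mentioned above; 10 is an arbitrary lowered value.
    for ratio in (40, 10):
        for label, ram_mb, rate in machines:
            limit = dirty_limit_mb(ram_mb, ratio)
            print(f"dirty_ratio={ratio:2d}% {label}: "
                  f"{limit:8.0f} MB dirty, "
                  f"~{drain_seconds(limit, rate):5.1f} s to flush")
```

Under this model the 32GB machine at 40% may hold roughly 12.8GB of dirty pages, about 75 seconds of writeback even at 175MB/s, versus about 1.6GB and 16 seconds on the 4GB box. On a live system the current values are in /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio, and can be lowered with e.g. `sysctl -w vm.dirty_ratio=10`.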

I do know (and this may be related) that if on a 32GB machine I
page-lock a large portion of RAM (say 28GB), that machine will deadlock
under high IO. The basic symptoms are similar to the writeback
issue: the machine responds to ping/sysrq, but logins fail and any
new process creation fails.
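One possible, unconfirmed link between the two symptoms, sketched under the same simplified model: if the dirty threshold is computed from total RAM and pages pinned with mlock() (assumed here to be how the memory was page-locked) are not subtracted from that base, the permitted dirty memory can exceed what is actually left for the page cache. The model and the helper names are hypothetical and do not describe the SLES kernel's actual code.

```python
# ASSUMPTION (illustrative only): dirty threshold = dirty_ratio% of *total*
# RAM, and page-locked memory is NOT excluded from that base.

def unlocked_mb(total_ram_mb: float, locked_mb: float) -> float:
    """Memory left over once locked_mb is pinned."""
    return total_ram_mb - locked_mb


def dirty_limit_exceeds_unlocked(total_ram_mb: float, locked_mb: float,
                                 dirty_ratio_pct: float) -> bool:
    """True if the permitted dirty memory is at least the unpinned memory."""
    limit = total_ram_mb * dirty_ratio_pct / 100.0
    return limit >= unlocked_mb(total_ram_mb, locked_mb)


if __name__ == "__main__":
    total, locked = 32 * 1024, 28 * 1024  # the 32GB / 28GB case above
    for ratio in (40, 10):
        print(f"dirty_ratio={ratio}%: limit >= unlocked memory? "
              f"{dirty_limit_exceeds_unlocked(total, locked, ratio)}")
```

With 28GB locked only 4GB remains, while 40% of 32GB is about 12.8GB, so under this model writeback alone could be allowed to fill all remaining memory; at 10% (about 3.2GB) it could not, which would be consistent with lower ratios making the hang go away.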

Roger