Re: The performance and behaviour of the anti-fragmentation related patches

From: Rik van Riel
Date: Fri Mar 02 2007 - 17:43:02 EST


Andrew Morton wrote:
On Fri, 02 Mar 2007 17:03:10 -0500
Rik van Riel <riel@xxxxxxxxxx> wrote:

Andrew Morton wrote:
On Fri, 02 Mar 2007 16:19:19 -0500
Rik van Riel <riel@xxxxxxxxxx> wrote:
Bill Irwin wrote:
On Fri, Mar 02, 2007 at 01:23:28PM -0500, Rik van Riel wrote:
With 32 CPUs diving into the page reclaim simultaneously,
each trying to scan a fraction of memory, this is disastrous
for performance. A 256GB system should be even worse.
Thundering herds of a sort pounding the LRU locks from direct reclaim
have set off the NMI oopser for users here.
Ditto here.
Opterons?
It's happened on IA64, too. Probably on Intel x86-64 as well.

Opterons seem to be particularly prone to lock starvation where a cacheline
gets captured in a single package for ever.

The main reason they end up pounding the LRU locks is the
swappiness heuristic. They scan too much before deciding
that it would be a good idea to actually swap something
out, and with 32 CPUs doing such scanning simultaneously...
What kernel version?
Customers are on the 2.6.9 based RHEL4 kernel, but I believe
we have reproduced the problem on 2.6.18 too during stress
tests.

The prev_priority fixes were post-2.6.18.

We tested them. They only alleviate the problem slightly in
good situations, but things still fall apart badly with less
friendly workloads.

I have no reason to believe we should stick our heads in the
sand and pretend it no longer exists on 2.6.21.

I have no reason to believe anything. All I see is handwaviness,
speculation and grand plans to rewrite vast amounts of stuff without even a
testcase to demonstrate that said rewrite improved anything.

Your attitude is exactly why the VM keeps falling apart over
and over again.

Fixing "a testcase" in the VM tends to introduce problems for
other test cases, ad infinitum. There's a reason we end up
fixing the same bugs over and over again.

I have been looking through a few hundred VM related bugzillas
and have found the same bugs persist over many different
versions of Linux, sometimes temporarily fixed, but they seem
to always come back eventually...

None of this is going anywhere, is it?

I will test my changes before I send them to you, but I cannot
promise you that you'll have the computers or software needed
to reproduce the problems. I doubt I'll have full time access
to such systems myself, either.

32GB is pretty much the minimum size to reproduce some of these
problems. Some workloads may need larger systems to easily trigger
them.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.