Re: [oom]: [0/4] fix OOM deadlock running OAST

From: William Lee Irwin III
Date: Wed Jun 23 2004 - 19:27:57 EST


William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
>> It's a
>> judgment call as to whether it's beneficial in general, as it does
>> insulate userspace somewhat from needing to wait for slow IO being the
>> ostensible cause of the allocation failure.

On Wed, Jun 23, 2004 at 05:18:18PM -0700, Andrew Morton wrote:
> mm... I can only see that happening if the IO system is retiring write
> requests at much less than 10/sec, which seems unlikely. Still, that can
> be tuned around.

Then it sounds like the smaller fix below may be better for you.


William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:
>> RedHat vendor kernels have removed the check entirely

On Wed, Jun 23, 2004 at 05:18:18PM -0700, Andrew Morton wrote:
> When telling us this sort of thing, please always specify the kernel version.
> I assume you're referring to a 2.6 kernel? If so, some thwapping might be
> in order.

No, RHEL3. I'm not aware of any mm/oom_kill.c changes in any of the
Fedora snapshots.


-- wli

During stress testing at Oracle to determine the maximum number of
clients 2.6 can service, it was discovered that the failure mode of
excessive numbers of clients was kernel deadlock. The following patch
removes the check if (nr_swap_pages > 0) from out_of_memory() as this
heuristic fails to detect memory exhaustion due to pinned allocations,
directly causing the aforementioned deadlock.


===== mm/oom_kill.c 1.26 vs edited =====
--- 1.26/mm/oom_kill.c Thu Jun 3 01:46:39 2004
+++ edited/mm/oom_kill.c Wed Jun 23 17:22:22 2004
@@ -230,12 +230,6 @@
static unsigned long first, last, count, lastkill;
unsigned long now, since;

- /*
- * Enough swap space left? Not OOM.
- */
- if (nr_swap_pages > 0)
- return;
-
spin_lock(&oom_lock);
now = jiffies;
since = now - last;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/