Re: [oom]: [0/4] fix OOM deadlock running OAST

From: William Lee Irwin III
Date: Wed Jun 23 2004 - 18:08:36 EST


On Wed, Jun 23, 2004 at 03:37:58PM -0700, Andrew Morton wrote:
> What about zone->all_unreclaimable?

It's unclear which zones must be checked for this to be of use. In
general, suppose one determines all possible zones from which a given
allocation may be satisfied (this still assumes out_of_memory() is
passed a gfp_mask, and then discovers ->all_unreclaimable. It is still
not possible to determine whether nr_swap_pages is relevant.

Now suppose the check for nr_swap_pages > 0 is removed in tandem with
the passing of the gfp_mask to out_of_memory() and checking
->all_unreclaimable. This will then risk triggering OOM falsely for
unpinned pagecache allocations in the presence of high scan rates,
though it may yet be safe.

There are larger semantic questions also. For instance, the runs that
deadlocked were with non-overcommit enabled (that is, I left the sysctl
/proc/sys/vm/overcommit_memory == 0 after boot). The intention was to
provide best effort unfailability to userspace allocations by detecting
insufficient resources up-front and returning MAP_FAILED from mmap() and
so on. There is a current hole in this, which is that pinned kernel
memory allocations aren't subtracted from the pool of memory considered
to be available to userspace. This would add the additional caveat that
pure userspace memory allocations may result in OOM kills where they
didn't before, as without checking __GFP_WIRED, it's not possible to
determine whether nr_swap_pages > 0 is relevant, where it would suffice
for pure userspace allocations with non-overcommit.

__GFP_WIRED is faithful to the current timeout-based semantics and
so minimizes the risk associated with discarding the nr_swap_pages > 0
check. I'm willing to entertain deeper semantic changes such as the
above removal of nr_swap_pages > 0 in tandem with passing the gfp_mask
to out_of_memory() and checking for ->all_unreclaimable for all the
members of the gfp_mask's zonelist if you still want them given all this.

It also occurs to me that it's possible to detect overcommitment via a
combination of kernel and user allocations by means of summing the
per_cpu nr_wired counters along with vm_committed_memory, so there may
yet be methods of strengthening strict non-overcommit semantics.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/