Re: [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage

From: Andrea Arcangeli
Date: Sat Nov 06 2004 - 12:46:58 EST


On Sat, Nov 06, 2004 at 07:54:12PM +0300, Nikita Danilov wrote:
> Andrea Arcangeli writes:
> > On Sat, Nov 06, 2004 at 02:37:05PM +0300, Nikita Danilov wrote:
> > > We need a page-reservation API of some sort. There were several attempts
> > > to introduce this, but none got into mainline.
> >
> > they're already in under the name of mempools
>
> I am talking about a slightly different thing. Think of some operation
> that calls find_or_create_page(). find_or_create_page() doesn't know
> about memory reserved in mempools, it uses alloc_page() directly. If one
> wants to guarantee that a compound operation has enough memory to
> complete, memory should be reserved at the lowest level---in the page
> allocator.

The page allocator reserves memory only for swapout; that's
PF_MEMALLOC.

For other purposes not related to swapping (which is not a deterministic
thing, given there can be multiple layers of I/O and fs operations to
do), you should use a mempool and change find_or_create_page to take your
reserved page as a parameter.
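
As an illustration of that suggestion, here is a minimal userspace sketch
in C. It is not kernel code and does not use the real mempool API: the
names page_pool, pool_alloc, pool_free, toy_cache, failing_alloc and
toy_find_or_create_page are all hypothetical. It assumes a fixed reserve
that is filled up front and a lookup function that never allocates by
itself but consumes a page handed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u
#define POOL_MIN      4u   /* hypothetical size of the reserve */
#define CACHE_SLOTS   8u   /* hypothetical size of the toy page cache */

typedef void *(*alloc_fn)(size_t);

struct page_pool {
    void  *reserve[POOL_MIN];  /* pages set aside while memory was plentiful */
    size_t nr_free;            /* how many reserved pages are still available */
};

struct toy_cache {
    void *slot[CACHE_SLOTS];   /* NULL means "no page cached at this index" */
};

/* Stand-in for the regular allocator under a hard oom: always fails. */
void *failing_alloc(size_t size) { (void)size; return NULL; }

/* Fill the reserve up front. Returns 0 on success, -1 on failure. */
int pool_init(struct page_pool *p)
{
    p->nr_free = 0;
    while (p->nr_free < POOL_MIN) {
        void *page = malloc(TOY_PAGE_SIZE);
        if (!page) {
            while (p->nr_free > 0)
                free(p->reserve[--p->nr_free]);
            return -1;
        }
        p->reserve[p->nr_free++] = page;
    }
    return 0;
}

/* Try the regular allocator first, fall back to the reserve. A real
 * mempool would sleep until an element is returned; this toy gives up. */
void *pool_alloc(struct page_pool *p, alloc_fn alloc)
{
    void *page = alloc(TOY_PAGE_SIZE);
    if (page)
        return page;
    if (p->nr_free > 0)
        return p->reserve[--p->nr_free];
    return NULL;
}

/* Refill the reserve before handing memory back to the system. */
void pool_free(struct page_pool *p, void *page)
{
    if (p->nr_free < POOL_MIN)
        p->reserve[p->nr_free++] = page;
    else
        free(page);
}

/* Lookup that never allocates: on a miss it installs *reserved and sets
 * *reserved to NULL so the caller knows the page was consumed. */
void *toy_find_or_create_page(struct toy_cache *c, size_t index, void **reserved)
{
    assert(index < CACHE_SLOTS);
    if (!c->slot[index]) {
        assert(reserved && *reserved);
        c->slot[index] = *reserved;
        *reserved = NULL;
    }
    return c->slot[index];
}
```

The caller takes a page from the pool before starting the compound
operation and returns it with pool_free() if the lookup turned out to be
a hit. The point of the design is that no allocation can fail in the
middle of the operation, because none happens there.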

> > I'm perfectly aware the fs tends to be one of the less correct places in
> > terms of allocations, and luckily it's not a heavy memory user, so I still
>
> Either you are kidding, or we are facing very different workloads. In
> the world of file-system development, the file-system is (not surprisingly)
> the single largest memory consumer.

When the machine runs oom the fs allocations mean nothing. When the
machine runs oom it's because somebody entered a malloc loop or
something like that, and all the allocations come from page faults.

Try to run your box oom yourself with getblk allocations. Only then
will you run into the deadlock.

You have to keep in mind that an oom condition happens once in a while,
and when it happens the userspace memory allocation load is huge compared
to any fs operation.

> > have to see a deadlock in getblk or create_buffers or similar. It's
> > mostly a correctness issue (math proof it can't deadlock, right now it
> > can if more tasks all get stuck in getblk at the same time during a hard
> > oom condition etc..).
>
> Add here mmap that can dirty all physical memory behind your back, and
> delayed disk block allocation that forces ->writepage() to allocate a
> potentially huge extent when memory is already tight, and the hope of
> having a proof becomes quite remote.

That's the PF_MEMALLOC path. A reservation already exists, or it would
never have worked since 2.2. PF_MEMALLOC and the min/2 watermark are meant
to allow writepage to allocate ram. However, the amount reserved is
limited, so it's not perfect. The only way to make it perfect, I believe,
is to reserve the stuff inside the fs with mempools as described above.
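
To show why a limited reserve is "not perfect", here is a toy model in C
of a watermark check. It is only a sketch: the names toy_zone, toy_ctx
and toy_alloc_page are hypothetical, and the thresholds (ordinary
callers stop at min, privileged callers at min/2, PF_MEMALLOC-style
callers can drain the zone) are an assumption for illustration, not the
exact logic of any kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocation contexts for the toy model. */
enum toy_ctx {
    TOY_NORMAL,    /* ordinary allocation, must stay above min */
    TOY_HIGH,      /* privileged allocation, may dip down to min/2 */
    TOY_MEMALLOC   /* PF_MEMALLOC-like, may use whatever is left */
};

struct toy_zone {
    unsigned long free_pages;
    unsigned long pages_min;
};

/* Take one page if the zone stays at or above the floor for this context.
 * Returns true on success, false if the allocation is refused. */
bool toy_alloc_page(struct toy_zone *z, enum toy_ctx ctx)
{
    unsigned long floor;

    assert(z != NULL);
    switch (ctx) {
    case TOY_MEMALLOC:
        floor = 0;
        break;
    case TOY_HIGH:
        floor = z->pages_min / 2;
        break;
    default:
        floor = z->pages_min;
        break;
    }
    if (z->free_pages <= floor)
        return false;
    z->free_pages--;
    return true;
}
```

In this model the reserve below min is a fixed number of pages. Once a
writepage path needs more than that, even the privileged context fails,
which is why a per-fs mempool sized for the operation is the more
robust option.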