Re: System freezes after OOM

From: Mikulas Patocka
Date: Thu Jul 14 2016 - 08:27:18 EST




On Wed, 13 Jul 2016, David Rientjes wrote:

> On Wed, 13 Jul 2016, Mikulas Patocka wrote:
>
> > What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23
> > tries to fix?
> >
>
> It prevents the whole system from livelocking due to an oom killed process
> stalling forever waiting for mempool_alloc() to return. No other threads
> may be oom killed while waiting for it to exit.
>
> > Do you have a stacktrace where it deadlocked, or was it just a
> > theoretical consideration?
> >
>
> schedule
> schedule_timeout
> io_schedule_timeout
> mempool_alloc
> __split_and_process_bio
> dm_request
> generic_make_request
> submit_bio
> mpage_readpages
> ext4_readpages
> __do_page_cache_readahead
> ra_submit
> filemap_fault
> handle_mm_fault
> __do_page_fault
> do_page_fault
> page_fault

Device mapper should be able to proceed even if there is no available
memory. If it doesn't proceed, there is a bug in it.

I'd like to ask: which device mapper targets did you use in this case? Were
there any other deadlocked processes? (Please show the sysrq-t and sysrq-w
output from when this happened.)

Did the machine lock up completely with that stacktrace, or was it just
slowed down?

> > Mempool users generally (except for some flawed cases like fs_bio_set) do
> > not require memory to proceed. So if you just loop in mempool_alloc, the
> > processes that exhausted the mempool reserve will eventually return objects
> > to the mempool and you should proceed.
> >
>
> That's obviously not the case if we have hundreds of machines timing out
> after two hours waiting for that fault to succeed. The mempool interface
> cannot require that users return elements to the pool synchronous with all
> allocators so that we can happily loop forever, the only requirement on

Mempool users must return objects to the mempool.

> the interface is that mempool_alloc() must succeed. If the context of the
> thread doing mempool_alloc() allows access to memory reserves, this will
> always be allowed by the page allocator. This is not a mempool problem.
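
To make the contract under discussion concrete, here is a minimal userspace
sketch of a mempool-style reserve. It is only a model, not the kernel code:
the toy_* names, TOY_MIN_NR and the allocator_fails flag are hypothetical,
and where the real mempool_alloc() would sleep and retry, the model simply
returns NULL. It shows why a waiter can only make progress once some holder
returns an element to the pool.

```c
/* Userspace model of a mempool reserve. NOT the kernel implementation. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MIN_NR 2	/* hypothetical reserve size */

struct toy_mempool {
	void *elements[TOY_MIN_NR];	/* preallocated reserve */
	int curr_nr;			/* elements currently in the reserve */
	int allocator_fails;		/* nonzero simulates memory exhaustion */
};

/* Fill the reserve up front; returns 0 on success, -1 on failure. */
int toy_mempool_init(struct toy_mempool *pool, size_t size)
{
	pool->curr_nr = 0;
	pool->allocator_fails = 0;
	while (pool->curr_nr < TOY_MIN_NR) {
		void *e = malloc(size);

		if (!e) {
			while (pool->curr_nr > 0)
				free(pool->elements[--pool->curr_nr]);
			return -1;
		}
		pool->elements[pool->curr_nr++] = e;
	}
	return 0;
}

/*
 * Try the normal allocator first, then fall back to the reserve.
 * Returns NULL at the point where a real mempool would wait for
 * somebody to free an element.
 */
void *toy_mempool_alloc(struct toy_mempool *pool, size_t size)
{
	if (!pool->allocator_fails) {
		void *e = malloc(size);

		if (e)
			return e;
	}
	if (pool->curr_nr > 0)
		return pool->elements[--pool->curr_nr];
	return NULL;	/* reserve exhausted: a waiter would block here */
}

/* Refill the reserve first; only surplus elements go back to the heap. */
void toy_mempool_free(struct toy_mempool *pool, void *e)
{
	assert(pool->curr_nr >= 0 && pool->curr_nr <= TOY_MIN_NR);
	if (!e)
		return;
	if (pool->curr_nr < TOY_MIN_NR) {
		pool->elements[pool->curr_nr++] = e;
		return;
	}
	free(e);
}

/* Release whatever is still held in the reserve. */
void toy_mempool_destroy(struct toy_mempool *pool)
{
	while (pool->curr_nr > 0)
		free(pool->elements[--pool->curr_nr]);
}
```

With allocator_fails set, two allocations drain the reserve and a third
returns NULL; it succeeds again only after one of the first two elements
has been passed to toy_mempool_free().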

Mikulas