Re: How to make warn_alloc() reliable?

From: Tetsuo Handa
Date: Wed Oct 19 2016 - 10:16:44 EST


Michal Hocko wrote:
> This is not about warn_alloc reliability but more about
> too_many_isolated waiting for an unbounded amount of time. And that
> should be fixed. I do not have a good idea how right now.

I'm not talking only about the too_many_isolated() case. If I were talking about
this specific case, I would have proposed leaving this loop after a timeout.
For example, where is the guarantee that the current thread never gets stuck
in shrink_inactive_list() after leaving this too_many_isolated() loop?
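For illustration only, here is a minimal userspace sketch of what "leaving this
loop after a timeout" could look like. The predicate type, wait_bounded(), the
iteration counting and the two demo predicates are all hypothetical. The real
too_many_isolated() has a different signature, and a kernel version would sleep
(e.g. via congestion_wait()) and compare against jiffies rather than count
iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for too_many_isolated(); not the kernel signature. */
typedef bool (*isolated_pred)(void *ctx);

enum wait_result { WAIT_OK, WAIT_TIMED_OUT };

/*
 * Poll @pred until it reports false or @max_ticks polls have elapsed.
 * Each iteration stands in for one short sleep; we only count iterations
 * so that the sketch stays deterministic.
 */
static enum wait_result wait_bounded(isolated_pred pred, void *ctx,
                                     unsigned int max_ticks,
                                     unsigned int *waited)
{
    unsigned int t = 0;

    assert(waited != NULL);
    while (pred(ctx)) {
        if (t >= max_ticks) {
            *waited = t;
            return WAIT_TIMED_OUT; /* give up instead of looping forever */
        }
        t++; /* one simulated sleep */
    }
    *waited = t;
    return WAIT_OK;
}

/* Demo predicate: stays "too many isolated" for *remaining more polls. */
static bool countdown_pred(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return false;
    (*remaining)--;
    return true;
}

/* Demo predicate: never makes progress, like a thread that is stuck. */
static bool always_true_pred(void *ctx)
{
    (void)ctx;
    return true;
}
```

Such a timeout bounds only this one loop; it says nothing about where the
thread might get stuck afterwards, which is exactly the question above.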

I think ordinary Linux users perceive memory management as "Linux reclaims
memory when needed. Thus, it is normal that the MemFree: field of
/proc/meminfo is small." and "Linux invokes the OOM killer if a memory
allocation request can't make forward progress". However, we know that
"Linux may not be able to invoke the OOM killer even if a memory allocation
request can't make forward progress". You suddenly bring up (or admit to)
implications/limitations/problems most Linux users do not know about. That's
painful for me, having gone to a lot of trouble to get some clue at a support
center.

When we were talking off-list about CVE-2016-2847, your response had been
"Your machine is DoSed already" until we noticed the "too small to fail"
memory-allocation rule. If I had not kept examining until I made you angry,
we would not have come to the correct answer. I don't like your optimistic
"Fix it if you can trigger it." approach, which will never give users (and
troubleshooting staff at support centers) any proof. I want an "Expose what
Michal Hocko is not aware of or does not care about" mechanism.

What I'm talking about is "why don't you stop playing whack-a-mole games
with missing warn_alloc() calls?". I don't blame you for not having a good
idea, but I do blame you for not having a reliable warn_alloc() mechanism.
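One possible shape of such a mechanism, sketched under loudly stated
assumptions: instead of relying on each stall site to call warn_alloc(), every
allocation request records when it started, and an independent periodic scan
reports any request that has been pending longer than a threshold, wherever it
is stuck. Everything here (alloc_begin(), alloc_end(), watchdog_scan(), the
fixed-size table and the simulated clock) is hypothetical userspace code, not an
existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 8 /* hypothetical fixed table size */

/* One in-flight allocation request, as a hypothetical watchdog tracks it. */
struct alloc_record {
    bool in_use;
    int pid;             /* who is allocating */
    unsigned long start; /* simulated clock value when the request began */
};

static struct alloc_record table[MAX_TRACKED];

/* Hypothetical hook: request enters the slow path. Returns slot or -1. */
static int alloc_begin(int pid, unsigned long now)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!table[i].in_use) {
            table[i] = (struct alloc_record){ true, pid, now };
            return i;
        }
    }
    return -1; /* table full */
}

/* Hypothetical hook: request completed (success or failure). */
static void alloc_end(int slot)
{
    assert(slot >= 0 && slot < MAX_TRACKED);
    table[slot].in_use = false;
}

/*
 * Periodic scan run independently of the allocating threads: count every
 * request pending for at least @threshold ticks, no matter where it is
 * stuck. A real watchdog would also print the pid and a backtrace.
 */
static int watchdog_scan(unsigned long now, unsigned long threshold)
{
    int stalled = 0;

    for (int i = 0; i < MAX_TRACKED; i++) {
        if (table[i].in_use && now - table[i].start >= threshold)
            stalled++;
    }
    return stalled;
}
```

For example, a request started at tick 0 and still pending at tick 12 is
counted by watchdog_scan(12, 10). The point of this design is that detection
no longer depends on knowing every place where the allocating thread can get
stuck.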