Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks

From: David Rientjes
Date: Wed Jan 20 2016 - 19:02:04 EST


On Wed, 20 Jan 2016, Michal Hocko wrote:

> No, I do not have a specific load in mind. But let's be realistic. There
> will _always_ be corner cases where the VM cannot react properly or in a
> timely fashion.
>

Then let's identify it and fix it, like we do with any other bug? I'm 99%
certain you are not advocating that human intervention is the ideal
solution to prevent lengthy stalls or livelocks.

I can't speak for all possible configurations and workloads; the only
thing we use sysrq+f for is automated testing of the oom killer itself.
It would help to know of any situations in which people actually need to
use this to solve issues, and then fix those issues, rather than insisting
that this is the ideal solution.

> To be honest I really fail to understand your line of argumentation
> here. Just that you think that sysrq+f might not be helpful in large
> datacenters which you seem to care about, doesn't mean that it is not
> helpful in other setups.
>

This type of message isn't really contributing anything. You don't have a
specific load in mind, you can't identify a pending bug that people have
complained about, you presumably can't show a testcase that demonstrates
how it's required, yet you're arguing that we should keep a debugging tool
around because you think somebody somewhere sometime might use it.

[ I would imagine that users would already be unhappy at having to kill
processes themselves, and would have reported how ridiculous it is that
they had to use sysrq+f, but I haven't seen those bug reports. ]

I want the VM to be responsive, I don't want it to thrash forever, and I
don't want it to depend on root triggering a sysrq to make the kernel kill
a process before the VM can work properly. We need to either fix the issue
that causes the unresponsiveness or oom kill processes earlier. This is
very simple.