Re: [PATCH 3/4] oom: oom-killer don't use permillage of system-ram internally

From: KOSAKI Motohiro
Date: Fri May 13 2011 - 06:28:59 EST


(2011/05/11 8:40), David Rientjes wrote:
On Tue, 10 May 2011, KOSAKI Motohiro wrote:

CAI Qian reported that his kernel hung when he ran a fork-intensive
workload and then invoked the oom-killer.

The problem is that the current oom calculation uses a value normalized
to 0-1000 (the unit is a permillage of system RAM). Its low precision
produces many identical oom scores. IOW, in his case, all processes have
an oom score <1 and the internal integer calculation rounds it to 1. Thus
the oom-killer kills an ineligible process. This regression was caused by
commit a63d83f427 (oom: badness heuristic rewrite).

The solution is for the internal calculation to use the number of pages
instead of a permillage of system RAM, and to convert it to a permillage
value only at display time.
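
The rounding problem and the proposed fix can be illustrated with a small
userspace sketch. This is not the kernel's oom_badness(); the helper names
and the sample numbers are assumptions made up for illustration. It compares
a score normalized to 0-1000 with a score kept in pages and converted only
for display.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: score a task as a permillage
 * (0-1000) of total RAM. A result of 0 is bumped to 1, which is the
 * rounding behaviour described above.
 */
static uint64_t score_permille(uint64_t task_pages, uint64_t total_pages)
{
	uint64_t points = task_pages * 1000 / total_pages;

	return points ? points : 1;
}

/* Hypothetical helper: keep the internal score in pages. */
static uint64_t score_pages(uint64_t task_pages)
{
	return task_pages ? task_pages : 1;
}

/* Convert to a permillage only when the value is displayed. */
static uint64_t display_permille(uint64_t points_pages, uint64_t total_pages)
{
	return points_pages * 1000 / total_pages;
}
```

Assuming total_pages = 4194304 (16 GiB of 4 KiB pages), tasks of 100 and
3000 pages both get score_permille() == 1, while score_pages() still ranks
the 3000-page task above the 100-page one.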

This patch doesn't change any ABI (including /proc/<pid>/oom_score_adj),
even though the current logic has a lot of things I dislike.


s/permillage/proportion/

This is unacceptable, it does not allow users to tune oom_score_adj
appropriately based on the scores exported by /proc/pid/oom_score to
discount an amount of RAM from a thread's memory usage in systemwide,
memory controller, cpuset, or mempolicy contexts. This is only possible
because the oom score is normalized.

You misunderstand the code. The patch doesn't change oom_score.
The patch changes fs/proc too.


What would be acceptable would be to increase the granularity of the score
to 10000 or 100000 to differentiate between threads using 0.01% or 0.001%
of RAM from each other, respectively. The range of oom_score_adj would
remain the same, however, and be multiplied by 10 or 100, respectively,
when factored into the badness score baseline. I don't believe userspace
cares to differentiate between more than 0.1% of available memory.
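
One possible reading of this proposal, as a rough sketch: the score is
normalized to a configurable granularity (1000, 10000 or 100000) and
oom_score_adj, still in its -1000..1000 range, is multiplied by
granularity/1000 before being added. The function name and signature are
hypothetical, and clamping of the final value is left out.

```c
#include <stdint.h>

/*
 * Hypothetical sketch, not kernel code: badness normalized to
 * `granularity` units of total RAM, with oom_score_adj scaled up by
 * granularity / 1000 so its user-visible range stays the same.
 */
static int64_t scaled_badness(uint64_t task_pages, uint64_t total_pages,
			      int adj, int64_t granularity)
{
	int64_t points = (int64_t)(task_pages * (uint64_t)granularity /
				   total_pages);

	points += (int64_t)adj * (granularity / 1000);
	return points;
}
```

Assuming 4194304 total pages, a 3000-page task with adj 0 scores 0 at a
granularity of 1000, 7 at 10000 and 71 at 100000. A 100-page task still
scores 0 at 10000, so a finer granularity moves the threshold but does not
remove it.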

Currently, SGI buys 16TB of memory. 16TB x 0.1% = 1.6GB. I don't think your
fork bomb processes use more than 1.6GB. Thus your patch is unacceptable.

So, please read the code again, or run it.
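
The size of one score step on such a machine can be checked with a short
sketch. The helper name is hypothetical, and 16TB is taken as 16 x 10^12
bytes (binary units give results about 10% larger). Note that 0.1% of 16TB
works out to 16GB; 1.6GB corresponds to 0.01%, i.e. a granularity of 10000.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes of RAM represented by one score unit when
 * the score is normalized to `granularity` units of total RAM.
 */
static uint64_t bytes_per_step(uint64_t total_bytes, uint64_t granularity)
{
	return total_bytes / granularity;
}
```

For 16TB this gives 16GB per step at a granularity of 1000, 1.6GB at 10000
and 160MB at 100000. Any two tasks smaller than one step cannot be told
apart by score alone.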

The other issue that this patch addresses is the bonus given to root
processes. I agree that if a root process is using 4% of RAM that it
should not be equal to all other threads using 1%. I do believe that a
root process using 60% of RAM should be equal priority to a thread using
57%, however. Perhaps a compromise would be to give root processes a
bonus of 1% for every 30% of RAM they consume?
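
As a rough sketch of the two policies being compared, in permillage units:
the flat 3% bonus is inferred from the 4%/1% and 60%/57% examples above,
and the stepped variant is only one possible reading of "1% for every 30%".
Function names and the clamp at zero are assumptions.

```c
/* Hypothetical: flat bonus of 3% (30 permille) for root processes. */
static int root_score_flat(int usage_permille)
{
	int score = usage_permille - 30;

	return score > 0 ? score : 0;	/* clamp at 0 (assumption) */
}

/*
 * Hypothetical: bonus of 1% (10 permille) for every full 30%
 * (300 permille) of RAM the root process consumes.
 */
static int root_score_stepped(int usage_permille)
{
	int bonus = (usage_permille / 300) * 10;

	return usage_permille - bonus;
}
```

Under these assumptions a root process using 4% of RAM scores 10 with the
flat bonus but keeps its full 40 with the stepped one, while a root process
using 60% scores 570 and 580 respectively.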

I think you are talking about patch [4/4], right? Patches [3/4] and [4/4]
address separate issues: the big-machine issue and the root-user issue.

