Re: Overcommitable memory??

From: James Sutherland (jas88@cam.ac.uk)
Date: Tue Mar 21 2000 - 07:17:12 EST


On Mon, 20 Mar 2000 21:07:26 -0600, you wrote:

>On Mon, 20 Mar 2000, Horst von Brand wrote:
>>Jesse Pollard <pollard@tomcat.admin.navo.hpc.mil> said:
>>
>>[...]
>>
>>> THAT DEPENDS ON POLICY - as defined by management. That "very big and newly
>>> started" may NOT be the most suitable. If resource quotas are in place, then
>>> the process would not have been started at all. If quotas are adjusted
>>> to make resources available then again there is no need to abort any
>>> process.
>>
>>Except those of users that just so happen are going over their allocated
>>quota. Which go to management complaining that not even at night, when the
>>system is otherwise idle, they can run their large jobs.
>
>Only if management has said they won't run their jobs. I work in an environment
>that schedules large job runs. Sorry in this you miss the boat.

If you are running batch jobs, you can do all this without using logon
restrictions etc - just set the job scheduler accordingly.

>>> I'm already working in one area where OOM is considered catastrophic -
>>> High security and high availablity systems must NOT have the system go
>>> OOM. That is what per user resource quotas help to prevent.
>>
>>That is a very narrow, specialized application area. I'd assume you have
>>the resources then to hire somebody to "fix" the kernel for you, and then
>>publisghing the resulting patches for all to consider/include/use.
>
>I said one area - that area covers:
> 1. E-commerce servers
No. These are running server software anyway, not a load of
interactive user tasks. Since you only HAVE one (or maybe two)
"users", and every job is mission-critical, your quotas cause problems
here, without helping.

If the system IS properly sized, and the software does not
malfunction, you never go OOM anyway. If either of these does not
hold, your system is hosed anyway. A "live" WWW server which has
killed the httpd is no use to anyone.

> 2. banking and accounting
No. If your banking software malfunctions, you have bigger problems
than what recovery measures the kernel can or cannot implement.

> 3. medical data bases
No. See (1).

> 4. Stock market servers
No. See (1)

> 5. large scientific/enginerring simulation
No. Single user environment; if you are OOM, the one and only task has
failed already - it's too late for the OS to help you now.

> 6. corporate disk servers
No. See (1).

>This actually covers the majority of the use of computers. I don't care
>if they are also served by failure prone systems now. I don't want them
>to fail.
>
>THEY SHOULD NOT FAIL.

And they don't, at least from OOM conditions.

James.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:32 EST