Re: Avoiding OOM on overcommit...?

From: James Sutherland (jas88@cam.ac.uk)
Date: Mon Mar 20 2000 - 16:10:07 EST


On Mon, 20 Mar 2000 10:21:04 -0600 (CST), you wrote:
>James Sutherland <jas88@cam.ac.uk>
>> On Mon, 20 Mar 2000 05:16:17 -0600, you wrote:
>> >On Sun, 19 Mar 2000, Paul Jakma wrote:
>> >>On Sun, 19 Mar 2000, Jesse Pollard wrote:
>> >>
>> >> It all depends on the application. I have seen embedded systems that
>> >> could cause a loss of life on failure. I have worked on some that
>> >> almost cost the job of the manager who sent it out.
>> >>
>> >>obviously you're talking about linux, or any other general purpose
>> >>timesharing OS then?
>> >
>> >No I'm not. Linux doesn't have the reliability yet. Cray UNICOS has
>> >been used to control nukes though (or so I've been told).
>>
>> Linux isn't intended to be used in this way. I doubt they installed
>> Unicos in the nukes either, since Crays tend to be pretty heavy.
>
>It could be if it weren't so "buggy".

It's not really a suitable system. I wouldn't use any general-purpose
OS for this sort of application.

>> I wouldn't want a general-purpose desktop/server OS controlling nukes
>> or life support systems - so don't ask for those features in my
>> desktop/server OS.
>
>And yet - the major competitor is doing (or trying to do) that - controlling
>missiles on a cruiser, controlling the cruiser...

They never were very bright... Must go, another BSOD on my mouse. :)

>Are you now saying Linux is less reliable than NT?

No. I never suggested that either was a suitable system for this task.
I have seen NT used as a flash GPS front-end on a cable laying vessel,
but it wasn't controlling the ship - it had crew for that.

>> RTLinux might want these sorts of features, perhaps, but the
>> desktop/server variant doesn't need it and wouldn't benefit from it.
>>
>> >>Overcommit is good, everyone does it, it's not going away. let's stop
>> >>talking about it...
>> >
>> >NO not everyone. It is done when management decides that it is too
>> >expensive to avoid it, and when it is determined not to be life
>> >threatening.
>>
>> i.e. any normal desktop/server situation. If they want embedded-type
>> features, they shouldn't be using Linux to begin with.
>
>Normal DESKTOP. Not a normal server.

Normal servers benefit significantly from overcommit; they rarely run
out of memory, because the applications running are well-defined and
tested ones. Apart from anything else, if the server software
malfunctions causing an OOM condition, the server is effectively down
anyway, BEFORE you do any handling of this.

If you are talking about a multi-user system (i.e. shell accounts
etc., rather than WWW/print/file servers) you are less well protected
on Linux than you could/should be - but overcommit still HELPS here.

>> >Look how upset the FAA gets when the air traffic system dies. That was/is
>> >based on an old production system (more batch than interactive).
>> >Now nobody is saying they are planning to use Linux for that, but it is
>> >a counterexample.
>>
>> The criteria you seem to be trying to apply to Linux just aren't
>> relevant. You seem to be criticising Linux for lacking features which
>> you would want in a safety-critical system - but since Linux isn't
>> intended to be used in this way anyway, why does this matter?
>
>It could be. I want it able to:
> 1. perform in a server environment with 50-100 concurrent users.
Linux is OK here; overcommit helps.
> 2. perform in a high availability environment with 50-100 users (disk servers)
Linux is also OK here; overcommit makes little difference.
> 3. perform in a high security environment (banks, hospitals, and E-commerce)
Linux isn't ideal here, but beats the ... out of NT. Overcommit isn't
relevant.
> 4. perform in high reliability environments (printers, scanners, specialized
> controllers)
Linux is barely suitable here without major modifications. Since, as
with Netware, you are running well-defined, controlled software,
overcommit isn't a problem here.

>I have 4 linux server/workstations that control a ~$500M (US) computer center.
>I would like to continue to do so because it is simpler to manage (even without
>resource controls). I do get periodic questions about why it "just rebooted
>itself".

That certainly shouldn't happen - and if it does, it's a bug in the
kernel, NOT due to the presence of overcommit or anything else.

>It is just as usable now as NT. Its basic design is more (IMHO) reliable.
>It lacks resource accounting, and high security support.

As does NT...

>It is already getting high level security development. It has been given
>some resource accounting (IP level, and disk quotas). It is now time to
>add memory and cpu accounting with resource control.

So write it. I accept that Linux should have these features - I never
disputed that. It's just a matter of either waiting for someone else
to do it, or doing it yourself.

>> I don't want Linux being installed on any life-support systems,
>> air-traffic control backends, or missile guidance systems. Equally, I
>> don't want to run that sort of OS on my desktop or server PC.
>
>I think it could be a better choice than the current alternative...

If you are faced with a choice between Linux and NT for a missile
guidance system, defect - it's safer to be at the target end of that
system. OTOH, perhaps a defective missile guidance system would
achieve a lower FFI rate...

James.
