Re: Some questions about linux kernel.

From: Jesse Pollard (pollard@tomcat.admin.navo.hpc.mil)
Date: Fri Mar 17 2000 - 13:43:26 EST

Next message: Jeff Dike: "user-mode port 0.16-2.3.99-pre1"
Previous message: Adam J. Richter: "2.3.99pre2-3: ide_xlate_1024 not linked in"
Next in thread: James Sutherland: "Re: Some questions about linux kernel."
Reply: James Sutherland: "Re: Some questions about linux kernel."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

James Sutherland <jas88@cam.ac.uk>:
> On Fri, 17 Mar 2000, Jesse Pollard wrote:
> > Paul Jakma <paul.jakma@compaq.com>
> > > On Thu, 16 Mar 2000, James Sutherland wrote:
> > >
> > > > No. *ANY* memory allocation system can run out of memory. Avoiding
> > > > "overcommitting" would make the OOM situation arise SOONER (and more
> > > > frequently), as well as killing performance.
> > > >
> > >
> > > well, it's a more a question of whether you make promises that you might
> > > not be able to keep. If you do (ie overcommit) then it's your
> > > (kernel) problem. If you don't, it's not.
> > >
> > > Without overcommit the /system/ can run out memory, of course - it's
> > > finite - but it's no longer a kernel problem.
> > >
> > > > Right... Now we'll try this on the university's central Unix system, shall
> > > > we? Let's see... 6000 users, 2Gb RAM+swap. They get about 300K each.
> > > > That's ALMOST enough to log in with!
> > >
> > > well then get more ram/swap. But at least it has become a hw issue.
> >
> > It's still not right - the assumption that you have 6000 concurrent users
> > on a 2GB based system is unreasonable. It is much more likely that there
> > is only 40-50 conccurent users. If the limit is 40 then it becomes 2GB/40
> > or ~50MB per user not 300K. After that what you have is a management decision
> > on system expansion, AND the data to back that decision. I have worked in
> > such an environment, and that's the way it is.
>
> I never said 6000 CONCURRENT users. BUT the system is perfectly happy
> without any limits on the number of users. We could have 200 users using
> an AVERAGE of 5Mb each quite happily - if they are all running the same
> couple of programs (a typical workload) overcommit makes the difference
> between needing 2Gb of swap and 20Gb, without *ANY* difference in
> functionality.

IF the limits established for 200 users is 5 MB then the system will NOT
OOM. The additional load would not be permitted to start. No current processes
would be aborted.

> The enormous overhead disabling overcommit would incur, without ANY
> benefits, are simply not practical.

Almost no overhead other than administration. The only overhead is subtracting
from the available size of the users allocation. Once that limit is reached,
the USER gets a resonable signal, and only the USERS process would be
terminated. This has worked quite well in the past, and work quite well on
current systems with user resource limits. The only time OOM is valid is IF
management directs that oversubscription of memory/swap is going to be
tolerated; when the crash occurs then the responsible party is clearly
identified. The Kernel/OS is not at fault.

> > > > The alternative just isn't practical in any real-world situation, except
> > > > more-or-less single user boxes.
> > >
> > > the alternative is very practical. boxes where oracle runs as one user /
> > > apache as another for example. You might want to lock down something
> > > that you know can cause problems.. etc..
> > >
> > > > Even then, that single user will still run
> > > > out of memory - it's just his quota, rather than a physical limit.
> > > >
> > >
> > > obviously.
> > >
> > > > So his processes STILL end up dying randomly, but they do it sooner rather
> > > > than later. Hrm. Wow.
> > > >
> > >
> > > but then it's the users problem - not the OS..
> > >
> > > > On any realistically specced box, it is almost zero already.
> > > >
> > >
> > > no it's not. a plain linux box today has a terrible tendency to die if
> > > an app misbehaves wrt to memory. It's that bad. (unless you think 1GB of
> > > RAM + 4GB's+ of swap is a reasonable spec for a desktop)
> > >
> > > per process limits go a long long way to solving this (easy with
> > > pam_limits) but not far enough. But per user limits would be a huge
> > > improvement...
> > >
> > > > No, it's a risk with *EVERY* OS.
> > >
> > > no it's not. A non overcommiting OS doesn't run out of VM for
> > > processes. it cleanly grants or denies memory requests. What the app
> > > does after that is not a kernel problem.
> >
> > I agree that is is not a problem with the os.
> > It can also prevent additional logins if the resources for the new login
> > are not available to the specific user attempting to log in. This is done
> > in a LOT of UNIX systems, as long as they have functioning, per user,
> > resource controls.
>
> Oh dear. The problem here is when the system runs out of VM and has to
> start denying malloc() requests. *NOT* that it has already allocated more
> VM than it has - it has allocated the maximum amount it is able to, and
> now needs to claw some back for legitimate processes.

The system wouldn't run out of VM. Only users would run out of their allocation
of VM. The system would not have a problem with this.

> Overcommit is, if anything, a partial SOLUTION here - not part of the
> problem.

Overcommit is the symptom of the problem, not a solution. OOM crashes and
random program aborts are the problem. User resource controls are the best
known solution.

I have seen both happen on any system from an IBM 360 to Cray T90. If swap is
oversubscribed, then it can crash (or deadlock). The systems that best
handled resource allocation have been: DEC VAX/VMS, Cray UNICOS. The worst
has been SGI IRIX and Linux (nether even tell you if you are OOM; other
than by random crash...).

I want controlability, and repeatabiltiy. If the system runs out of memory
I want a message stating that occurance. Even more, I want resource controls
that will allow me to be able to eliminate it (most draconian) or permit
it within certain limits; and to know when it happens.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Jeff Dike: "user-mode port 0.16-2.3.99-pre1"
Previous message: Adam J. Richter: "2.3.99pre2-3: ide_xlate_1024 not linked in"
Next in thread: James Sutherland: "Re: Some questions about linux kernel."
Reply: James Sutherland: "Re: Some questions about linux kernel."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:21 EST