Re: Some questions about linux kernel.

From: Jeff McNeil (jeffry@interland.net)
Date: Tue Mar 21 2000 - 09:09:56 EST


I usually don't post much, but this OOM/Quota talk has caught my attention.
Something direly needed around here. Were using some large Linux servers,
6+ CPU, and this is a big problem.

----- Original Message -----
From: James Sutherland <jas88@cam.ac.uk>
To: <pollard@cats-chateau.net>
Cc: David Whysong <dwhysong@physics.ucsb.edu>; Richard B. Johnson
<root@chaos.analogic.com>; Marco Colombo <marco@esi.it>; linux kernel
<linux-kernel@vger.rutgers.edu>
Sent: Tuesday, March 21, 2000 6:55 AM
Subject: Re: Some questions about linux kernel.

> On Mon, 20 Mar 2000 22:07:48 -0600, you wrote:
>
> >On Mon, 20 Mar 2000, David Whysong wrote:
> >>On Mon, 20 Mar 2000, Richard B. Johnson wrote:
> >>
> >>>The only solution to an out-of-memory condition is to never run
> >>>out of memory.
> >>
> >>Yes!
> >>
> >>>The place where all of the system information is known is in "user
> >>>space". The kernel readily "knows" stuff about the current process, but
> >>>retrieving information about other tasks in a page-fault handler would
> >>>result in an extremely poor performing machine. A user-space daemon can
> >>>acquire information about all the tasks, can detect runaway tasks, can
> >>>safeguard special tasks like Web Servers that haven't gone crazy, and
> >>>can watch for performance hurting rogue programs.
> >>>
> >>>Such a program, if properly designed, is the solution to such
> >>>out-of-memory conditions.
> >>
> >>No! Or perhaps it depends on what you want this user-space daemon to do.
> >>
> >>Once you reach the OOM condition, this program can not reliably run. And
I
> >>doubt that a user-space daemon could prevent OOM from happening on a
time
> >>sharing system, since a malicious (or buggy) program could try to use
all
> >>memory during a single timeslice.
> >
> >right.
> >
> >>When OOM the kernel has to kill something. The trick is for the kernel
to
> >>kill tasks that are considered "less important". Don't kill init, X
> >>server, syslogd, whatever. Some of this policy could be set by a
userspace
> >>daemon.
> >
> >By the time the userspace daemon runs it is too late.
>
> Not necessarily; it could certainly catch and handle SOME problems.
> (Like an exploding Netscape; they tend to grow quite slowly.)

I thought about doing this a while back, but never found time to get past
testing. My idea was as follows:

1. Create system call, sys_rquotactl. Call will 1) Set/remove quotas,
2)report current quota when called with a predefined UID as an argument. By
quotas I mean a mimimum garenteed amount, and a maximum threshold.

2. Write user space program to set and report quotas at boot time, based on
configuration file.

When resources are requested, the quota is checked, if the user will go
over, OOM error is returned and the allocation request fails, if the system
is above a predefined load, or if a higher "priority" user needs the memory.

I had actually made most of the changes to do this about 3 months ago, if I
can find the source, I'll send it along.

> >The kernel must do the bookkeeping for the resource allocation. It must
> >be able to decrement from the users quota what is being allocated by the
> >various allocation methods - fork, exec, sbrk, stack page fault, COW.
>
> Yes, it needs per-user resource quotas. We've been over this many
> times before.
>
> >The kernel can enforce the limit, but it cannot determine what the limit
> >should be. This is the place for the userspace daemon.
>
> If you set the quotas, then yes.
>
> >On some systems, the quotas are staticly set by administration, and
loaded
> >into the kernel on login (or cron/batch). If the quotas to be set are
> >not available (determined by the userspace daemon) then the user should
> >not be logged in (or cron/batch job not started).
>
> Which amounts to system downtime, for that user at least...
>

Better yet, use a default quota. All users fit a certain group, unless
explicitly changed.

 In my opinion, quotas shouldn't be truly enforced until the system reaches
a certain state. Before OOM hits, but after a preconfigured acceptable
threshold. Very much a long the lines of HP's Process Resource Manager for
HPUX. This way, if a user with a low quota logs in at 2am, and no one else
is using the system, they have full access to the system's resources. Then,
if a more privelaged user logs in and needs the space that the
"not-so-important" user is using, he is gets what he's garenteed via quotas.

> >It may be possible for the userspace daemon to be able to carry out
> >adjustments - If the policy is that from 8-5 all logins/cron/batch may
> >have the quota set by administration until N logins are active. No more
> >may be allowed. If the policy between 5-8 (over night) is that only
> >5 logins are allowed, then the quota set by administration might be
> >expanded by some amount - this is a basis for batch schecdules at
> >night where the batch queues are allowed a totaly independant quota
> >limits from the users interactive limits.
> >
> >>This is completely independent of overcommit or rlimits (which can cause
> >>the OOM behavior to happen more often, though in more predictable and
> >>perhaps less drastic fashion).
> >
> >It is a way to prevent OOM from occuring at the system level.
>
> Not a practical one for most systems, though.
>
> >>I'm a scientific app programmer, not a system programmer... but I'll try
> >>to code up a simple daemon in the next couple days that works in
> >>conjunction with a modified version of Rik van Riel's patch.
> >
> >Watch out for the kernel level accounting that is needed.
>
> It isn't going to happen any time soon.
>
>
>
> James.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:33 EST