Re: Overcommitable memory??

From: Jesse Pollard (pollard@tomcat.admin.navo.hpc.mil)
Date: Mon Mar 13 2000 - 09:32:30 EST


David Whysong <dwhysong@physics.ucsb.edu>:
> On 13 Mar 2000, Rask Ingemann Lambertsen wrote:
> >
> >Apps would be told that the system is out of memory instead of just
> >getting a SIGKILL'ed out of the blue sky. Apps getting NULL from
> >malloc() can react appropriately, such as saving your files to disk,
> >trying again a little later or just exiting if that is acceptable for
> >what the app was doing. Apps getting SIGKILL will take your unsaved work
> >with them in the fall.
>
> Ok, so my big gravitational simulation gets NULL from malloc and decides
> to save its work and exit. Uh-oh, time to demand-load a page of
> executable code that had been discarded, so we can save the data. Hmm, but
> we're out of memory...

That's why the subject of resource control keeps resurfacing. It takes
resource accounting to identify which process is violating the usage
policy. Without the accounting data, the choice of what to kill is
nearly random.
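
For concreteness, the "react appropriately" path Rask describes looks
something like this in application code (a minimal sketch; save_checkpoint()
is a hypothetical helper, and as David notes, even the save path can fail if
it has to fault discarded pages back in):

#include <stdlib.h>
#include <unistd.h>

extern int save_checkpoint(void);    /* hypothetical: dump state to disk */

static void *xmalloc(size_t n)
{
        void *p;
        int tries;

        for (tries = 0; tries < 3; tries++) {
                p = malloc(n);
                if (p != NULL)
                        return p;
                sleep(5);             /* "try again a little later" */
        }
        save_checkpoint();            /* last chance to save unsaved work */
        exit(1);
}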

> Even if that succeeds, or there is a foolish "no overcommit" policy, we
> need disk buffers. What if the program was told to save output to a SCSI
> device, and the kernel needs to load the driver module? We're out of
> memory! Even if we all build non-modular kernels, the kernel does some
> dynamic memory allocation.

Ummm, it was my impression that the kernel does not discard drivers as long
as there is an open file using the file system.

In either case, if I were depending on it, I would load the driver
explicitly at boot time and leave it there permanently.

As far as "no overcommit" being foolish - the current behaviour (a SIGKILL
out of the blue) is what you get whenever memory is overcommitted.
Personally, I think the ability to overcommit should be optional: if I don't
want to overcommit, I disable it. Does that mean I need a bigger swap
space? Sure, and I accept that. Disks are cheap; an aborted 3-month process
can be very expensive.
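
Until such an option exists, one workaround on the application side (just a
sketch, nothing the kernel gives you) is to touch every page of a large
allocation up front, so that an overcommitted allocation blows up in the
first minute instead of in month three. It does nothing about other
processes eating memory later, but it moves the risk to where it is cheap:

#include <stdlib.h>
#include <unistd.h>

/* Allocate n bytes and touch every page now, so an overcommitted
 * allocation fails (or gets this process killed) at startup rather
 * than weeks into the run. */
static void *alloc_committed(size_t n)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        char *p = malloc(n);
        size_t i;

        if (p == NULL || n == 0)
                return p;
        for (i = 0; i < n; i += pagesize)
                p[i] = 0;            /* one byte per page is enough */
        p[n - 1] = 0;
        return p;
}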

> As for "trying again a little later", that leaves you with an unresponsive
> and unusable system in many cases.
>
> And please explain why my simulation -- that may have started many weeks
> (or months) ago -- should "just exit" because some random 5-minute old
> Mathematica process went and allocated half a gigabyte of memory?

On many systems that I have used, I had the ability to put quota limits
on the amount of virtual memory a user could use. If the user exceeded
that amount, then whichever of the user's processes asked for too much
got aborted with the equivalent of a "virtual memory quota exceeded" signal.
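
Linux doesn't give me per-user virtual memory quotas like that, but
per-process limits via setrlimit() get part of the way there: once the cap
is hit, brk()/mmap() fail and malloc() returns NULL instead of the machine
picking a victim. A sketch (RLIMIT_AS; the 256 MB figure is just an example):

#include <sys/resource.h>

/* Cap this process (and anything it forks) at 256 MB of address space;
 * past that, allocations fail with ENOMEM and malloc() returns NULL. */
static int limit_vm(void)
{
        struct rlimit rl;

        rl.rlim_cur = 256UL * 1024 * 1024;
        rl.rlim_max = rl.rlim_cur;
        return setrlimit(RLIMIT_AS, &rl);
}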

Those quotas allowed me to allocate the swap space based on:

        u = maximum number of concurrent users
        q = quota space allocated to each user
        r = quota space for root

        swap space = u * q + r

There are other formulae - if q differs per user, u * q becomes a summation
of the individual quotas (or, to keep the simple formula, q becomes the
maximum of all user quotas). Is this too much space? Sure, but there won't
be any random aborts either.
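
As a worked example (numbers pulled out of the air): u = 50 users, q = 512 MB
each and r = 1 GB for root gives swap space = 50 * 512 MB + 1 GB = 26 GB.
A lot of disk in absolute terms, but as above, disks are cheap. The same
rule, including the per-user summation variant, in code:

/* swap = sum of per-user quotas + root's reserve; when every user gets
 * the same quota q this reduces to u * q + r. */
static unsigned long swap_needed(const unsigned long *quota, int nusers,
                                 unsigned long root_quota)
{
        unsigned long total = root_quota;
        int i;

        for (i = 0; i < nusers; i++)
                total += quota[i];
        return total;
}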
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.
