Re: Some questions about linux kernel.

From: James Sutherland (jas88@cam.ac.uk)
Date: Sun Mar 19 2000 - 16:05:09 EST


On Sun, 19 Mar 2000 14:23:00 -0600, you wrote:
>On Sun, 19 Mar 2000, James Sutherland wrote:
>>On Fri, 17 Mar 2000 18:05:17 -0600, you wrote:
>>>On Thu, 16 Mar 2000, James Sutherland wrote:
>>>>On Thu, 16 Mar 2000, Paul Jakma wrote:
>>>>> On Thu, 16 Mar 2000, James Sutherland wrote:
>>>>>
>>>>> > No. *ANY* memory allocation system can run out of memory. Avoiding
>>>>> > "overcommitting" would make the OOM situation arise SOONER (and more
>>>>> > frequently), as well as killing performance.
>>>>> >
>>>>>
>>>>> well, it's a more a question of whether you make promises that you might
>>>>> not be able to keep. If you do (ie overcommit) then it's your
>>>>> (kernel) problem. If you don't, it's not.
>>>>
>>>>Not really; either way, we are being asked for memory we can't currently
>>>>provide.
>>>>
>>>>> Without overcommit the /system/ can run out memory, of course - it's
>>>>> finite - but it's no longer a kernel problem.
>>>>
>>>>It was never really a "kernel problem" as such anyway. It's a problem the
>>>>kernel is trying to fix.
>>>
>>>And where it couldn't detect the beginning of the problem.
>>
>>The beginning of the problem? The problem is that we don't have enough
>>VM - every malloc() call since the system booted contributes.
>
>That is a philosophical answer and is unusable.
>
>The beginning of the problem is the process/process group that is allocating more
>than its allotted quota.

They may well not even *HAVE* a quota!
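
For what it's worth, checking is trivial, and the default really is "unlimited".
A quick sketch - nothing here depends on any particular kernel or patch:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;

	/* RLIMIT_DATA bounds the data segment, i.e. sbrk()/malloc() growth. */
	if (getrlimit(RLIMIT_DATA, &rl) != 0) {
		perror("getrlimit");
		return 1;
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		printf("no data-segment quota at all\n");
	else
		printf("soft limit: %lu bytes\n", (unsigned long)rl.rlim_cur);
	return 0;
}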

>>>>> > Right... Now we'll try this on the university's central Unix system, shall
>>>>> > we? Let's see... 6000 users, 2Gb RAM+swap. They get about 300K each.
>>>>> > That's ALMOST enough to log in with!
>>>>>
>>>>> well then get more ram/swap. But at least it has become a hw issue.
>>>>
>>>>It was all along - you don't have enough RAM+swap for the workload the
>>>>system is under, so processes are dying.
>>>
>>>The problem is detecting when it is running out of memory.
>>
>>It's fairly obvious, IMO - the "free memory" numbers get low. malloc()
>>calls start failing. Processes start dying.
>
>Too late - it has already run out. "when it is running out" is before
>it has already run out.

Memory starts off 100% free, and then goes down. At some point, you
should take some action; that's up to the system administrator.

What we were discussing here is what should be done if we actually
reach 0%, because whatever precautions were in place have failed.
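
Spotting the slide before it hits zero is easy enough from userspace. A rough
sketch of the sort of check an admin's cron job could run - it assumes the
MemFree:/SwapFree: lines in /proc/meminfo, and the 8 MB threshold is arbitrary:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	unsigned long memfree = 0, swapfree = 0;

	if (f == NULL) {
		perror("/proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		sscanf(line, "MemFree: %lu", &memfree);
		sscanf(line, "SwapFree: %lu", &swapfree);
	}
	fclose(f);

	/* values are in kB; warn below an arbitrary 8 MB of free RAM+swap */
	if (memfree + swapfree < 8192)
		fprintf(stderr, "warning: only %lu kB of VM left\n",
			memfree + swapfree);
	return 0;
}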

>>(snip)
>>>That's why per-user and per-process resource limits need to be implemented.
>>
>>Yes - but what's this got to do with overcommit?
>
>1. Controlling overcommit
You misunderstand. Overcommit has nothing to do with quotas.

>2. Identifying the process using too much
>3. preventing that process from killing the system
As done by Rik's patch.

>4. making it possible to determine how much is to be added to the system.
Impossible to calculate without knowledge of the application in use.

>>(snip)
>>>>Yes, I agree. However, we can still run out of RAM+swap on multiuser
>>>>systems quite easily.
>>>
>>>Only if they are mismanaged - allowed to have too many concurrent users,
>>>with too many concurrent processes. I am part of a center that runs
>>>UNIX systems that don't have this problem. The per user resource limits
>>>are taken into account for the number of concurrent interactive users,
>>>and the number of batch queues (with concurrent jobs). The total amount
>>>of required resources is then allocated. There IS no OOM on this system.
>>>Can't happen.
>>
>>No, you have just isolated the users from each other a little better
>>than normal. In doing so, you have placed very severe restrictions on
>>their use of the system; this wouldn't be tolerated here.
>>
>>OK, once in a blue moon, a user's rogue process blows up and grabs
>>half the system's VM. It then gets killed by Rik's patch. No problems
>>at all, other than the system being slowed down a bit.
>
>It may not be a user's rogue process - it may be working exactly as it should,
>loading up a large data set.

It is still "rogue" in that it is doing something it shouldn't. We
still need to kill it.

>Unfortunately - that doesn't always work - it is better than a random kill, but
>not to management. It is a stopgap workaround that doesn't help identify
>who, what, and where the memory is being taken.

That's what "top" etc. are for.

>>>>> > No, it's a risk with *EVERY* OS.
>>>>>
>>>>> no it's not. A non-overcommitting OS doesn't run out of VM for
>>>>> processes. it cleanly grants or denies memory requests. What the app
>>>>> does after that is not a kernel problem.
>>>>
>>>>The problem is, denying memory requests leads to processes dying. This is
>>>>what we want to avoid.
>>>
>>>No it isn't. What you want to avoid is system crashes/reboots.
>>
>>If your system crashes or reboots just because a program has taken all
>>the VM, it is a very poor system indeed. Even Windows NT doesn't do
>>this.
>
>But linux does...

In which case, there is something seriously wrong with Linux.

>>> - If a user
>>>process chooses to exceed its resource limits then that process is aborted. The
>>>system doesn't crash. The process to abort is clearly known.
>>
>>Exactly what happens with Rik's patch anyway - only the limits are set
>>high enough to allow the user to use more memory, *if* this can be
>>done without any problems.
>
>It isn't complete - I need to be able to control the memory allocation. I
>need to prevent the system from crashing.

The system shouldn't crash from OOM situations anyway. If Linux does
that, stick it in the bitbucket and look for a fixed version.

>>>>In fact, last night I more or less killed this machine. I had overcommit
>>>>turned off, and nothing major loaded (X, WWW server, xmms) and fired up
>>>>"make -j" on a biggish source tree. After quite some time, just about
>>>>everything died from lack of memory.
>>>
>>>too bad - user resource limits would have signaled the lack of resources to
>>>the make process (it should have been given the OOM signal when it forked/execed
>>>the compiler/linker). This could then be used by make to determine that
>>>resource limits had been reached, and not to spawn more. Yes, this does
>>>require make to be modified. Otherwise the make would have aborted; BUT
>>>the system would not fail.
>>
>>The system didn't fail - but the user's other processes all died due
>>to a lack of memory. Setting a lower VM quota for myself would just
>>have made the problem worse - all my processes would have died
>>earlier.
>
>But the system would have continued to run. If that user can justify expanding
>the resources, then management can allocate money/time/disk space to expand
>the limited resource.

The system continues to run anyway. The user is only using SPARE
resources, not allocating new ones.

>>>>This was WITHOUT the OOM killer patch. With it, I think the newly spawned
>>>>make and cc processes would have been killed off much sooner, helping save
>>>>the other processes.
>>>>
>>>>
>>>>How would you define "running out" of memory? I would define it as not
>>>>being able to fulfil new requests - which is the situation we are trying
>>>>to handle here. If you start denying malloc() requests, processes start
>>>>dying (largely at random). If, instead, you kill selected processes, then
>>>>obviously processes still die as a result. However, they are more likely
>>>>to be the "right" ones to kill.
>>>
>>>running out of memory: insufficient resources available to a specific
>>>user to continue running processes.
>>
>>>Denying malloc() requests should not "largly at random" cause processes
>>>to die; A very specific process will die - the one making the sbrk system
>>>call. A different user would not see a problem since they should have a
>>>different set of resources. The general system daemons would not see a
>>>problem either - they also would have a different set of reserved resources.
>(I should have expanded this from "sbrk" to "sbrk/fork")
>>
>>Fine provided you have enough system resources - which you don't
>>always. This is where Rik's patch is needed. Once the system has run
>>out of VM, ANY process making a malloc() call will be denied the
>>request, for obvious reasons - and will then die as a result.
>
>If I don't have enough resources, then I expect the system to limit the
>damage to the user asking for too much.

Currently impossible with Linux, since it lacks a concept of "user" at
this level.
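
As an aside on the "will then die as a result" bit above: part of the reason a
denied request means a dead process is simply that almost nothing checks the
return value. A careful program can back off instead - rough sketch only, and
load_chunk() is just an invented name:

#include <stdio.h>
#include <stdlib.h>

/* Sketch: back off instead of dying when an allocation is denied. */
static char *load_chunk(size_t want)
{
	char *buf = NULL;

	while (want > 0 && (buf = malloc(want)) == NULL) {
		fprintf(stderr, "malloc(%lu) denied, trying half as much\n",
			(unsigned long)want);
		want /= 2;
	}
	return want ? buf : NULL;
}

int main(void)
{
	char *p = load_chunk(64 * 1024 * 1024);	/* ask for 64 MB */

	if (p == NULL) {
		fprintf(stderr, "no memory at all - giving up cleanly\n");
		return 1;
	}
	p[0] = 0;	/* touch it; under overcommit pages only become real on use */
	free(p);
	return 0;
}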

> No other users should be affected.
>The user can then request management to allocate/obtain more.

This presupposes "management" has some control over resources; under
Linux, they haven't.

>>OK, you can often restrict the process death to one particular user -
>>but that user would still have random processes dying, instead of the
>>biggest memory hog.
>
>It is not random. A specific user/process is killed. The system doesn't die,
>the system doesn't reboot, and the other users of the system are not affected.

A specific process is killed WITH Rik's patch. Without it, RANDOM
processes die instead.

Without introducing a concept of "users" into the memory management
properly, the damage CANNOT be confined to a specific user - there is
no specific user to blame, just a specific process.
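
The nearest thing available today is to do the accounting after the fact from
userspace: walk /proc and sum VM per UID. Roughly this (a sketch - it assumes
the usual Uid:/VmSize: lines in /proc/<pid>/status; kernel threads just count
as zero):

#include <stdio.h>
#include <ctype.h>
#include <dirent.h>

int main(void)
{
	static unsigned long vm_kb[65536];	/* indexed by UID - crude but simple */
	DIR *proc = opendir("/proc");
	struct dirent *de;
	unsigned long u;

	if (proc == NULL) {
		perror("/proc");
		return 1;
	}
	while ((de = readdir(proc)) != NULL) {
		char path[300], line[128];
		unsigned long uid = 0, vm = 0;
		FILE *f;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;		/* not a process directory */
		sprintf(path, "/proc/%s/status", de->d_name);
		f = fopen(path, "r");
		if (f == NULL)
			continue;		/* the process has already gone */
		while (fgets(line, sizeof(line), f)) {
			sscanf(line, "Uid: %lu", &uid);
			sscanf(line, "VmSize: %lu", &vm);
		}
		fclose(f);
		if (uid < 65536)
			vm_kb[uid] += vm;
	}
	closedir(proc);

	for (u = 0; u < 65536; u++)
		if (vm_kb[u] != 0)
			printf("uid %lu: %lu kB of VM\n", u, vm_kb[u]);
	return 0;
}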

>If the biggest memory hog is authorized the amount it desires, then it
>should get it. That is determined by management, not the OS.

If management sets rlimits up properly, then yes, this is the case.
This level of micromanagement is very rare, though. Besides, by the
time Rik's patch comes into effect, it is too late for management to
do anything anyway.
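
The per-process half of that machinery does already exist, mind you - a login
wrapper can do something like the following before handing over to the shell.
Sketch only; the 4 MB and 8 MB figures are plucked out of the air:

#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

/* Sketch: cap the data segment before exec'ing the user's shell, so a
 * runaway sbrk()/malloc() loop hits its own wall first. */
int main(void)
{
	struct rlimit rl;

	rl.rlim_cur = 4 * 1024 * 1024;	/* soft limit */
	rl.rlim_max = 8 * 1024 * 1024;	/* hard limit - unprivileged code can't raise it again */
	if (setrlimit(RLIMIT_DATA, &rl) != 0) {
		perror("setrlimit");
		return 1;
	}

	/* Everything exec'd from here on inherits the limits. */
	execl("/bin/sh", "sh", (char *)NULL);
	perror("execl");
	return 1;
}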

James.



