[discussion] Swap overcommitment recovery

david (david@kalifornia.com)
Sun, 16 Aug 1998 23:55:01 -0700


At the console or away from it, userland has lower priority than kernel
land. Take the most common recent trigger: netscape. By the time you
notice what is happening, your mouse is no longer moving except once
every few minutes. The onset is nearly instant, and in most cases the
only way out is a reboot.

- user runs xkill and attempts to kill netscape (a wild guess that
netscape is responsible), finally gives up 10 minutes later because
xkill still hasn't run, tries ctrl-alt-del which won't run either for
XX minutes: reboot

- the magic key to kill X leaves the video mangled, normally with no means
of recovering it: reboot

- the magic key response is badly lagged and grows worse; the user becomes
quite annoyed and simply hits the big red switch: reboot

- a userland software watchdog misses its mark due to the ever-growing load: reboot

Reboots simply are not acceptable, especially if you have unsaved work on
your desktops. In our current implementation, swapping is handled quite
nicely until we reach the point where our overcommitment comes into play.
In a split second, our system becomes exceptionally unresponsive to
control. Some of you are going to say "just get SCSI" and ignore it.
That is not an acceptable solution; it only makes the system a pinch more
responsive.

Here are some of the ideas that have been tossed into the fray:

- userland daemon
- kernel thread
- let the user deal with it

Well...let's go over these. A userland daemon sits under the same
restrictions as all the other processes. If its pages aren't available,
it is going to wait until it has them, and once they are paged in, it
still has to decide what action to take. It also suffers by sharing
timeslices. The benefit is that it is easily configurable via
/etc/somefile.conf and keeps all of this outside the kernel's code, so
none of us have to worry about it. It is not a good solution, however,
because its intent is to police the scheduling and kill bad processes,
and it is hindered by the very processes it is policing.

Think Mr. Policeman with a foam baton giving out parking tickets in his
little orange blinking light scooter.
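
For what it's worth, a daemon can at least try to take itself out of the
line of fire. A minimal sketch, assuming a root daemon that pins its
pages and runs at realtime priority; mlockall() and sched_setscheduler()
are real calls, the monitoring logic is left out, and this only softens
the hindrance described above rather than removing it:

  #include <sched.h>
  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
      struct sched_param sp;

      /* Pin every current and future page so the daemon never has
       * to wait on the very swap device it is trying to police. */
      if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
          perror("mlockall");

      /* Realtime priority so it still gets timeslices under load.
       * Both calls require root. */
      sp.sched_priority = 1;
      if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
          perror("sched_setscheduler");

      /* ... watch /proc/meminfo, decide, kill -- left out ... */
      return 0;
  }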

Now let's discuss implementing a means to accomplish this in the kernel.
The kernel is top doggie here and gets his bone however he wants it. The
kernel can kill at whim, put to sleep at whim. The disadvantage of
implementing this in the kernel is the configurability issue. This
solution is preferable for most people who simply want to "just kill
something." But what is "something"?
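
The crudest answer is sketched below. This is not real kernel source;
for_each_task() and the task fields are written from memory against a
2.1-ish tree, so treat it as pseudocode:

  /* Sketch only: pick "something" as the task with the largest
   * resident set, i.e. kill the biggest ram sucker. */
  static struct task_struct *pick_victim(void)
  {
      struct task_struct *p, *worst = NULL;
      unsigned long max_rss = 0;

      for_each_task(p) {
          if (!p->pid || !p->mm)      /* skip idle and kernel threads */
              continue;
          if (p->mm->rss > max_rss) {
              max_rss = p->mm->rss;
              worst = p;
          }
      }
      return worst;                   /* caller sends it SIGKILL */
  }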

Ok. I think letting the user deal with it is fairly well covered up top.
Let's toss out some ideas that have and haven't been discussed.

- kill the biggest ram sucker
- kill the program requesting a page right now
- start killing user programs until the condition clears, leading to:
- start killing root programs ... leading to:
- init is the only thing left; something must be really wrong, so now is
  the time to attempt to disable writes, sync, mount RO, and call our
  internal reboot methods.
- intelligently kill the process(es) (a scoring sketch follows this list):
  - requesting large memory regions
  - requesting lots of new pages in quick succession
  - accomplishing many forks/execs in quick succession
- put all processes to sleep and notify a userland-identified process of
  the situation, and let that process decide what to do based on what the
  user has in a configuration file. Note, this process -must- have all
  its pages previously allocated or it too is simply part of the problem.
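
To make the "intelligently kill" idea concrete, the max-RSS test in the
earlier sketch could be replaced by a weighted score. Again a sketch only,
with field names from memory; the lifetime fault counters stand in for
"requesting lots of new pages," since per-interval counts would need
extra bookkeeping:

  /* Sketch only: score a task by how hard it leans on the VM. */
  static unsigned long badness(struct task_struct *p)
  {
      unsigned long score;

      if (!p->pid || !p->mm)
          return 0;                      /* never pick idle/kernel threads */

      score = p->mm->rss;                /* big ram suckers score high */
      score += p->maj_flt + p->min_flt;  /* so do heavy pagers */
      if (p->uid == 0)
          score >>= 2;                   /* kill user programs before root ones */
      return score;
  }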

I feel that last entry in the list (put everything to sleep and notify a
registered userland process) adheres to the best interests of all. Such a
program will need to register itself with the kernel as the process to
hand control over to when this overcommitment condition arises.

Such a program could also be made a module, which would make the kernel's
interface to it much easier.
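
No such registration interface exists today, so everything below is
hypothetical: the /proc path, the SIGLOWMEM name, and the hand-off are
all made up for illustration. The one firm requirement from the list
above is the mlockall():

  /* Hypothetical interface -- nothing like this exists yet.  The
   * handler gives the kernel its pid, locks itself in core, and
   * waits for a made-up low-memory signal. */
  #include <fcntl.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #define SIGLOWMEM SIGUSR1             /* stand-in signal */

  static void lowmem(int sig)
  {
      /* consult /etc/somefile.conf, pick victims, kill() them */
  }

  int main(void)
  {
      char buf[16];
      int fd;

      mlockall(MCL_CURRENT | MCL_FUTURE);  /* must not be part of the problem */
      signal(SIGLOWMEM, lowmem);

      fd = open("/proc/sys/vm/oom_handler", O_WRONLY);  /* made up */
      if (fd >= 0) {
          sprintf(buf, "%d\n", getpid());
          write(fd, buf, strlen(buf));
          close(fd);
      }
      for (;;)
          pause();                      /* pages locked; just wait */
      return 0;
  }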

Overcommitment can come in many forms. Future denial of service attacks
of this form can be minimised by proactively limiting the possibility.
Given a few minutes, all of you can think of several ways to start sucking
up resources. Perhaps a Bad User (tm) opens a sending socket, binds a
listening socket, and starts flooding the listening socket with very large
packets. But he doesn't read them from his listening socket all that
quickly, and meanwhile starts forking in a tree pattern while creating
obscenely large numbers of files with obscenely long filenames (yes,
filenames have a limit, as do several of these attacks).

Now, granted, an admin should have set up resource limits, but...let's
assume a new root exploit has come out, as they tend to every other day on
bugtraq. So: a new machine killer is born, and more admins rip out their
hair.
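
For the non-root case, at least, setrlimit() already clamps the worst of
the above; the limits below are arbitrary example values, and of course
none of this survives the attacker getting root:

  /* Clamp per-process resources before spawning user work.
   * RLIMIT_NPROC caps the fork tree, RLIMIT_NOFILE the file
   * flood, RLIMIT_DATA the memory grab.  Example values only. */
  #include <sys/resource.h>

  static void clamp(void)
  {
      struct rlimit rl;

      rl.rlim_cur = rl.rlim_max = 64;         /* processes per user */
      setrlimit(RLIMIT_NPROC, &rl);

      rl.rlim_cur = rl.rlim_max = 256;        /* open files */
      setrlimit(RLIMIT_NOFILE, &rl);

      rl.rlim_cur = rl.rlim_max = 32 << 20;   /* 32 MB data segment */
      setrlimit(RLIMIT_DATA, &rl);
  }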

...or we can use netscape ;)

Further discussion of how to really tackle this problem is welcomed.
Religious diatribe should be taken elsewhere; we're here to fix the
problem, not kick up a lot of dust. Please take the above and flesh it
out with ideas. Brainstorm it, then whittle the great ideas out from the
not-so-spiffy ones.

-d

-- 
Look, look, see Windows 98.  Buy, lemmings, buy!   
(c) 1998 David Ford.  Redistribution via the Microsoft Network is prohibited.