Re: How to increat [sic.] max open files?

Richard B. Johnson (root@analogic.com)
Sat, 4 Jan 1997 10:26:25 -0500 (EST)

Next message: Kimon Berlin: "oops, Re: netperf, was Re: too much untested code in new kernels"
Previous message: Todd T. Fries: "xd.c patch"
In reply to: Jason Burrell: "Re: modutils snapshot for 970103"
Next in thread: Neptho T: "Re: quick question about modules in the kernel"
Reply: Neptho T: "Re: quick question about modules in the kernel"

On Sat, 4 Jan 1997, Richard Gooch wrote:

> Richard B. Johnson writes:
> > On Fri, 3 Jan 1997, Baldur Norddahl wrote:
> > [SNIPPED]
> > > On Fri, 3 Jan 1997, Richard B. Johnson wrote:
> > [SNIPPED]
> > >
[SNIPPED]
>
> Well, that's one approach: do it all in the kernel (let's hear it
> for kernel bloat). Another school of thought is that things which
> don't *have* to be done in the kernel are done in userspace. This is
> particularly true if the kernel is not multi-threaded. If you push the
> work for managing large numbers of connections into the kernel, then
> that may give an application an "unfair" advantage over other
> applications which are CPU bound. By pushing the work into userland,
> it gives the kernel the opportunity to context switch the process out
> and give someone else a go.

Very true! That's one of the reasons why I don't think a server should be
a "workstation" at the same time.

>
> > Of course we will always have user-mode programmers who think that they
> > can make better code than the kernel code, but you should know how that
> > works.
>
> That assumes the correct approach requires the absolute tightest
> code, preferably written in assembler, and so forth. See below.
>
> > When user code has to keep track of many "sockets" it usually has to look
> > through a list (perhaps linked) of things to be done once some event
> > (such as an inquiry from a socket-connected client) occurs. It can't just use
> > socket values as indexes because clients disconnect and new out-of-order
> > sockets are assigned for new connections.
>
> Most *traditional* approaches to socket management employ a linked
> list where the sockets are listed in order. In that case, a sequential
> search is required, which can be quite painful for large lists. Now,
> quoting your belief that better algorithm design is the correct
> approach, here are a few improvements:
>
> 1) construct an array of (void *) entries. The FD is an index into
> this array. Very fast lookup. A NULL value indicates no entry,
> otherwise it's a pointer to your socket management structure. It's
> irrelevant whether or not sockets close: just deallocate the structure
> and set the pointer to NULL. When the OS gives you a FD which is
> bigger than your array length, reallocate the array to twice its
> size. Simple.
>
> 2) Instead of a simple linked list, use a _binary tree_! Wow. New
> concept. When you return from select(2), walk down the tree and pick
> up your socket management structure. A mere 20 iterations for a
> million FDs. Lots of lost cycles there.
>
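
The array scheme in (1) can be sketched in C roughly as follows. This
is a minimal sketch, not code posted by anyone in the thread: struct
conn, struct fd_table, the fd_table_* function names and the starting
size of 16 slots are all hypothetical choices made for illustration.

```c
#include <stdlib.h>

/* Hypothetical per-connection state; the fields are placeholders. */
struct conn {
    int fd;
    long bytes_seen;
};

/* Table of pointers indexed directly by file descriptor. */
struct fd_table {
    struct conn **slots;  /* slots[fd] is NULL when fd has no entry */
    size_t len;           /* number of slots currently allocated */
};

/* Return the entry for fd, or NULL if there is none. Constant time. */
struct conn *fd_table_get(const struct fd_table *t, int fd)
{
    if (fd < 0 || (size_t)fd >= t->len)
        return NULL;
    return t->slots[fd];
}

/* Create an entry for fd, doubling the array until fd fits.
 * Returns the entry, or NULL on a bad fd or allocation failure.
 * If fd already has an entry, that entry is returned unchanged. */
struct conn *fd_table_add(struct fd_table *t, int fd)
{
    struct conn *c;

    if (fd < 0)
        return NULL;
    if ((size_t)fd >= t->len) {
        size_t i, new_len = t->len ? t->len : 16;  /* assumed start size */
        struct conn **p;

        while ((size_t)fd >= new_len)
            new_len *= 2;
        p = realloc(t->slots, new_len * sizeof *p);
        if (p == NULL)
            return NULL;
        for (i = t->len; i < new_len; i++)
            p[i] = NULL;              /* new slots start empty */
        t->slots = p;
        t->len = new_len;
    }
    if (t->slots[fd] != NULL)
        return t->slots[fd];
    c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    c->fd = fd;
    t->slots[fd] = c;
    return c;
}

/* The socket closed: free the structure and set the slot to NULL. */
void fd_table_remove(struct fd_table *t, int fd)
{
    if (fd < 0 || (size_t)fd >= t->len)
        return;
    free(t->slots[fd]);               /* free(NULL) is a no-op */
    t->slots[fd] = NULL;
}
```

A caller would start from a zeroed struct fd_table, call fd_table_add()
when accept(2) hands back a new descriptor, fd_table_get() for each
descriptor that select(2) reports ready, and fd_table_remove() on
close. Because the kernel hands out the lowest free descriptor, the
array stays about as large as the peak number of open FDs.
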
> > Once the list becomes large, much time is wasted just getting to the
> > code that is going to service the request. There might even be a context-
> > switch or two before your application actually does anything useful as
> > far as the client is concerned.
>
> I don't see why you're so convinced that a process managing thousands
> of FDs is inefficient. You talk about better algorithm design, and yet
> you don't seem to consider a few simple, efficient approaches to
> solving the problem in userland. Anyone who has large lists to
> maintain has had to solve this problem.
>
Note that we are both saying the same thing. I think we are just disagreeing
upon how to say it.

> > Now, suppose your code used a different "port" (after some initial
> > negotiation), for each Client. Then suppose your code wasn't even
> > executed until the kernel gave you control with the port (read index),
> > already found.
> >
> > Don't you think that this would be a more efficient way to handle the
> > stereotypical Client/Server methodology?
>
> Nope. Your example is wasteful of system resources. Particularly
> RAM. It costs kernel memory (RAM) for each process/task. Say it costs
> 512 bytes for an entry in the process table. Let's say the process
> limit on FDs is 256. Say each connection requires two FDs (the
> connection and perhaps an open disc file). That limits a process to 128
> connections. To support 20 000 connections, you need 157
> processes. This takes nearly 80 kBytes of RAM; memory which can't be
> swapped. This number is nearly doubled on 64 bit machines. On top of
> this you need to add the RAM it takes for each FD.
> Also it will take the kernel time to walk through the runnable process
> list. Then there's added overheads dealing with the page table.
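
The arithmetic in the paragraph above can be checked with a few lines
of C. The inputs (512 bytes per process-table entry, 256 FDs per
process, two FDs per connection, 20 000 connections) are the
illustrative "say it costs" figures from the quoted text, not measured
values, and the function names are hypothetical.

```c
/* Processes needed to serve `connections` clients when each process
 * may hold `fd_limit` descriptors and each client uses `fds_per_conn`
 * of them. Rounds up, since a partly filled process still costs a
 * whole process-table entry. */
long procs_needed(long connections, long fd_limit, long fds_per_conn)
{
    long per_proc = fd_limit / fds_per_conn;   /* 256 / 2 = 128 */

    return (connections + per_proc - 1) / per_proc;
}

/* Unswappable process-table memory consumed by `procs` processes. */
long table_bytes(long procs, long bytes_per_entry)
{
    return procs * bytes_per_entry;
}
```

With the quoted figures, procs_needed(20000, 256, 2) gives 157 and
table_bytes(157, 512) gives 80384 bytes, which matches the "157
processes" and "nearly 80 kBytes" above.
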
>
The initial assumption was (is) that the kernel is more efficient at this
than user-mode code.

> > Now, this is just one example. It is not a good example but it is one
> > that is easy to understand. Another example is the simple telnet daemon.
[SNIPPED]
> > ... no longer efficient because of the wasted overhead. Note that the
> > telnet example could be accessing a database or serving files instead
> > of being
> > a terminal server to a shell.
>
> Did you know that Solaris 2 has a kernel-level "telnetd" for the
> express reason of reducing the number of processes on the system?
> Because hundreds of "telnetd" processes load the system. Each of those
> "telnetd" processes does comparatively little work (compared to the
> shell the user is running). A single process/task can do the work of
> hundreds, leaving the kernel to schedule the more important jobs:
> users' shells.

Linux now has "nfsiod"..... sorta like, but for NFS..(separate tasks!!)

The fact that Solaris does something just might mean that it's wrong. I
have dealt with 6 years of SunBugs, every release making the machines slower
and slower and slower and ....

> They've learnt that too many processes is a bad thing.

They probably have learned nothing.

> Regards,
> Richard....
Thanks....

Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.20 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
