Re: RFC: fragmentation code experiment

Michael H. Warfield (mhw@wittsend.com)
Sat, 23 Oct 1999 20:52:15 -0400


On Sat, Oct 23, 1999 at 05:31:13PM -0600, Todd wrote:
> hi,

> i am working on a project related to the linux networking code and wanted
> comments both on the project, in general, and on methodology.

> here is the general set of premises:

> 1) the code related to fragmentation and (especially) reassembly and
> (especially) out-of-order reassembly is some of the more complicated and
> difficult-to-get-right code in the IP portion of the networking code.
> this code has, over the course of the past year or two, also been
> responsible for security (denial of service) and other problems.

As have other portions of code. Eliminating code just because
it "might have a bug" or "might be used for DoS" is not the right answer.
Fix it and get it right.

> 2) modern networks almost never produce out-of-order fragments for packets
> that are eventually accepted (this is a premise--needs to be tested).

Highly debatable and very unlikely to be true. Vagaries of
collisions and multiply connected routes and bridges would make your
assumption little better than a special-case scenario or a wild ass guess.

> 3) workstations and server that do no fragment reassembly or at least no
> out-of-order fragment reassembly would have simpler (and
> better-performing) network code.

First part, maybe. Second part, non sequitur.

> 4) if fragment reassembly (out-of-order or not) needs to be done, it
> should be done by border routers (where people are increasingly willing to
> do computationally intensive stuff like filter and reassemble fragments).

No.

Ethernet is limited to a 1500-byte MTU. Are you implying, then, that all
IP datagrams be limited to 1500 bytes? That would byte (sic) the big one.
NFS performance would be right in the toilet.
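
A rough sketch of the arithmetic behind that, under stated assumptions:
a 20-byte IPv4 header with no options, and an NFS transfer of 8192 bytes
carried in a single UDP datagram (RPC overhead ignored). The 8192 figure
and the function name are illustrative, not taken from any particular
NFS configuration.

```python
import math

def fragment_count(ip_payload_len, mtu=1500, ip_header_len=20):
    """Number of IPv4 fragments needed to carry ip_payload_len bytes."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the fragment offset field counts in 8-byte units.
    per_fragment = ((mtu - ip_header_len) // 8) * 8
    return max(1, math.ceil(ip_payload_len / per_fragment))

# Assumed example: 8192 bytes of NFS data plus the 8-byte UDP header.
nfs_datagram = 8192 + 8
print(fragment_count(nfs_datagram), "fragments on a 1500-byte MTU link")
```

Every one of those transfers depends on fragmentation and reassembly
unless the datagram is capped at the MTU.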

> create a log-processing script that correlates and counts the above kinds
> of events. in particular, count how many packets are received as
> out-of-order fragments but are eventually accepted (which i am assuming
> will almost never happen, but need to test).

I would assume that your assumption holding true would be as rare
as a purple elephant. What I would REALLY be afraid of is that you would
get some people to test this out under circumstances where they would
expect it to be true (simply connected or flat networks) and create a
self-fulfilling prophecy that would commit random acts of terrorism in
the real world. Remember, in terms of worst-case scenarios, the opposite
of something that works all the time is not something that is broken.
Things which are broken are easy to trace and easy to fix. The opposite
of working is "sort of working, most of the time". That means the random
and sporadic times at which it doesn't work become support and debugging
nightmares. What you are proposing, even if it were to work 99% of the
time, would make lives miserable in the real world, where things are not
as neat and pretty as you think.
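
The counting being proposed is simple enough to sketch. What follows is
a hypothetical illustration only: the record layout (datagram id, byte
offset, length, more-fragments flag) and the function name are
assumptions, not any real kernel log format, and duplicate or overlapping
fragments are simply treated as an incomplete datagram.

```python
from collections import defaultdict

def count_out_of_order_completed(fragments):
    """fragments: (datagram_id, offset, length, more_fragments) tuples,
    listed in arrival order.
    Returns (completed, completed_but_arrived_out_of_order)."""
    seen = defaultdict(list)
    for dgram_id, offset, length, more in fragments:
        seen[dgram_id].append((offset, length, more))

    completed = out_of_order = 0
    for parts in seen.values():
        arrival_offsets = [p[0] for p in parts]
        ordered = sorted(parts)
        # Complete means: starts at offset 0, no gaps or overlaps,
        # and the final fragment has the more-fragments flag clear.
        expected = 0
        complete = True
        for offset, length, _more in ordered:
            if offset != expected:
                complete = False
                break
            expected = offset + length
        complete = complete and not ordered[-1][2]
        if complete:
            completed += 1
            if arrival_offsets != sorted(arrival_offsets):
                out_of_order += 1
    return completed, out_of_order

# Made-up arrivals: datagram 1 in order, datagram 2 reversed,
# datagram 3 never finished.
arrivals = [
    (1, 0, 1480, True), (1, 1480, 520, False),
    (2, 1480, 520, False), (2, 0, 1480, True),
    (3, 0, 1480, True),
]
print(count_out_of_order_completed(arrivals))
```

Whatever ratio such a script reports describes only the network it was
run on, which is exactly the problem with drawing conclusions from it.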

I also have in mind the kind of random acts of terrorism that
would result when you mix this sort of idea with IP on ATM, which is
already subject to enough random acts of terrorism. Since ATM can
discard a cell at will and those cells are relatively small compared
to the length of a datagram fragment, ATM routers which are not specially
equipped to deal with IP can cause all sorts of problems when the traffic
level rises. You reach a point where the probability of losing a cell
somewhere in a fragment reaches 1:1 and the IP throughput collapses
catastrophically because you lose the entire fragment due to the loss
of a single cell. "Smart" ATM routers have the logic to recognize IP
datagrams and to discard the entire datagram if they have to discard even
a single cell (since you would end up retransmitting the entire datagram
anyways). You would be throwing gasoline on that fire by adding the
requirement that all fragments arrive in order for reassembly. Ain't no
way, no how.
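
A back-of-the-envelope sketch of why the collapse is so abrupt. The
assumptions are loud ones: cells are dropped independently with a fixed
probability, the fragment is carried in AAL5 (48 payload bytes per cell,
8-byte trailer), and any LLC/SNAP encapsulation is ignored. The function
names and the sample loss rates are illustrative only.

```python
import math

ATM_CELL_PAYLOAD = 48  # payload bytes per 53-byte ATM cell
AAL5_TRAILER = 8       # AAL5 trailer bytes appended before padding

def cells_per_packet(packet_len):
    """Cells needed to carry one IP packet or fragment in an AAL5 frame."""
    return math.ceil((packet_len + AAL5_TRAILER) / ATM_CELL_PAYLOAD)

def packet_loss_probability(packet_len, cell_loss):
    """Chance that at least one cell of the packet is dropped,
    assuming independent cell loss with probability cell_loss."""
    n = cells_per_packet(packet_len)
    return 1.0 - (1.0 - cell_loss) ** n

for cell_loss in (0.001, 0.01, 0.05):
    p = packet_loss_probability(1500, cell_loss)
    print(f"cell loss {cell_loss:.3f} -> 1500-byte fragment loss {p:.3f}")
```

Since one lost fragment costs the whole datagram, the same formula
applies again across every fragment of the datagram, which compounds the
loss further.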

This is real world networking right now. And this isn't
getting any better. Fragment reassembly is intended for that layer.
Eliminating that is merely going to shift the problem to other places
which were not intended to deal with it.

An analogy... Years ago there were a couple of competing views
about "error recovery" in both TCP/IP and SPX/IPX (Novell). One view argued
that "why should we have checksums in IP (IPX) when the lower layers
would catch errors and guarantee that what we got was good?" The other
camp argued that "why should we checksum/crc the lower (MAC) layers when
the upper layers would catch any problems?" All sides argued over
"performance". Why double the checking when all it did was cost you
"performance"? In the TCP/IP arena, no one dared suggest eliminating
the IP checksums but we did end up with SLIP which has no error detection
and recovery on the serial link layer. OTOH, on the Novell side we ended up
with the checksums in IPX (which originated with Xerox and did have checksums)
eliminated in the name of performance. Finally the day came... Someone
proposed running IPX over SLIP... Hmmmm..... SLIP has no error checking
and depends on the upper layers. IPX has no error checking and depends on
the lower layers.... FIRE IN DA HOLE!

Your idea may look like it has some advantages on its surface. Down
deep, it is deep trouble that would create reliability problems and
environment-dependent support problems. What's even worse is that you
may have the misfortune to get away with it for a while.

> i would be interested in suggestions about how to best implement this
> (including any guidance for adding a sysctl; i have read the documentation
> and some of the code, but i don't understand it yet; also interesting
> questions: does anyone agree with my premises about the state of the
> fragment code or have other perspectives). i'd also be interested in
> other general thoughts about the project.

Implement by way of nearest circular file.

> thanks, in advance for comments on this,

> todd underwood
> todd@unm.edu

Mike

-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  (The Mad Wizard)      |  (770) 331-2437   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/