Re: automatic routing in 2.2.*

From: Robert Manning (robertm@stargate.teklords.com)
Date: Mon Mar 27 2000 - 23:43:27 EST

Next message: ncm@nospam.cantrip.org: "Is ReiserFS really a journaling file system, or is it really just a synchronous-metadata file system like BSD FFS?"
Previous message: Todd: "nfs and fragments"
In reply to: Paul Jakma: "Re: automatic routing in 2.2.*"
Next in thread: Paul Jakma: "Re: automatic routing in 2.2.*"
Reply: Paul Jakma: "Re: automatic routing in 2.2.*"
Reply: Jean-Christian de Rivaz: "Re: automatic routing in 2.2.*"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> which still doesn't explain why we need this 'kernel knows better'
> behaviour in the case of using ifconfig.
>
> > -d
There is a design issue of IP routing implementations that is perhaps
misunderstood here.
And it's a design issue within kernel routing tables.

I want to address this, because I've had to explain this to sooo many
people, that I hope this will save someone else having to answer the same
question again, and again in the future.

By definition: a network is defined by a single ip address and netmask
combination. (Unless you are also a cost-based router, more about that
later..).

There is no way around this, as it is basic to what defines an IP network.
The fact that you brought up an interface with both an IP address and a
netmask
does 2 things:
1.) It gives definition to a network of machines.
The number of machines, and the upper and lower limits of the address range
are defined by the sheer act of creating such an interface,
by applying the mask you gave to the address you gave.
2.) It makes the interface an (immediate, direct) member of that network.

Attempts to do otherwise defy the very simple logic which defines IP
routing within a machine.

Even if you use of an all bits ones mask, you have defined a "network of one"
machine. -In such a network, the network number, the broadcast address,
and the host address are all compressed into a single address.
-But still, a route is required for the kernel's routing mechanism to
understand
where to deliver a packet, even if it is later determined that the route is
to a local address and the packets are shunted to the loopback mechanism for
delivery).

Interface configuration utilities (such as ifconfig) which implement IP are
correct to add an interface route because to do anything else is inconsistent
with IP's definition of a network.
Historically, NOT adding the route has only served to confuse the operator
into
believeing that any route he would calculate could be stuffed in and
work (properly), when quite the opposite is true.
(To fully understand this, you need to look at radix tree designs, but I am
trying to keep this readable for those who don't want to). ;)
Some quite nasty, tangled, partially-dysfunctional routing webs (tables)
have been weaved by admins on systems where the interface routes were not
auto-added based on IP's routing design. -Such an environment has led to
networks
which appeared to conform to nobody's set of rules, even to the point of
network routes would overlap each other, and all manner of unexpected
(self-induced) routing problems would occur, ther outcome of which would be
largely dependent on which of the offending duplicate/overlapping routes
was added first/last.

Some early versions of ifconfig would fail to calculate the derived route from
the
user given IP and netmask, and auto-add it into the routing table,
and thus were part of the problem, (after fixing this, they are part
of the solution).
(The "problem" being one of people who didn't fully understand
the basic definition of an IP network, and all that bringing an interface into
such a network truly implies...and thus would spend much wasted time
and effort fumbling thier way through invalid routing configurations, until
finally asking some advice of someone more experiemced, and finding their
way back to "IP + netmask = <definition of a range of addresses known
collectively as a network, and yea, you just joined it>.
Now, if you "join" an interface to a network, then by all logic, you
MUST have a route to THAT network through THAT interface.
(Or you really didn't join that interface into that network)...

All interfaces join a network, and get a route.
-Whether they have joined a network of ONE (netmask 255.255.255.255)
or MANY (any other netmask) makes no difference, they have still joined one.

Note that until an implementation implements "equal-cost multipath" support
into it's kernel routing tables, then the ONLY criteria for deciding which
route an IP packet will take, is to take the packets destination IP
(from the packet header), and search the radix tree (kernels routing table
structure) for the "most specific" match.
(In the case of a "tie" (a very *bad thing*) wherein TWO or more interfaces
have been defined into the same network,
- the action to be taken in routing the outbound packet to it's destination
is implementation-specific, and undefined by any standards, and the results are
unpredictable)
-although, what USUALLY happens, is that either the first, or the last
interface
added that duplicates the network ends up winning the route).
(whether it's first or last depends on how/where the node is added to the
radix tree,
and again later on how well the radix tree is searched, and whether a duplicate
route search is bothered with, and then what final decision is made in the case
of complete duplicates: (same network number, and same netmask, but 2
interfaces
decide: turn left, turn right, eenie,meenie,minee,mo...Hmmmmm....).
An unfortunate side-effect, but this is what always happens, because this type
of use (ie: duplicate routes) was never originally intended in IP.

BTW: This is also an effective method of "black-holing" a route, (creating the
route to an entire network on the loopback, -currently in use by routers
at large, and the gated routing daemon on Unix machines).
And as the term implies, all packets which enter therein, appear in another
universe, never to be seen in this one again.

This probably explains what has happened to your eth0 routes, when lo0 was
created into the same exact network. (Thereby duplicating all of the available
criteria by which a decision on how to route the packets to that network
can be made -woops, we turned left, shit...too late...).

A side-effect of this is that the "netstat" walk of the kernel routing
table would fail to find the duplicate routes, -leading one to beleve that
they were not there, (even though, in truth they are, but one is "covering up"
the others because of the way the walk through the radix trees is done).
The proof was in the pudding: On such an OS, you could remove the interface
of the route
that netstat -rn showed that network belonged to, then run the netstat -rn
command again and see what interface the route "moved" to....
(What "really happened is this: - the route NEVER MOVED...it way always there,
you have just uncovered it, by removing the duplicate route that was covering
up the others existance when the radix tree / netmasks were searched).
(because the final tie-breaker -the netmask was equal for both).

So, the action of defining multiple interfaces into the same network
( or "subnet", in pre-CIDR terms ) is largely incorrect, unless you
specifically
use a third criteria that will be used to determine the "tie-breaker" situation
that you have created.

Some imbedded operating systems, (used in routers) do implement this third
criteria,
(known largely as "cost") directly on interfaces, so they have an additional
determining
factor to route packets by...
(as in gated/OSPF, route with the lowest cost to same destination wins).
In addition, they implement a round-robin feature (for a rough implementation
of
load-balancing) should even the costs of the two-or-more interfaces be found
to
be equal.
They may even have algorithms for "true" load balancing along "equal-cost
multipath"
routes based on bandwidth availability. but then so much horsepower is needed
to
make up for the time spent in constantly recalculating this, that CPU
bandwidth
can quickly become a determining factor (clients comparison-shop Unixen for
raw speed)
and many win sales based on which is the "fastest"....therefore adding such
an implementation into the kernel which consumes more CPU, for what appears to
be a fringe market looking for special features, can cause the OS to lose
market share).
(what a shame, but true).

Pure routers whose entire CPU can be dedicated to the tasks of routing, may
not need
address such concerns as nearly as full operating systems - which are expected
to
reserve CPU for user-space applications (databases, data mining, etc.)

So, if you are going to need to use equal-cost multipath and/or true bandwidth
availability based load-balancing algorithms (the only legitimate reasons you
might have
for configuring multiple interfaces into the same subnet) - then a dedicated
router
is what fits best here. (It's IOS doesn't have to meet the same SPECint design
criteria
for CPU consumption that a full blown multi-user OS is expected to).
In fact, routers tend to be over-blown with routing features, because they do
not
need to leave ANY CPU time for users...(what users?- there are no users on a
router,
so the "who-cares, it's all our CPU anyway" attitude is quickly adopted in
router
design work - giving them a distinct advantage in the bells-and-whistles
dept.).

If you want to configure a machine to respond to aliases in such a way that is
network independent, adding them to the loopback is an acceptable way of
doing this, but a netmask of all ones (/32) should be used, to make the route
more
specific than those of any competing network interfaces, and to make sure
you don't run into the black-hole mechanism which is used (by design) to
simplify
CIDR based routing schemes.

Additionally, there are many other penalties (other than address depletion)
to consider when using aliases, but I'll save that as it is not pertinent
here.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: ncm@nospam.cantrip.org: "Is ReiserFS really a journaling file system, or is it really just a synchronous-metadata file system like BSD FFS?"
Previous message: Todd: "nfs and fragments"
In reply to: Paul Jakma: "Re: automatic routing in 2.2.*"
Next in thread: Paul Jakma: "Re: automatic routing in 2.2.*"
Reply: Paul Jakma: "Re: automatic routing in 2.2.*"
Reply: Jean-Christian de Rivaz: "Re: automatic routing in 2.2.*"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:21 EST