Re: arp, kernel 2.2.15 and 2.3.99-pre6

From: Julian Anastasov (uli@linux.tu-varna.acad.bg)
Date: Sun May 07 2000 - 06:54:24 EST


        Hello,

        I must note that after looking again at arpfilter I
see that it can't filter everything. My example needs the
Linux 2.0 behavior. I have only one path to the
destinations, for example through eth0, and I don't want the
IP addresses from the other devices (dummy0) to be reported.
So all these games with the routing don't work here: they
don't consider the local address.

        Even if arpfilter detects whether the "sip" is moved
to another device and replies only through this device, my
setup needs the restriction for "tip" too.

        So my requirements for arpfilter as a replacement
are:

- reply only through the device where the route points to
(currently implemented)

- announce only IP addresses configured on this device as
source of the ARP request (not implemented)
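
        A minimal userspace sketch of how the route check and
the "tip" restriction could combine when deciding whether to
answer an ARP request. Everything here (struct model_dev,
model_should_reply, the IP4 macro) is hypothetical and is not
kernel code. The "tip" check is my assumed reading of the
missing restriction: answer only for addresses configured on
the receiving device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address; only keeps the example readable. */
#define IP4(a, b, c, d) \
        (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical model of a device; not a kernel structure. */
struct model_dev {
        int ifindex;            /* device index */
        const uint32_t *addrs;  /* IP addresses configured on the device */
        size_t naddrs;
};

/* Return true if ip is configured on dev. */
static bool model_dev_has_addr(const struct model_dev *dev, uint32_t ip)
{
        for (size_t i = 0; i < dev->naddrs; i++)
                if (dev->addrs[i] == ip)
                        return true;
        return false;
}

/*
 * rx_dev:        device that received the ARP request
 * route_ifindex: device index the route back to "sip" points to
 * tip:           the address being asked for
 */
static bool model_should_reply(const struct model_dev *rx_dev,
                               int route_ifindex, uint32_t tip)
{
        assert(rx_dev != NULL);
        /* Route check: the part arpfilter already implements. */
        if (route_ifindex != rx_dev->ifindex)
                return false;
        /* "tip" check: assumed meaning of the missing restriction. */
        return model_dev_has_addr(rx_dev, tip);
}
```

With eth0 holding only 192.168.0.3, a request for 192.168.0.4
arriving on eth0 passes the route check but fails the "tip"
check, because the VIP is configured on dummy0.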

        Well, I can't solve the problem with the shared
addresses without the hidden device feature.

        If someone explains to me how arpfilter will solve
the following problem I will be happy:

                ROUTER
                192.168.0.1
                  |LAN
        +---------+-------------+
        |192.168.0.2            |192.168.0.3
        Director                Real server

VIP=192.168.0.4

ROUTER
eth0: 192.168.0.1/24
eth1: link to the external clients

Director
eth0: 192.168.0.2/24
eth0:0 192.168.0.4/32
default gw: 192.168.0.1 through eth0 (ROUTER)

Real server:
eth0: 192.168.0.3/24
dummy0: 192.168.0.4/32 (must not be advertised, hidden=1)
default gw: 192.168.0.1 through eth0 (ROUTER)

The traffic:

Client: 10.0.0.1:1024 -> 192.168.0.4:80, received in the router

Router: who-has 192.168.0.4 tell 192.168.0.1

Director: 192.168.0.4 is at my eth0 MAC
#Real server must not reply to the router's broadcast
#because 192.168.0.4 is a shared address. This can't
#be achieved with arpfilter because the route to
#192.168.0.1 (the ROUTER) never changes. The only
#difference is the device where the VIP is configured.
#If the real server replies here we have a mess:
#the Router starts talking to one of the real servers!

Router: 10.0.0.1:1024 -> 192.168.0.4:80, the packet comes in
the Director

The Director resolves the MAC for 192.168.0.3 because
the LVS config is:
Forward TCP 192.168.0.4:80 to 192.168.0.3:80
Forward TCP 192.168.0.4:80 to 192.168.0.5:80 Real server 2
Forward TCP 192.168.0.4:80 to 192.168.0.6:80 Real server 3 ...
# Note the port and proto specifications. Not all
# traffic to 192.168.0.4 is forwarded, only the traffic
# for the specified services

10.0.0.1:1024 -> 192.168.0.4:80 - The packet is received in
the real server and the connection is accepted, i.e. from
10.0.0.1->192.168.0.4. There is no source address
selection. This is an incoming connection to the
192.168.0.4:80 service.

When the request is processed the real server replies:
#"who-has 192.168.0.1 tell 192.168.0.4" - this is the wrong
#broadcast request if arp_solicit doesn't change the saddr!
who-has 192.168.0.1 tell 192.168.0.3 - this is the right
request, here we select the primary address from the
outgoing device. I.e. we resolve the next hop (ROUTER) here.

These are the steps. I don't think arpfilter helps here, at
least the current version. This is my understanding.

On Sun, 7 May 2000, Andrey Savochkin wrote:

> Hello,
>
> On Sat, May 06, 2000 at 08:50:38AM +0300, Julian Anastasov wrote:
> > On Sat, 6 May 2000, Andrey Savochkin wrote:
> > > What autoselection are you speaking about?
> >
> > Sorry. This is the source address autoselection.
>
> OK. The next question: which source address?
> Source for ARP requests for forwarding packets?
 
        Yes. This is a MUST.

> Source for locally originated IP packets sent from unbound socket and going
 
        Yes. This is a MAY because we have the previous
rule. But it is better not to play games and to make this a
MUST too. It depends on whether a Director forwards the
incoming traffic.

>
> through routes without pref_src? Or whatever else source?
> Please keep in mind that inet_select_addr()
> controls both of the source selections mentioned above.

        You are right. Everything! We try not to use shared
addresses in the communications if possible, because LVS is
usually not configured to forward traffic for every
protocol/port of the VIP.

        The hidden feature is intended to work in
environments with shared addresses. We can rename it to
"shared" :)
 
> > The check in inet_select_addr for IN_DEV_HIDDEN. By the way
> > I'm still not sure why this check is in the loop. It can be
> > moved before the for(), for example:
> >
> > if (IN_DEV_HIDDEN(in_dev))
> > continue;
> > for_primary_ifa(in_dev) {
> > if (ifa->ifa_scope <= scope &&
> > ifa->ifa_scope != RT_SCOPE_LINK)
> > return ifa->ifa_local;
> > } endfor_ifa(in_dev);
> >
> > Alexey, is there a reason the check to be in the
> > in_dev loop?
>
> Sorry, I don't see the point of any modifications for inet_select_addr.
> IN_DEV_HIDDEN is an absolute alien here!

        You can ask Alexey about this change. But OK, I will
answer you. It is there just to avoid ARP requests "who-has
THIS_HIDDEN_IP tell ME" in the future. If you autoselect a
shared IP address to communicate with another host on the
LAN, after your first outgoing packet you will see a
broadcast ARP request "who-has THIS_HIDDEN_IP tell
remote_end". In my example, when the ARP replies are
filtered, this request will be answered by the Director only
(where they are not filtered). So, as a result, the real
server (where the IP address is hidden because it is shared)
_MUST_ not autoselect such shared addresses to communicate
with the other hosts on the LAN. The other end will not talk
with the real server but with the Director. This is because
we advertise the shared address on one host only. For
example, this setup is working in my example (all 3 hosts
on the LAN including the client):

CLIENT       Director          Web
  |             |               |
  +-------------+---------------+

        Don't forget that CLIENT sends all traffic to the
Director because the VIP is advertised only in the Director.
The Web routes the traffic directly to the CLIENT. The rule
here is that the real servers can use the shared IP address
to talk to anyone because there is a Director who will
deliver the incoming traffic. This is possible because this
Director uses a table (just like the MASQ box) to remember
the connections to each of the real servers. This is the
only rule which allows one real server to use shared/hidden
IP address in the communications. This is the reason we
restrict hidden addresses from being autoselected in
inet_select_addr: the Director can balance only
specific ports (selected by the administrator), and if the
real server chooses this shared IP to talk from a port which
is not balanced in the Director, this communication will be
reset by the Director. There is no such TCP/UDP port
configured in the LVS, so the packet is not forwarded to
the real server; it is delivered locally with all the
consequences.
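
        To make the port restriction concrete, here is a
small userspace sketch of the Director's decision for the
first packet of a flow. The types and the function
model_director_verdict are hypothetical names for
illustration only; real LVS also keeps a connection table,
which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address; only keeps the example readable. */
#define IP4(a, b, c, d) \
        (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* IP protocol numbers for TCP and UDP. */
enum model_proto { MODEL_TCP = 6, MODEL_UDP = 17 };

/* One balanced service: protocol, virtual IP and port. */
struct model_service {
        enum model_proto proto;
        uint32_t vip;
        uint16_t port;
};

enum model_verdict {
        MODEL_FORWARD,  /* hand the packet to a real server */
        MODEL_LOCAL     /* no such service: deliver locally */
};

/* Decide what the Director does with a packet for daddr:dport. */
static enum model_verdict
model_director_verdict(const struct model_service *tbl, size_t n,
                       enum model_proto proto, uint32_t daddr,
                       uint16_t dport)
{
        assert(tbl != NULL || n == 0);
        for (size_t i = 0; i < n; i++)
                if (tbl[i].proto == proto && tbl[i].vip == daddr &&
                    tbl[i].port == dport)
                        return MODEL_FORWARD;
        return MODEL_LOCAL;
}
```

With only TCP 192.168.0.4:80 configured, a packet for
192.168.0.4:1024 (for example the answer to a connection the
real server opened from the shared address) gets MODEL_LOCAL
and never reaches the real server.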

        As a result "The shared addresses are not
automatically selected". The connection from the Client to
the Real server (to the hidden IP) is established only if
the initial request is forwarded from the Director. In an
environment with shared IP addresses the access is
controlled from the Director. The Client can't communicate
directly with the real server using this hidden IP address
as a destination address.
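
        A rough userspace model of the rule "the shared
addresses are not automatically selected". It mimics only
the loop over all devices with the hidden check, not the
whole of inet_select_addr. The structures, the function
model_select_addr and the MODEL_SCOPE_* constants are
hypothetical; the scope values are chosen to follow the
ordering of the kernel's RT_SCOPE_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address; only keeps the example readable. */
#define IP4(a, b, c, d) \
        (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
         ((uint32_t)(c) << 8) | (uint32_t)(d))

enum { MODEL_SCOPE_UNIVERSE = 0, MODEL_SCOPE_LINK = 253 };

/* A primary address on a device (secondaries are not modelled). */
struct model_ifa {
        uint32_t local;
        int scope;
};

/* A device with its "hidden" flag and its primary addresses. */
struct model_in_dev {
        bool hidden;
        const struct model_ifa *ifas;
        size_t nifas;
};

/* Return the first usable address, or 0 if there is none. */
static uint32_t model_select_addr(const struct model_in_dev *devs,
                                  size_t ndevs, int scope)
{
        assert(devs != NULL || ndevs == 0);
        for (size_t d = 0; d < ndevs; d++) {
                /* Hidden devices never donate a source address. */
                if (devs[d].hidden)
                        continue;
                for (size_t i = 0; i < devs[d].nifas; i++) {
                        const struct model_ifa *ifa = &devs[d].ifas[i];

                        if (ifa->scope <= scope &&
                            ifa->scope != MODEL_SCOPE_LINK)
                                return ifa->local;
                }
        }
        return 0;
}
```

On the real server, with dummy0 (192.168.0.4) hidden and
listed before eth0, the selection skips the VIP and returns
192.168.0.3. Without the hidden flag it would return the VIP.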
 
> > > I do not understand you well.
> > > Are you speaking about the problem that you send packets with VIP source but
> > > want to use different IP address in the ARP request headers?
> >
> > Exactly. I want specific local addresses not to be
> > announced as source of the ARP request. They are shared
>
> Fine.
>
> > > Are you suggesting to stop to use skb source for ARP requests and use only
> > > inet_select_addr() or fib_select_addr() or dedicated "ARP request" addresses?
> >
> > Yes, we fallback to {inet,fib}_select_addr in such
> > case.
> >
> > > In general case it increases the ARP traffic on the link, but it's perfectly
> > > ok for me if this behavior may be turned on and off.
> >
> > The ARP request is sent in any case. We only
> > restrict some local addresses not to be used as source of
> > the broadcast request, i.e. the saddr is changed only.
>
> I was speaking about ARP request in the opposite direction (from packet
> receiver to sender under normal circumstances). That's why I think an option
> to control the 2 variants of the behavior (skb source with fallback to
> {inet,fib}_select_addr, or just {inet,fib}_select_addr).

        Fine. If we continue in this direction we can use
address 0.0.0.0 as source of the ARP request. This variant
is working too :) This is a DaD! Even without
IN_DEV_DEFAULT_ARP_SRC we can restrict saddr always to be
from the outgoing device's primary addresses ignoring saddr
in the skb. Is this a problem? That was the old behavior
before 2.2, I think. If this can't be controlled from
arpfilter it can be controlled from default_arp_src or
hidden. But the ARP replies are not filtered enough from
arpfilter, so the hidden feature can't be replaced.

> > > Does Andi's filters plus this change of arp_solicit() policy solve all the
> > > problems for you network configurations?
> >
> > Yes. This is possible. But my understanding is that
>
> Great.

        I was wrong. If I understand correctly, with the
current arpfilter version it is not possible to filter the
replies because the path from the real server to the
router is not changed. It seems this feature follows the
route changes only and doesn't care about the requested IP
address (tip).
 
> > we can have two variants to control the interfaces:
> >
> > 1. Change the flag for the interface where the ARP requests
> > come from (arpfilter) and don't reply for IP addresses not
> > on this interface.
> >
> > 2. Change the flag for the interface for which you don't
> > reply its IP addresses (hidden) through the other ARP
> > devices.
>
> IP addresses have only minimal relations with devices, if have any.
> Addresses are the matter of IP routing, devices are the matter of packet
> transmission/reception. Certainly, checking the device flags in address

        Agree. Tell it to the web clusters who care for
hundreds of VIPs (virtual domains). The easiest way to
configure shared addresses is on the dummy/lo devices. The
other way by using routing primitives is to tell the kernel
VIP_net/VIP_mask -> 0.0.0.0/0 is hidden, for example. But
how does this work in the shared address environment? It
doesn't work for all levels. For ARP these addresses are
hidden but for IP they are not.

        I agree. This is one big hack. But I don't know an
easier way to support shared addresses. Maybe other people
have more ideas? If this feature raises problems the user
can decide not to use it. I don't see a problem here.

> lookup routine (fib_local_source) is a strange thing.
> Any sane kernel should work if there are several independent IP networks are
> configured on the same media as well if there is only one network.

        I don't change fib_local_source, I only clone it as
fib_local_arp_source which is used only from arp_solicit.
 
> Andi's patch essentially do the following: it introduce a per-device flag
> whether to apply some route-based rules to check whether to reply to the ARP
> request. This concept is ok. Moreover, it's the only acceptable concept in
> this area I've heard. Do you have any objections against the proposed
> implementation of the route-based check (arp_filter routine itself)? That's
> the only place where different solutions may exist.

        Well, I already said that such a feature is useful.
But it doesn't solve the problems with the shared IP
addresses. It solves other problems, but I'm not sure if it
is tested. I think problems with ARP requests can occur if
we don't control saddr in arp_solicit. I'm not sure. I have
to rethink it.
 
> [snip]
> > ... Is
> > it possible fib_local_source to be replaced in arp_solicit
> > with call to a similar function which uses fib_lookup but
> > restricts the local address to be from the output device?
> [snip]
> > memset(&key, 0, sizeof(key));
> > key.src = daddr;
> > key.dst = saddr;
> > key.tos = tos;
> > key.iif = dev->ifindex;
> > in_dev = in_dev_get(dev);
> > if (IN_DEV_ARPFILTER(in_dev))
> > key.oif = dev->ifindex;
> > in_dev_put(in_dev);
> > ret = -EINVAL;
> > if (fib_lookup(&key, &res) == 0) {
> > if (res.type == RTN_LOCAL) {
> > in_dev = in_dev_get(FIB_RES_DEV(res));
> > if (!in_dev) goto out;
> > if (!IN_DEV_HIDDEN(in_dev))
> > ret = 0;
> > in_dev_put(in_dev);
> > }
> > out:
> > fib_res_put(&res);
> > }
>
> It's very wrong, because you're starting to toss with
> per-device settings in the place where we hopefully reduced the question to
> the routing table examination.

        OK, these statements are not needed:
 
        key.src = daddr;
        key.tos = tos;

        and this code must be:

        if (dev == FIB_RES_DEV(res) || !IN_DEV_HIDDEN(in_dev))

        And now the result is:

int fib_local_arp_source(u32 saddr, struct net_device *dev)
{
        struct rt_key key;
        struct fib_result res;
        struct in_device *in_dev;
        int ret = -EINVAL;

        memset(&key, 0, sizeof(key));
        key.dst = saddr;
        /* With arpfilter set, restrict the lookup to the output device */
        in_dev = in_dev_get(dev);
        if (in_dev) {
                if (IN_DEV_ARPFILTER(in_dev))
                        key.oif = dev->ifindex;
                in_dev_put(in_dev);
        }
        if (fib_lookup(&key, &res) == 0) {
                if (res.type == RTN_LOCAL) {
                        /* saddr is on the output device: always usable */
                        if (dev == FIB_RES_DEV(res)) {
                                ret = 0;
                                goto out;
                        }
                        /* saddr is on another device: usable unless hidden */
                        in_dev = in_dev_get(FIB_RES_DEV(res));
                        if (!in_dev) goto out;
                        if (!IN_DEV_HIDDEN(in_dev))
                                ret = 0;
                        in_dev_put(in_dev);
                }
out:
                fib_res_put(&res);
        }
        /*
        ** Here we must use something like inet_select_addr
        ** to select local address which is
        ** not hidden if ret!=0
        */
        return ret;
}

        This way arpfilter works as default_arp_src. But
this can't replace the hidden feature because arpfilter
filters other things.

> So, is it true that Andi's patch together with something like the
> patch below allows you to solve your problems?

        No, I'm correcting myself. It can't work. The filter
doesn't stop the replies.

        My setup with LVS doesn't use inet_select_addr for
the upper layer protocols, and the real servers will not
cause ARP requests from the remote end. Please consider
that if the ARP replies go through one device only when
arpfilter is ON, and the remote's ARP requests continue to
come in on the wrong interface (where we don't reply), that
means some transfer is blocked. I'm not sure that no
problems will be introduced.
 
> --- net/ipv4/arp.c.orig Sun May 7 11:55:03 2000
> +++ net/ipv4/arp.c Sun May 7 11:55:03 2000
> @@ -330,13 +330,16 @@
> u32 saddr;
> u8 *dst_ha = NULL;
> struct net_device *dev = neigh->dev;
> + struct in_device *in_dev = in_dev_get(dev);
> u32 target = *(u32*)neigh->primary_key;
> int probes = atomic_read(&neigh->probes);
>
> - if (skb && inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL)
> + if (skb && (in_dev == NULL || !IN_DEV_DEFAULT_ARP_SRC(in_dev)) &&
> + inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL)
> saddr = skb->nh.iph->saddr;
> else
> saddr = inet_select_addr(dev, target, RT_SCOPE_LINK);
> + in_dev_put(in_dev);
>
> if ((probes -= neigh->parms->ucast_probes) < 0) {
> if (!(neigh->nud_state&NUD_VALID))
>
> inet_select_addr() may be replaced by fib_select_address() here for more
> flexibility. But no device flag checks inside! :-)

        Something near. What about:

- if (skb && inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL)
+ if (skb && (in_dev != NULL && !IN_DEV_DEFAULT_ARP_SRC(in_dev)) &&
+ inet_addr_type(skb->nh.iph->saddr) == RTN_LOCAL)
                saddr = skb->nh.iph->saddr;
        else
                saddr = inet_select_addr(dev, target, RT_SCOPE_LINK);
+ if (in_dev) in_dev_put(in_dev);

        The code can be optimized. Can we replace
IN_DEV_DEFAULT_ARP_SRC with IN_DEV_ARPFILTER?

        We must call inet_select_addr if:
        
- in_dev is NULL
- dev's arpfilter/default_arp_src is 1
- skb saddr is not local
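
        The three conditions above can be written as a small
userspace sketch of the saddr choice for the ARP request.
struct model_arp_ctx and model_arp_saddr are hypothetical
names; the "selected" field stands in for whatever
inet_select_addr would return on the outgoing device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address; only keeps the example readable. */
#define IP4(a, b, c, d) \
        (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
         ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Inputs to the decision, flattened into plain values. */
struct model_arp_ctx {
        bool have_in_dev;       /* in_dev for the device was found */
        bool default_arp_src;   /* arpfilter/default_arp_src flag is 1 */
        bool have_skb;          /* a packet triggered the request */
        bool skb_saddr_local;   /* the packet's saddr is a local address */
        uint32_t skb_saddr;     /* source address of that packet */
        uint32_t selected;      /* stand-in for the inet_select_addr result */
};

/* Pick the source address to put in the ARP request. */
static uint32_t model_arp_saddr(const struct model_arp_ctx *c)
{
        assert(c != NULL);
        /* No in_dev, or the flag is set: always autoselect. */
        if (!c->have_in_dev || c->default_arp_src)
                return c->selected;
        /* No packet, or its saddr is not local: autoselect. */
        if (!c->have_skb || !c->skb_saddr_local)
                return c->selected;
        /* Otherwise keep the saddr from the packet. */
        return c->skb_saddr;
}
```

For the real server replying from 192.168.0.4 with the flag
set on eth0, this returns 192.168.0.3, which gives the
"who-has 192.168.0.1 tell 192.168.0.3" request.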

        But don't kill the alien in inet_select_addr, because
if some protocol selects an address with inet_select_addr it
is possible for blocked local IP addresses to be selected.
This is a possible problem for arpfilter. This is a problem
for any policy which blocks ARP replies. But with the
"hidden" feature it is solved.

        Any variant without the alien in inet_select_addr
does not work. Without the hidden feature the source
address selection must be handled by arpfilter. And
here is the problem: this is not possible! The other problem
is that arpfilter doesn't stay silent for the addresses not
on the related device.

        I think the hidden feature can't be replaced with
arpfilter. It solves other problems. I don't have more ideas
about replacing the hidden feature. Now that I have
explained the usage and the problems that this feature
solves, if someone with better knowledge can find a better
solution I will be happy.

        The goal is to support shared addresses on the LAN.
My opinion is that arpfilter can coexist with hidden but
can't replace it. Some code can be optimized but I don't
know your plans, which functions will be replaced, etc. I
think, the hidden feature exactly solves the problem. Not
everything can be solved with routing rules. And I prefer
these addresses to be configured on separate device because
there is no other way to distinguish the routes in my
example (192.168.0.4->192.168.0.1):

Real server:
eth0: 192.168.0.3/24
dummy0: 192.168.0.4/32

192.168.0.4 is on both networks! I.e. it is in the local
and in the main table. The only difference is the device: it
is not on one device only! The other trick is to define a
network on the "lo" interface as local and to hide it. Very
big hacks but the number of rules is reduced :)

        I would appreciate a better definition of how to
solve the problem, now that all the details are known. I
don't know what you consider wrong.

        All of you complain about the feature. But I don't
understand what the real problems with the current
implementation are.

Regards

--
Julian Anastasov <uli@linux.tu-varna.acad.bg>



