Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

From: Guillaume Nault
Date: Fri Jun 09 2023 - 12:14:48 EST


On Thu, Jun 08, 2023 at 07:37:15AM +0200, Mirsad Goran Todorovac wrote:
> On 6/7/23 18:51, Guillaume Nault wrote:
> > On Wed, Jun 07, 2023 at 12:04:52AM +0200, Mirsad Goran Todorovac wrote:
> > > [...]
> > > TEST: ping local, VRF bind - ns-A IP [ OK ]
> > > TEST: ping local, VRF bind - VRF IP [FAIL]
> > > TEST: ping local, VRF bind - loopback [ OK ]
> > > TEST: ping local, device bind - ns-A IP [FAIL]
> > > TEST: ping local, device bind - VRF IP [ OK ]
> > > [...]
> > > TEST: ping local, VRF bind - ns-A IP [ OK ]
> > > TEST: ping local, VRF bind - VRF IP [FAIL]
> > > TEST: ping local, VRF bind - loopback [ OK ]
> > > TEST: ping local, device bind - ns-A IP [FAIL]
> > > TEST: ping local, device bind - VRF IP [ OK ]
> > > [...]
> >
> > I have the same failures here. They don't seem to be recent.
> > I'll take a look.
>
> Certainly. I thought it might be something architecture-specific?
>
> I have reproduced it also on a Lenovo IdeaPad 3 with Ubuntu 22.10,
> but on Lenovo desktop with AlmaLinux 8.8 (CentOS fork), the result
> was "888/888 passed".

I've taken a deeper look at these failures. The problem is actually in
ping, which is probably why you get different results depending on the
distribution.

The problem is that, with some iputils versions, 'ping -I netdev ...'
doesn't bind the socket to 'netdev' if the IPv4 address to ping is
configured on that same device. The VRF tests depend on this socket
binding, so they fail when ping refuses to bind. That was fixed upstream
with commit
92ce8ef21393 ("Revert "ping: do not bind to device when destination IP
is on device"") (https://github.com/iputils/iputils/commit/92ce8ef2139353da3bf55fe2280bd4abd2155c9f).

Long story short, the tests should pass with the latest upstream ping
version.

Alternatively, you can modify the commands run by fcnal-test.sh and
provide the -I option twice: once to set the device binding and once
to set the source IPv4 address. That way, ping should agree to bind
its socket.

Something like (not tested):

- run_cmd ping -c1 -w1 -I ${VRF} ${a}
+ run_cmd ping -c1 -w1 -I ${VRF} -I ${a} ${a}
[...]
- run_cmd ping -c1 -w1 -I ${NSA_DEV} ${a}
+ run_cmd ping -c1 -w1 -I ${NSA_DEV} -I ${a} ${a}
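
To make the argument order explicit, here is a minimal sketch that only
builds the command line, without running ping. The helper name
ping_bound_cmd is hypothetical (it does not exist in fcnal-test.sh), and
"red" and 172.16.1.1 are example values for the device and the address.
Like the diff above, it has not been tested against the real script.

```shell
# Hypothetical helper, not part of fcnal-test.sh.
# $1: device or VRF to bind the socket to
# $2: local IPv4 address, used both as source address and as destination
# Prints a ping command line with -I given twice: first the device,
# then the source address.
ping_bound_cmd() {
    printf 'ping -c1 -w1 -I %s -I %s %s\n' "$1" "$2" "$2"
}

# Example values only; in the test script the resulting arguments would
# be handed to run_cmd instead of being printed.
ping_bound_cmd red 172.16.1.1
```

Printing the command first makes it easy to compare against what the
script currently runs before touching the tests themselves.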

> However, I have a question:
>
> In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
> are repeated twice, while "No VRF" section has the versions:
>
> SYSCTL: net.ipv4.raw_l3mdev_accept=0
>
> and
>
> SYSCTL: net.ipv4.raw_l3mdev_accept=1
>
> The same happens with the IPv6 ping tests.
>
> In that case, it could be that we have only 2 actual FAIL cases,
> because the error is reported twice.
>
> Is this intentional?

I don't know why the non-VRF tests are run once with raw_l3mdev_accept=0
and once with raw_l3mdev_accept=1. Unless I'm missing something, this
option shouldn't affect non-VRF users. Maybe the objective is to make
sure that it really doesn't affect them. David certainly knows better.

> Thanks,
> Mirsad
>