Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

From: Mirsad Goran Todorovac
Date: Thu Jun 15 2023 - 16:11:17 EST


On 6/14/23 10:47, Guillaume Nault wrote:
On Sat, Jun 10, 2023 at 08:04:02PM +0200, Mirsad Goran Todorovac wrote:
This also works on the Lenovo IdeaPad 3 Ubuntu 22.10 laptop, but on the AlmaLinux 8.8
Lenovo desktop I have a problem:

[root@pc-mtodorov net]# grep FAIL ../fcnal-test-4.log
TEST: ping local, VRF bind - ns-A IP [FAIL]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, device bind - ns-A IP [FAIL]
TEST: ping local, VRF bind - ns-A IP [FAIL]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, device bind - ns-A IP [FAIL]
[root@pc-mtodorov net]#

The kernel is a recent one:

[root@pc-mtodorov net]# uname -rms
Linux 6.4.0-rc5-testnet-00003-g5b23878f7ed9 x86_64
[root@pc-mtodorov net]#

Maybe it's a problem with the ping version shipped by the distribution.
You can try "./fcnal-test.sh -t ipv4_ping -p -v" to view the commands
run and make the script stop when there's a test failure (so that you
can see the ping output and try your own commands in the testing
environment).

Thank you for taking the time to reply, and thanks for the hint.
But I am sort of at a low ebb on this.

It would be good to have the test behave the same on both distributions, so
that it catches actual kernel faults rather than userspace differences. Maybe
bundle a version of the ping command with the test? But I cannot devote too
much time to this.

I hope that upgrading AlmaLinux 8.8 -> 9.x (or CentOS clones in general)
would solve the issue, but that is not guaranteed, and I would lose the
ability to bisect back to the old kernels, which is why I do not upgrade to
the latest releases in the first place. :-/

If it is just the AlmaLinux ping, then it only affects an exotic distro, but
AlmaLinux is a CentOS clone, so the issue might exist in the more popular
Rocky, too.

I am not sure what the right thing to do is in this case, or I would already
have done it. Presumptuous, maybe, but true.

However, I have a question:

In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
are run twice, while the "No VRF" section has both versions:

SYSCTL: net.ipv4.raw_l3mdev_accept=0

and

SYSCTL: net.ipv4.raw_l3mdev_accept=1

The same happens with the IPv6 ping tests.

In that case, it could be that we have only three distinct FAIL cases
(the six lines above), because each error is reported twice.

Is this intentional?

I don't know why the non-VRF tests are run once with raw_l3mdev_accept=0
and once with raw_l3mdev_accept=1. Unless I'm missing something, this
option shouldn't affect non-VRF users. Maybe the objective is to make
sure that it really doesn't affect them. David certainly knows better.

The problem appears to be that the non-VRF tests are being run with
raw_l3mdev_accept={0|1}, while the VRF tests are run with raw_l3mdev_accept={1|1} ...

The reason the VRF tests run twice is to test both raw and ping sockets
(using the "net.ipv4.ping_group_range" sysctl). It doesn't seem anyone
ever intended to run the VRF tests with raw_l3mdev_accept=0.

Only the non-VRF tests were intended to be tested with
raw_l3mdev_accept=0 (see commit c032dd8cc7e2 ("selftests: Add ipv4 ping
tests to fcnal-test")). But I have no idea why.
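To make the matrix discussed above concrete, here is a hypothetical bash sketch that only prints the combinations as they are described in this thread: non-VRF runs with raw_l3mdev_accept=0 and =1, and VRF runs with raw_l3mdev_accept=1 twice, once with raw sockets and once with ping sockets (selected via net.ipv4.ping_group_range). The function name print_ping_matrix is made up; it is not taken from fcnal-test.sh, and it does not change any sysctl.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: prints the IPv4 ping test matrix as described in
# this thread. It does NOT reproduce fcnal-test.sh and changes no sysctls.

print_ping_matrix() {
	local accept sock

	# "No VRF": raw_l3mdev_accept is toggled between 0 and 1.
	for accept in 0 1; do
		printf 'No VRF: raw_l3mdev_accept=%s\n' "$accept"
	done

	# "With VRF": raw_l3mdev_accept stays at 1; the second pass switches
	# from raw sockets to ping sockets (via net.ipv4.ping_group_range).
	for sock in raw ping; do
		printf 'With VRF: raw_l3mdev_accept=1, %s socket\n' "$sock"
	done
}

print_ping_matrix
```

Read this way, every VRF test name shows up once per socket type, which would be consistent with each FAIL line in the grep output above appearing twice.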

Well, you are not to blame if it is not documented.

This thing doesn't come out of the testsuite save by prayer and fasting,
I'm afraid ;-)

Best regards,
Mirsad