Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

From: Jarod Wilson
Date: Fri Dec 18 2020 - 18:50:58 EST


On Tue, Dec 8, 2020 at 3:35 PM Jay Vosburgh <jay.vosburgh@xxxxxxxxxxxxx> wrote:
>
> Jarod Wilson <jarod@xxxxxxxxxx> wrote:
...
> >The addition of a case BOND_LINK_BACK in bond_miimon_commit() is somewhat
> >separate from the fix for the actual hang, but it eliminates a constant
> >"invalid new link 3 on slave" message seen related to this issue, and it's
> >not actually an invalid state here, so we shouldn't be reporting it as an
> >error.
...
> In principle, bond_miimon_commit should not see _BACK or _FAIL
> state as a new link state, because those states should be managed at the
> bond_miimon_inspect level (as they are the result of updelay and
> downdelay). These states should not be "committed" in the sense of
> causing notifications or doing actions that require RTNL.
>
> My recollection is that the "invalid new link" messages were the
> result of a bug in de77ecd4ef02, which was fixed in 1899bb325149
> ("bonding: fix state transition issue in link monitoring"), but maybe
> the RTNL problem here induces that in some other fashion.
>
> Either way, I believe this message is correct as-is.

For reference, with 5.10.1 and this script:

#!/bin/sh

slave1=ens4f0
slave2=ens4f1

modprobe -rv bonding
modprobe -v bonding mode=2 miimon=100 updelay=200
ip link set bond0 up
ifenslave bond0 $slave1 $slave2
sleep 5

while :
do
ip link set $slave1 down
sleep 1
ip link set $slave1 up
sleep 1
done

I get this repeating log output:

[ 9488.262291] sfc 0000:05:00.0 ens4f0: link up at 10000Mbps
full-duplex (MTU 1500)
[ 9488.339508] bond0: (slave ens4f0): link status up, enabling it in 200 ms
[ 9488.339511] bond0: (slave ens4f0): invalid new link 3 on slave
[ 9488.547643] bond0: (slave ens4f0): link status definitely up, 10000
Mbps full duplex
[ 9489.276614] bond0: (slave ens4f0): link status definitely down,
disabling slave
[ 9490.273830] sfc 0000:05:00.0 ens4f0: link up at 10000Mbps
full-duplex (MTU 1500)
[ 9490.315540] bond0: (slave ens4f0): link status up, enabling it in 200 ms
[ 9490.315543] bond0: (slave ens4f0): invalid new link 3 on slave
[ 9490.523641] bond0: (slave ens4f0): link status definitely up, 10000
Mbps full duplex
[ 9491.356526] bond0: (slave ens4f0): link status definitely down,
disabling slave
[ 9492.285249] sfc 0000:05:00.0 ens4f0: link up at 10000Mbps
full-duplex (MTU 1500)
[ 9492.291522] bond0: (slave ens4f0): link status up, enabling it in 200 ms
[ 9492.291523] bond0: (slave ens4f0): invalid new link 3 on slave
[ 9492.499604] bond0: (slave ens4f0): link status definitely up, 10000
Mbps full duplex
[ 9493.331594] bond0: (slave ens4f0): link status definitely down,
disabling slave

"invalid new link 3 on slave" is there every single time.

Side note: I'm not actually able to reproduce the repeating "link
status up, enabling it in 200 ms" and never recovering from a downed
link on this host, no clue why it's so reproducible w/another system.

--
Jarod Wilson
jarod@xxxxxxxxxx