Re: bonding: link state question

From: Jonathan Toppins
Date: Sun Aug 08 2021 - 21:31:47 EST


On 8/8/21 12:49 AM, Willy Tarreau wrote:
On Sat, Aug 07, 2021 at 08:09:31PM -0400, Jonathan Toppins wrote:
setting miimon = 100 does appear to fix it.

It is interesting that there is no link monitor on by default. For example
when I enslave enp0s31f6 to a new bond with miimon == 0, enp0s31f6 starts
admin down and will never de-assert NO-CARRIER the bond always results in an
operstate of up. It seems like miimon = 100 should be the default since some
modes cannot use arpmon.

Historically when miimon was implemented, not all NICs nor drivers had
support for link state checking at all! In addition, there are certain
deployments where you could rely on many devices by having a bond device
on top of a vlan or similar device, and where monitoring could cost a
lot of resources and you'd prefer to rely on external monitoring to set
all of them up or down at once.

I do think however that there remains a case with a missing state
transition in the driver: on my laptop I have a bond interface attached
to eth0, and I noticed that if I suspend the laptop with the link up,
when I wake it up with no interface connected, the bond will not turn
down, regardless of miimon. I have not looked closer yet, but I
suspect that we're relying too much on a state change between previous
and current and that one historically impossible transition does not
exist there and/or used to work because it was handled as part of
another change. I'll eventually have a look.

Willy


I am likely very wrong but the lack of a recalculation of the bond carrier state after a lower notifies of an up/down event seemed incorrect. Maybe a place to start?

diff --git i/drivers/net/bonding/bond_main.c w/drivers/net/bonding/bond_main.c
index 9018fcc59f78..2b2c4b937142 100644
--- i/drivers/net/bonding/bond_main.c
+++ w/drivers/net/bonding/bond_main.c
@@ -3308,6 +3308,7 @@ static int bond_slave_netdev_event(unsigned long event,
*/
if (bond_mode_can_use_xmit_hash(bond))
bond_update_slave_arr(bond, NULL);
+ bond_set_carrier(bond);
break;
case NETDEV_CHANGEMTU:
/* TODO: Should slaves be allowed to