Re: [PATCH RFC net-next 0/3] net: vlan: fix bridge binding behavior and add selftests

From: Nikolay Aleksandrov
Date: Wed Aug 10 2022 - 04:54:41 EST


On 10/08/2022 06:11, Sevinj Aghayeva wrote:
> When bridge binding is enabled for a vlan interface, it is expected
> that the link state of the vlan interface will track the subset of the
> ports that are also members of the corresponding vlan, rather than
> that of all ports.
>
> Currently, this feature works as expected when a vlan interface is
> created with bridge binding enabled:
>
> ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
> bridge_binding on
>
> However, the feature does not work when a vlan interface is created
> with bridge binding disabled, and then enabled later:
>
> ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
> bridge_binding off
> ip link set vlan10 type vlan bridge_binding on
>
> After these two commands, the link state of the vlan interface
> continues to track that of all ports, which is inconsistent and
> confusing to users. This series fixes this bug and introduces two
> tests for the valid behavior.
>
> Sevinj Aghayeva (3):
> net: core: export call_netdevice_notifiers_info
> net: 8021q: fix bridge binding behavior for vlan interfaces
> selftests: net: tests for bridge binding behavior
>
> include/linux/netdevice.h | 2 +
> net/8021q/vlan.h | 2 +-
> net/8021q/vlan_dev.c | 25 ++-
> net/core/dev.c | 7 +-
> tools/testing/selftests/net/Makefile | 1 +
> .../selftests/net/bridge_vlan_binding_test.sh | 143 ++++++++++++++++++
> 6 files changed, 172 insertions(+), 8 deletions(-)
> create mode 100755 tools/testing/selftests/net/bridge_vlan_binding_test.sh
>

Hi,
NETDEV_CHANGE event is already propagated when the vlan changes flags,
NETDEV_CHANGEUPPER is used when the devices' relationship changes not their flags.
The only problem you have to figure out is that the flag has changed. The fix itself
must be done within the bridge, not 8021q. You can figure it out based on current bridge
loose binding state and the vlan's changed state, again in the bridge's NETDEV_CHANGE
handler. Unfortunately the proper fix is much more involved and will need new
infra, you'll have to track the loose binding vlans in the bridge. To do that you should
add logic that reflects the current vlans' loose binding state *only* for vlans that also
exist in the bridge, the rest which are upper should be carrier off if they have the loose
binding flag set.

Alternatively you can add a new NETDEV_ notifier (using something similar to struct netdev_notifier_pre_changeaddr_info)
and add link type-specific space (e.g. union of link type-specific structs) in the struct which will contain
what changed for 8021q and will be properly interpreted by the bridge. The downside is that we'll generate
2 notifications when changing the loose binding flag, but on the bright side won't have to track anything
in the bridge, just handle the new notifier type. This might be the easiest path, the fix is still in
the bridge though, the 8021q module just needs to fill in the new struct and emit the notification on
any loose binding changes, the bridge must decide if it should process it (i.e. based on upper/lower
relationship). Such notifier can be also re-used by other link types to propagate link-type specific
changes.

Both of these avoid any direct dependencies between the bridge and 8021q. Any other suggestions that
are simpler, avoid direct dependencies and solve the issue in a generic way would be appreciated.

Just be careful about introducing too much unnecessary processing because we
can have lots of vlan devices in a system.

Cheers,
Nik