Re: [net-next PATCH RFC] net: dsa: qca8k: make learning configurable and keep off if standalone

From: Vladimir Oltean
Date: Tue Jul 04 2023 - 19:35:07 EST

Next message: Yin Fengwei: "Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance"
Previous message: Andrew Lunn: "Re: [PATCH 2/2] irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()"
In reply to: Christian Marangi: "Re: [net-next PATCH RFC] net: dsa: qca8k: make learning configurable and keep off if standalone"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Jul 01, 2023 at 08:25:11PM +0200, Christian Marangi wrote:
> I have bad and good news about this... Bad news is that it seems to be
> qca8k handle flooding in a very interesting and broken way.
> Due to DSA implementation we expect every packet to be flooded to the
> CPU port by default and this is problematic with how the switch works.

To be clear, the network stack understands "flooding" to be something
that takes place for packets that have an unknown MAC DA, or broadcast.
So no, we don't expect that every packet is flooded, nor that all are
flooded to the CPU port. I'm not exactly clear what you mean.

>
> It seems that enabling flood bit for the CPU results in packet always be
> directed there...

Are you sure? How have you determined this? Can you rephrase this?
Because what I'm understanding is that you're saying that when flooding
towards the CPU is enabled, packets go to the CPU regardless of whether
they're known to the FDB or not. Which can't be correct, because if it
was, then packets sent to a MAC DA statically added as a bridge FDB entry
would be seen twice:

- once, autonomously forwarded by the switch
- twice, due to your claim that "packets are always directed to the CPU",
(claim which must also hold with the current driver configuration,
because the driver currently always enables flooding to the CPU)

But if that's the case, it means that the CPU (via the software bridge)
will send a second copy of the packet to the egress port, because
skb->offload_fwd_mark is unset. Someone would have noticed if that was
the case ?!

I have some skepticism towards this claim.

Please try to describe what happens under this scenario:

br0
/ | \
/ | \
/ | \
swp0 swp1 swp2
| | |
A B C

# On the switch
ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
ip link set swp0 master br0 && ip link set br0 up
ip link set swp1 master br0 && ip link set br0 up
ip link set swp2 master br0 && ip link set br0 up
bridge fdb add dev swp1 ${mac_of_B} vlan 1 master static

# On C
tcpdump -i eth0 -e -n

# On A
ping $B

How many ICMP requests do you see on B? How many do you see on C?
How many on the CPU port (br0)?

Correct answers should be:
On B: as many as were sent by A
On C: none
On br0: none on ingress (tcpdump -Q in), none on egress (tcpdump -Q out)

If you're seeing the ICMP requests on station C, see whether they don't
in fact come from br0 having flooded them in software, which means that
the real problem is with the FDB not matching on packets..

I have no idea how you've implemented things, but with reserved VIDs,
you also need to remap VID 0 during port_fdb_add() calls to the reserved
PVID of that bridge, and you've done that, otherwise packets won't match,
right?

> The switch tagger have a way to comunicate that the packet is flooded
> to the cpu.
>
> With some code in the tagger I manage to reach this state from
> local_termination.sh
>
> root@OpenWrt:~# /ktests/run_kselftest.sh -t drivers/net/dsa:local_termination.sh
> TAP version 13
> 1..1
> # selftests: drivers/net/dsa: local_termination.sh
> # TEST: lan1: Unicast IPv4 to primary MAC address [ OK ]
> # TEST: lan1: Unicast IPv4 to macvlan MAC address [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address, promisc [ OK ]
> # TEST: lan1: Unicast IPv4 to unknown MAC address, allmulti [ OK ]
> # TEST: lan1: Multicast IPv4 to joined group [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group, promisc [ OK ]
> # TEST: lan1: Multicast IPv4 to unknown group, allmulti [ OK ]
> # TEST: lan1: Multicast IPv6 to joined group [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group, promisc [ OK ]
> # TEST: lan1: Multicast IPv6 to unknown group, allmulti [ OK ]
> # TEST: br0: Unicast IPv4 to primary MAC address [ OK ]
> # TEST: br0: Unicast IPv4 to macvlan MAC address [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address [FAIL]
> # reception succeeded, but should have failed
> # TEST: br0: Unicast IPv4 to unknown MAC address, promisc [ OK ]
> # TEST: br0: Unicast IPv4 to unknown MAC address, allmulti [FAIL]
> # reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to joined group [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group [FAIL]
> # reception succeeded, but should have failed
> # TEST: br0: Multicast IPv4 to unknown group, promisc [ OK ]
> # TEST: br0: Multicast IPv4 to unknown group, allmulti [ OK ]
> # TEST: br0: Multicast IPv6 to joined group [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group [FAIL]
> # reception succeeded, but should have failed
> # TEST: br0: Multicast IPv6 to unknown group, promisc [ OK ]
> # TEST: br0: Multicast IPv6 to unknown group, allmulti [ OK ]
> not ok 1 selftests: drivers/net/dsa: local_termination.sh # exit=1
>
> The tagger check if the packet type is a flood kind and drop the packet
> (return NULL) if the interface is not promisc or allmulti. If the port
> if bridge type this check is skipped...

Why is the check skipped if the port is under a bridge? Because there
might be foreign interfaces under that bridge (like wlan0) which might
be valid destinations for flooded traffic (through software forwarding)?

> My concern is... Is it ok to implement host_set_flood this way? I tried
> many variant and I wasn't able to make flood work by using those bits...
>
> With the current implementation, host_set_flood will return always zero
> and everything will be handled in the tagger by dropping packet but I'm
> not sure it's ok to do it.

Well, we should first clarify exactly what it is that you claim to be a
hardware bug.

Next message: Yin Fengwei: "Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance"
Previous message: Andrew Lunn: "Re: [PATCH 2/2] irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()"
In reply to: Christian Marangi: "Re: [net-next PATCH RFC] net: dsa: qca8k: make learning configurable and keep off if standalone"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]