Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

From: Vladimir Oltean
Date: Tue Aug 10 2021 - 19:34:11 EST


On Tue, Aug 10, 2021 at 11:09:07PM +0200, Andre Valentin wrote:
> On 10.08.21 at 19:53, Vladimir Oltean wrote:
> > On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
> >> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> >>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> >>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> >>>>> Enable assisted learning on CPU port to fix roaming issues.
> >>>>
> >>>> 'roaming issues' implies to me it suffered from blindness to MAC
> >>>> addresses learned on foreign interfaces, which appears to not be true
> >>>> since your previous patch removes hardware learning on the CPU port
> >>>> (=> hardware learning on the CPU port was supported, so there were no
> >>>> roaming issues)
> >>
> >> The issue is with a wifi AP bridged into dsa and previously learned
> >> addresses.
> >>
> >> Test setup:
> >> We have two wifi APs, a and b (b with qca8k). Client is on AP a.
> >>
> >> The qca8k switch in AP b sees also the broadcast traffic from the client
> >> and takes the address into its fdb.
> >>
> >> Now the client roams to AP b.
> >> The client starts DHCP but does not get an IP. With tcpdump, I see the
> >> packets going through the switch (ap->cpu port->ethernet port) and they
> >> arrive at the DHCP server. It responds, the response packet reaches the
> >> ethernet port of the qca8k, and is not forwarded.
> >>
> >> After about 3 minutes the fdb entry in the qca8k on AP b is
> >> "cleaned up" and the client can immediately get its IP from the DHCP server.
> >>
> >> I hope this helps understanding the background.
> >
> > How does this differ from what is described in commit d5f19486cee7
> > ("net: dsa: listen for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE on foreign
> > bridge neighbors")?
> >
> I got a bit lost. It is a bit different.
>
> I've been also working a bit on the ipq807x device with such a switch on
> OpenWRT. There is a backport of that commit. To fix the problems described
> by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the
> problem.
>
> But initially, the device was unreachable until I created traffic from the device
> to a client (cpu port->ethernet). At first, I did not notice this because a wifi client
> with its traffic immediately solved the issue automatically.
> Later I verified this in detail.
>
> Hopefully DENG Qingfang's patches help. But I cannot prove it at the moment.

I don't understand. You're saying that when the device sends a packet
from its new position, the switch learns it on the CPU port, and that
fixes the issue?

Isn't that always how issues like that get fixed? If hardware learning
is supported on the CPU port, it is no different than a device roaming
from one switch port to another (but isn't directly connected to that
switch port, otherwise the switch might fast age the port when the
device roams) - it is inaccessible until it says something.
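
To make the scenario concrete, here is a toy sketch in Python. It is not
kernel code and not the qca8k driver: the ToySwitch class, the roam()
helper and the port names are made up for illustration, and the
learn_on_cpu flag stands in for any mechanism (hardware learning or
assisted learning) that moves the FDB entry to the CPU port. It assumes
the switch drops a frame whose destination is believed to sit behind the
ingress port.

```python
# Toy model of a learning switch FDB; hypothetical, not a kernel API.
ETH, CPU = "eth", "cpu"  # made-up port names
BROADCAST = "ff:ff:ff:ff:ff:ff"


class ToySwitch:
    def __init__(self, learn_on_cpu):
        self.learn_on_cpu = learn_on_cpu
        self.fdb = {}  # MAC address -> port

    def receive(self, ingress, src, dst):
        """Learn src (if allowed) and return the list of egress ports."""
        if ingress != CPU or self.learn_on_cpu:
            self.fdb[src] = ingress
        egress = self.fdb.get(dst)
        if egress is None:
            # Broadcast or unknown unicast: flood to every other port.
            return [p for p in (ETH, CPU) if p != ingress]
        if egress == ingress:
            # Destination believed to be behind the ingress port: drop.
            return []
        return [egress]


def roam(learn_on_cpu):
    sw = ToySwitch(learn_on_cpu)
    # Client is on AP a; its broadcast reaches AP b's switch via Ethernet.
    sw.receive(ETH, "client", BROADCAST)
    # Client roams to AP b; its DHCP request now enters from the CPU port.
    sw.receive(CPU, "client", BROADCAST)
    # The DHCP reply arrives on the Ethernet port, addressed to the client.
    return sw.receive(ETH, "server", "client")
```

In this model, roam(False) returns an empty list (the reply is dropped
until the stale entry ages out), while roam(True) returns the CPU port.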

I still have no idea what we're talking about, and why this patch is
necessary. Does the qca8k switch support hardware learning on the CPU
port or not?