Re: [PATCH v8 net-next 0/2] mv88e6xxx: Add MAB offload support

From: netdev
Date: Tue Nov 15 2022 - 07:40:54 EST


On 2022-11-15 13:22, Vladimir Oltean wrote:
On Tue, Nov 15, 2022 at 12:31:59PM +0100, netdev@xxxxxxxxxxxxxxxxxxxx wrote:
It happens on upstart, so I would then have to hack the system upstart to
add trace.

Hack upstart or disable the service that brings the switch ports up, and
bring them up manually...

I also have:
mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell 88E6097/88E6097F, revision 2
mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link mode
mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow control off
mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver [Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver [Marvell 88E1112] (irq=174)
mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver [Marvell 88E1112] (irq=175)

After this, and after adding the interfaces to the bridge, it continues like this:

br0: port 1(eth10) entered blocking state
br0: port 1(eth10) entered disabled state
br0: port 2(eth6) entered blocking state
br0: port 2(eth6) entered disabled state
device eth6 entered promiscuous mode
device eth10 entered promiscuous mode
br0: port 3(eth9) entered blocking state
br0: port 3(eth9) entered disabled state
device eth9 entered promiscuous mode
br0: port 4(eth5) entered blocking state
br0: port 4(eth5) entered disabled state
device eth5 entered promiscuous mode
br0: port 5(eth8) entered blocking state
br0: port 5(eth8) entered disabled state
device eth8 entered promiscuous mode
br0: port 6(eth4) entered blocking state
br0: port 6(eth4) entered disabled state
mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
mv88e6085 1002b000.ethernet-1:04: port 0 failed to add 9a:af:03:f1:bd:0a vid 1 to fdb: -110

Dumb question, but if you get errors like this, how can you test anything at all
in the patches that you submit?

The answer is that I don't always get these errors... once in a while (after many resets) they do
not happen, and all is fine.

The error code is... well of course -110 (timed out).


device eth4 entered promiscuous mode
br0: port 7(eth7) entered blocking state
br0: port 7(eth7) entered disabled state

I don't know if that gives any clues...?

Not really. That error might be related - something indicating a breakage
in the top-level (fec IIUC) MDIO controller, or not. There was "recent"
rework almost everywhere. For example commit 35da1dfd9484 ("net: dsa:
mv88e6xxx: Improve performance of busy bit polling"). That also hooks
into the mv88e6xxx cascaded MDIO controller (mv88e6xxx_g2_smi_phy_wait),
so there might be something there.
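
For readers following along, here is a minimal userspace sketch of what "busy bit polling" with a timeout means in general, and why a wait that never sees the bit clear surfaces as -110 (-ETIMEDOUT on Linux). This is not the mv88e6xxx driver code: the names wait_busy_clear, fake_read and read_reg_fn, the BUSY_BIT position, and the poll-count limit are all assumptions made for illustration only.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed busy-bit position, chosen for illustration only. */
#define BUSY_BIT 0x8000u

/* Hypothetical register accessor: returns the current register value. */
typedef uint16_t (*read_reg_fn)(void *ctx);

/*
 * Poll the register until BUSY_BIT clears or max_polls reads were made.
 * Returns 0 on success and -ETIMEDOUT if the bit never cleared.
 */
static int wait_busy_clear(read_reg_fn read_reg, void *ctx, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!(read_reg(ctx) & BUSY_BIT))
            return 0;
    }
    return -ETIMEDOUT;
}

/*
 * Fake device for experimenting: reports busy for the number of reads
 * stored in *ctx, then reports idle.
 */
static uint16_t fake_read(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return BUSY_BIT;
    }
    return 0;
}
```

With a fake device that stays busy for 3 reads, wait_busy_clear(fake_read, &n, 10) returns 0. With one that stays busy for 100 reads, the same call returns -ETIMEDOUT, which is the kind of value that ends up in a message like "failed to add ... to fdb: -110". A real driver would normally bound the wait by time rather than by a read count.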


I can check that out, but I remember that net-next has not worked on this device for quite some
time...


Otherwise I have to take more time to see what I can dig out. The easiest
approach for me would be to add some printk statements that give targeted
information, if I am told what to print and where...

Do you have a timeline for when the regression was introduced?
Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with
that reverted might be worth a shot. Otherwise, a bisect from a known
working version only takes a couple of hours, and shouldn't require
other changes to the setup.

I can't say when the regression was introduced, as I used modified kernels, but it was somewhere
between 5.16 and 5.17. I know there was something PHY-related, but it's a bit more complicated,
so this is only a guess...

I would have to get the whole locked port patch set etc. onto a 5.16 kernel to see if that works.