Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

From: Russell King (Oracle)
Date: Sun Mar 24 2024 - 07:40:00 EST


On Sun, Mar 24, 2024 at 12:47:08PM +0300, Arınç ÜNAL wrote:
> On 21/03/2024 18:31, Florian Fainelli wrote:
> > On 3/21/24 09:09, Arınç ÜNAL wrote:
> > > I have started testing MT7531 with EEE enabled and immediately experienced
> > > frames that wouldn't egress the switch or improperly received on the link
> > > partner.
> > >
> > > SoC MAC       <-EEE off-> MT7531 P6 MAC (acting as PHY)
> > > MT7531 P0 MAC <-EEE on -> MT7531 P0 PHY
> > > MT7531 P0 PHY <-EEE on -> Computer connected with twisted pair
> >
> > OK, so this is intended to describe that the SoC's Ethernet MAC link
> > to the integrated switch did not use EEE, only the user-facing ports
> > did. That makes sense because it's all digital logic and you are not
> > going to see much power saving from having EEE enabled between the
> > SoC's Ethernet MAC and the CPU port of the switch. That said, I wonder
> > whether this has an impact on any form of flow control within the
> > switch that is reacting to LPI, and whether you need EEE to be
> > enabled end-to-end?
>
> I've tested pinging between my computers with EEE enabled interfaces. The
> behaviour is identical.
>
> >
> > >
> > > I've tested pinging from the SoC's CPU. Packet capturing on the twisted
> > > pair computer showed very few frames were being received.
> > >
> > > # ping 192.168.2.2
> > > PING 192.168.2.2 (192.168.2.2): 56 data bytes
> > > 64 bytes from 192.168.2.2: seq=36 ttl=64 time=0.486 ms
> > > ^C
> > > --- 192.168.2.2 ping statistics ---
> > > 64 packets transmitted, 1 packets received, 98% packet loss
> > > round-trip min/avg/max = 0.486/0.486/0.486 ms
> > >
> > > It seems there's less loss when frames are passed more frequently.
> >
> > That would point to an issue getting in and out of LPI, do you see these packet losses even with different LPI timeouts?
>
> The NICs on my computers don't seem to allow changing the tx-lpi and
> tx-timer options.
>
> Computer 1 (Intel I219-V, driver: e1000e):
>
> $ sudo ethtool --set-eee eno1 tx-timer 15
> netlink error: Invalid argument
>
> $ sudo ethtool --show-eee eno1
> EEE settings for eno1:
> EEE status: enabled - active
> Tx LPI: 17 (us)
> Supported EEE link modes: 100baseT/Full
> 1000baseT/Full
> Advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
> Link partner advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
>
> Computer 2 (Realtek RTL8111H, driver: r8169):
>
> $ sudo ethtool --set-eee eno1 tx-lpi on
>
> $ sudo ethtool --show-eee eno1
> EEE settings for eno1:
> EEE status: enabled - active
> Tx LPI: disabled
> Supported EEE link modes: 100baseT/Full
> 1000baseT/Full
> Advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
> Link partner advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
>
> I've tested with switch ports interfaces' tx-timer from 0 to 40, same
> tx-timer for both interfaces. Loss is still there.

Drivers tend to implement the ethtool EEE API inconsistently, so at the
moment one can't rely on what ethtool says about the status. Sadly,
this is what happens when driver authors are left to their own
devices. :(

> I suppose the MT7531 switch PHYs need calibration for EEE that is currently
> missing from the mediatek-ge driver.

EEE is quite simple from the software point of view. There is software
negotiation of the link modes for which EEE is supported, and then
there are one or more timers that affect the behaviour of EEE. The LPI
timer is "how long the link needs to be idle for before _this_ end
signals that it _can_ enter low power state". The link only enters low
power state when *both* ends of the link signal that they can enter
low power state.
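
To make that concrete, here is a minimal sketch of the behaviour
described above. It is a toy Python model, not anything taken from a
driver or from the MT7531 hardware; the names LinkEnd, negotiated_modes
and link_in_low_power are made up for illustration, and the idle times
are simply assumed inputs.

```python
from dataclasses import dataclass


@dataclass
class LinkEnd:
    """One end of a link, as seen by this toy model (hypothetical type)."""
    advertised: frozenset   # EEE link modes this end advertises
    lpi_timer_us: int       # idle time required before this end signals LPI
    idle_us: int = 0        # how long this end has currently been idle

    def signals_lpi(self) -> bool:
        # This end signals that it *can* enter low power state once it
        # has been idle for at least its own LPI timer.
        return self.idle_us >= self.lpi_timer_us


def negotiated_modes(a: LinkEnd, b: LinkEnd) -> frozenset:
    # EEE is only usable for link modes that both ends advertise.
    return a.advertised & b.advertised


def link_in_low_power(a: LinkEnd, b: LinkEnd, mode: str) -> bool:
    # The link enters low power state only when EEE was negotiated for
    # the current mode and *both* ends signal that they can enter it.
    return (mode in negotiated_modes(a, b)
            and a.signals_lpi()
            and b.signals_lpi())
```

With timers of, say, 17us on one end and 40us on the other, the link
in this model stays out of low power state until both ends have been
idle for at least their own timer, however short the other end's timer
is.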

What calibration would be necessary?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!