Re: [PATCH 2/2] net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)

From: Russell King (Oracle)
Date: Wed Aug 30 2023 - 15:30:14 EST


On Wed, Aug 30, 2023 at 03:06:49PM +0200, Oleksij Rempel wrote:
> On Wed, Aug 30, 2023 at 01:35:18PM +0100, Russell King (Oracle) wrote:
> > On Wed, Aug 30, 2023 at 02:17:38PM +0200, Oleksij Rempel wrote:
> > > On Wed, Aug 30, 2023 at 01:51:51PM +0200, Lukasz Majewski wrote:
> > > > Hi Oleksij,
> > >
> > > > It looks like the most optimal solution would be the one proposed by
> > > > Tristam:
> > > > https://www.spinics.net/lists/netdev/msg932044.html
> > >
> > > In this case, please add the reason why it would work on this HW and
> > > will not break by any changes in PHYlib or micrel.c driver.
> > >
> > > If I remember it correctly, in KSZ9477 variants, if you write to EEE
> > > advertisement register, it will affect the state of a EEE capability
> > > register. Which break IEEE 802.3 specification and the reason why
> > > ksz9477_get_features() actually exist. But can be used as workaround if
> > > it is written early enough before PHYlib tried to read EEE capability
> > > register.
> > >
> > > Please confirm my assumption by applying your workaround and testing it
> > > with ethtool --show-eee lanX.
> > >
> > > It should be commented in the code with all kind of warnings:
> > > Don't move!!! We use one bug to workaround another bug!!! If PHYlib
> > > start scanning PHYs before this code is executed, then thing may break!!
> >
> > Why would phylib's scanning cause breakage?
> >
> > phylib's scanning for PHYs is about reading the ID registers etc. It
> > doesn't do anything until the PHY has been found, and then the first
> > thing that happens when the phy_device structure is created is an
> > appropriate driver is located, and the driver's ->probe function
> > is called.
> >
> > If that is successful, then the fewatures are read. If the PHY
> > driver's ->features member is set, then that initialises the
> > "supported" mask and we read the EEE abilities.
> >
> > If ->features is not set, then we look to see whether the driver
> > provides a ->get_features method, and call that.
> >
> > Otherwise we use the generic genphy_c45_pma_read_abilities() or
> > genphy_read_abilities() depending whether the PHY's is_c45 is set
> > or not.
> >
> > So, if you want to do something very early before features are read,
> > then either don't set .features, and do it early in .get_features
> > before calling anything else, or do it in the ->probe function.
>
> Let me summarize my view on the problem, so may be you can suggest a better
> way to solve it.
> - KSZ9477, KSZ8565, KSZ9893, KSZ9563, seems to have different quirks by
> the same PHYid. micrel.c driver do now know what exact HW is actually
> in use.
> - A set of PHY workarounds was moved from dsa/microchip/ksz9477.c to
> micrel.c, one of this workaround was clearing EEE advertisement
> register, which by accident was clearing EEE capability register.
> Since EEE cap was cleared by the dsa/microchip/ksz9477.c code before
> micrel.c was probed, PHYlib was assuming that his PHY do not supports
> EEE and dint tried to use it.
> After moving this code to micrel.c, it is now trying to change EEE
> advertisement state without letting PHYlib to know about it and PHYlib
> re enables it as actually excepted.
> - so far, only KSZ9477 seems to be broken beyond repair, so it is better
> to disable EEE without giving it as a choice for user configuration.

We do have support in phylib for "broken EEE modes" which DT could set
for the broken PHYs, and as it is possible to describe the DSA PHYs in
DT. This sets phydev->eee_broken_modes.

phydev->eee_broken_modes gets looked at when genphy_config_aneg() or
genphy_c45_an_config_aneg() gets called - which will happen when the
PHY is being "started".

So, you could add the DT properties as appropriate to disable all the
EEE modes.

Alternatively, in your .config_init function, you could detect your
flag and force eee_broken_modes to all-ones.

The problem with clearing ->supported_eee is that will stop
genphy_c45_write_eee_adv() writing the advertisement register -
which means if bits are set in the register, they won't be cleared
because phylib thinks the registers aren't supported. So you won't
actually be disabling anything by clearing ->supported_eee.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!