Re: [PATCH 2/2] net: dsa: microchip: Provide Module 4 KSZ9477 errata (DS80000754C)

From: Lukasz Majewski
Date: Tue Aug 29 2023 - 08:39:43 EST


Hi Oleksij,

> Hi Lukasz,
>
> On Tue, Aug 29, 2023 at 01:24:29PM +0200, Lukasz Majewski wrote:
> > Hi Vladimir,
> >
> > > Hi Lukasz,
> > >
> > > On Tue, Aug 29, 2023 at 10:35:33AM +0200, Lukasz Majewski wrote:
> > > > Hi Vladimir,
> > > >
> > > > > On Fri, Aug 25, 2023 at 06:48:41PM +0000,
> > > > > Tristram.Ha@xxxxxxxxxxxxx wrote:
> > > > > > > > IMHO adding functions to MMD modification would
> > > > > > > > facilitate further development (for example LED setup).
> > > > > > > >
> > > > > > >
> > > > > > > We already have some KSZ9477 specific initialization done
> > > > > > > in the Micrel PHY driver under drivers/net/phy/micrel.c,
> > > > > > > can we converge on the PHY driver which has a reasonable
> > > > > > > amount of infrastructure for dealing with workarounds,
> > > > > > > indirect or direct MMD accesses etc.?
> > > > > >
> > > > > > Actually the internal PHY used in the
> > > > > > KSZ9897/KSZ9477/KSZ9893 switches are special and only used
> > > > > > inside those switches. Putting all the switch related code
> > > > > > in Micrel PHY driver does not really help. When the switch
> > > > > > is reset all those PHY registers need to be set again, but
> > > > > > the PHY driver only executes those code during PHY
> > > > > > initialization. I do not know if there is a good way to
> > > > > > tell the PHY to re-initialize again.
> > > > >
> > > > > Suppose there was a method to tell the PHY driver to
> > > > > re-initialize itself. What would be the key points in which
> > > > > the DSA switch driver would need to trigger that method?
> > > > > Where is the switch reset at runtime?
> > > >
> > > > Tristram has explained why adding the internal switch PHY errata
> > > > to generic PHY code is not optimal.
> > >
> > > Yes, and I didn't understand that explanation, so I asked a
> > > clarification question.
> >
> > Ok. Let's wait for Tristram's answer.
> >
> > >
> > > > If adding MMD generic code is a problem - then I'm fine with
> > > > just clearing proper bits with just two indirect writes in the
> > > > drivers/net/dsa/microchip/ksz9477.c
> > > >
> > > > I would also prefer to keep the separate ksz9477_errata()
> > > > function, so we could add other errata code there.
> > > >
> > > > Just informative - without this patch the KSZ9477-EVB board's
> > > > network is useless when the other peer has EEE enabled by
> > > > default (like almost all non managed ETH switches).
> > >
> > > No, adding direct PHY MMD access code to the ksz9477 switch
> > > driver is not even the biggest problem - even though, IIUC, the
> > > "workaround" to disable EEE advertisement could be moved to
> > > ksz9477_get_features() in drivers/net/phy/micrel.c, where
> > > phydev->supported_eee could be cleared.
> >
> > To be even more interesting (after looking into the PHY micrel.c
> > code):
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
> >
> > The errata from this patch is already present.
> >
> > The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c)
> > is executed AFTER generic phy_probe():
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
> > in which the EEE advertisement registers are read.
> >
> > Hence, those registers need to be cleared earlier - as I do in
> > ksz9477_setup() in drivers/net/dsa/microchip/ksz9477.
> >
> > Here the precedence matters ...
> > >
> > > The biggest problem that I see is that Oleksij Rempel has "just"
> > > added EEE support to the KSZ9477 earlier this year, with an ack
> > > from Arun Ramadoss: 69d3b36ca045 ("net: dsa: microchip: enable
> > > EEE support"). I'm not understanding why the erratum wasn't a
> > > discussion topic then.
> >
> > +1
>
> As this erratum states: "this feature _can_ cause link drops".
> For example, I did indeed see an EEE-related issue between this
> switch and a link partner with an AR8035 PHY. The following patch
> addresses this issue:
> https://lore.kernel.org/all/20230327142202.3754446-8-o.rempel@xxxxxxxxxxxxxx/
> So, in this case KSZ9477 was not the bad side.
>

The errata: http://ww1.microchip.com/downloads/jp/DeviceDoc/jp599888.pdf

Module 4, "End user implications":
--------8<----------
If the link partner is not known, or if the link partner is EEE
capable, then the EEE feature should be manually disabled to avoid link
drop problems.
-------->8----------

> Since this erratum does not describe the exact cause of this issue

IMHO, it does: "The EEE feature is enabled by default, but it is not
fully operational."

It looks like a silicon issue, the details of which are probably known
only to Micrel/Microchip.

> or
> specific link partners where this functionality is not working, I
> would prefer to give the user the freedom of choice.

The problem is that the user would encounter a broken network when
connected to a peer advertising EEE.

Hence, I would prefer to apply the erratum workaround first; anybody who
would like to enable EEE can then try whether it works for them.

IMHO, code that fixes errata shall be added unconditionally, without any
"freedom of choice".

>
> The same issue we have with Pause Frame support. It is not always a
> good choice, but user has freedom to configure it.
>
> Today I want to create a test setup with different EEE capable link
> partners on one side and KSZ9477 on other side and let it run some
> days. Just to make sure.
>
> Besides, are you able to reproduce this issue?
>

Yes, I can reproduce the issue. I use two Microchip development
boards (KSZ9477-EVB [1]) connected together to test HSR as well as
communication with a host PC.

Without this patch the network on this board is not usable (I
continually see the link going up and down).

Another test scenario is to connect this board to an unmanaged Ethernet
switch (which should advertise EEE by default).


Please also be aware that this erratum fix is (implicitly, I think)
already present in the kernel:
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804

However, with the execution order of the PHY/DSA functions in the newest
mainline it no longer works (I described this in detail in my earlier
mail to Vladimir).
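
To make the ordering problem concrete, here is a minimal user-space
sketch. It is not kernel code: struct fake_phy and the helpers c22_write(),
c22_read(), mmd_select(), mmd_write(), mmd_read() and eee_seen_by_probe()
are hypothetical names made up for this mail, and the reset value 0x0006
(100BASE-TX and 1000BASE-T EEE advertised) is an assumption. Only the
Clause 22 indirect access sequence (registers 13 and 14) and the location
of the EEE advertisement register (MMD 7, register 60) follow IEEE 802.3.

```c
#include <assert.h>
#include <stdint.h>

/* Clause 22 registers used for indirect MMD access (IEEE 802.3 Annex 22D). */
#define MII_MMD_CTRL         0x0d
#define MII_MMD_DATA         0x0e
#define MII_MMD_CTRL_NOINCR  0x4000 /* function: data, no post-increment */

#define MDIO_MMD_AN          7      /* auto-negotiation MMD */
#define MDIO_AN_EEE_ADV      60     /* EEE advertisement register 7.60 */

#define FAKE_MMD_REGS        64

/* Hypothetical in-memory PHY model; it stands in for real MDIO access. */
struct fake_phy {
	uint16_t ctrl;  /* last value written to MII_MMD_CTRL */
	uint16_t addr;  /* MMD register address latched via MII_MMD_DATA */
	uint16_t mmd[32][FAKE_MMD_REGS];
};

static void c22_write(struct fake_phy *phy, unsigned int reg, uint16_t val)
{
	if (reg == MII_MMD_CTRL) {
		phy->ctrl = val;
	} else if (reg == MII_MMD_DATA) {
		if ((phy->ctrl & 0xc000) == 0) {
			phy->addr = val; /* function 00: latch the address */
		} else {
			assert(phy->addr < FAKE_MMD_REGS);
			phy->mmd[phy->ctrl & 0x1f][phy->addr] = val;
		}
	}
}

static uint16_t c22_read(struct fake_phy *phy, unsigned int reg)
{
	/* The model only supports data reads through MII_MMD_DATA. */
	assert(reg == MII_MMD_DATA && phy->addr < FAKE_MMD_REGS);
	return phy->mmd[phy->ctrl & 0x1f][phy->addr];
}

/* Select devad/reg, then switch the data register to "data" mode. */
static void mmd_select(struct fake_phy *phy, unsigned int devad, uint16_t reg)
{
	c22_write(phy, MII_MMD_CTRL, (uint16_t)devad);
	c22_write(phy, MII_MMD_DATA, reg);
	c22_write(phy, MII_MMD_CTRL, (uint16_t)(MII_MMD_CTRL_NOINCR | devad));
}

static void mmd_write(struct fake_phy *phy, unsigned int devad, uint16_t reg,
		      uint16_t val)
{
	mmd_select(phy, devad, reg);
	c22_write(phy, MII_MMD_DATA, val);
}

static uint16_t mmd_read(struct fake_phy *phy, unsigned int devad, uint16_t reg)
{
	mmd_select(phy, devad, reg);
	return c22_read(phy, MII_MMD_DATA);
}

/*
 * Return the EEE advertisement value that a probe-time read would see.
 * errata_first != 0: the workaround runs before the probe-time read.
 * errata_first == 0: the workaround runs afterwards (too late).
 */
uint16_t eee_seen_by_probe(int errata_first)
{
	struct fake_phy phy = { 0 };
	uint16_t seen;

	/* Assumed reset default: EEE advertised for 100BASE-TX and 1000BASE-T. */
	phy.mmd[MDIO_MMD_AN][MDIO_AN_EEE_ADV] = 0x0006;

	if (errata_first)
		mmd_write(&phy, MDIO_MMD_AN, MDIO_AN_EEE_ADV, 0);

	/* Stands in for the read of the EEE registers during probe. */
	seen = mmd_read(&phy, MDIO_MMD_AN, MDIO_AN_EEE_ADV);

	if (!errata_first)
		mmd_write(&phy, MDIO_MMD_AN, MDIO_AN_EEE_ADV, 0);

	return seen;
}
```

In this model eee_seen_by_probe(1) returns 0, while eee_seen_by_probe(0)
returns 0x0006: the register is cleared in both cases, but in the second
one the probe-time reader has already cached the stale value.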

> Regards,
> Oleksij

Links:
[1] - https://www.microchip.com/en-us/development-tool/evb-ksz9477-1


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@xxxxxxx
