Re: [PATCH net v3] net: phy: intel-xway: enable integrated led functions

From: Andrew Lunn
Date: Fri Feb 11 2022 - 14:18:12 EST


On Thu, Feb 10, 2022 at 07:52:49AM -0800, Tim Harvey wrote:
> On Wed, Feb 9, 2022 at 4:04 PM Andrew Lunn <andrew@xxxxxxx> wrote:
> >
> > > The errata can be summarized as:
> > > - 1 out of 100 boots or cable plug events RGMII GbE link will end up
> > > going down and up 3 to 4 times then resort to a 100m link; workaround
> > > has been found to require a pin level reset
> >
> > So that sounds like it is downshifting because it thinks there is a
> > broken pair. Can you disable downshift? Problem is, that might just
> > result in link down.
>
> It's a bad situation. The actual errata is that the device latches into
> a bad state where there is some noise on an ADC or something like that
> that causes a high packet error rate. The firmware baked into the PHY
> has a detection mechanism looking at these errors (SSD errors) and if
> there are enough of them it takes the link down and up again and if
> that doesn't resolve it after 3 tries it shifts down to 100 Mb/s. They call
> this 'ADS' or 'auto-down-speed' and you can disable it but it would
> just result in leaving your bad gbe link up. It's unclear yet if it's
> better to just detect the ADS event and reset or to disable ADS and
> look for the SSD errors myself (which I can do).

I don't think it matters too much which way you detect there is a
problem. But ideally you need a recovery which does not need a
hardware reset. Then you don't need to worry about the other PHY
sharing the reset line. But you know that...

> I agree that I can't do anything in boot firmware. I was planning on
> having some static code that registered a PHY fixup to get a call when
> these PHYs were detected and I could then kick off a polling thread to
> watch for errors and trigger a reset. The reset could have knowledge
> of the PHY devices that called the fixup handler so that I can at
> least setup each PHY again.

That sounds like a reasonable architecture. Your thread would need to
do:

phy_stop()
phy_init_hw()
phy_start()

and phylib probably will do the reset.
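
As a rough illustration of that ordering, here is a userspace C model,
not kernel code. struct mock_phy and the mock_* functions are
hypothetical stand-ins for struct phy_device and the three phylib
calls, and the idea that re-initialisation clears the latched state is
an assumption, not something confirmed for this PHY:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHY; not the kernel's struct phy_device. */
struct mock_phy {
	bool running;    /* state machine has been started */
	bool bad_state;  /* models the latched errata condition */
	int init_count;  /* how many times the hardware was re-initialised */
};

/* Models phy_stop(): halt the state machine. */
static void mock_phy_stop(struct mock_phy *phy)
{
	phy->running = false;
}

/* Models phy_init_hw(): returns 0 on success, like the real call. */
static int mock_phy_init_hw(struct mock_phy *phy)
{
	/* Assumption: re-init (with whatever reset it does) clears the latch. */
	phy->bad_state = false;
	phy->init_count++;
	return 0;
}

/* Models phy_start(): restart the state machine. */
static void mock_phy_start(struct mock_phy *phy)
{
	phy->running = true;
}

/* Recovery in the order given above: stop, re-init, start. */
static int mock_phy_recover(struct mock_phy *phy)
{
	int ret;

	assert(phy != NULL);

	mock_phy_stop(phy);
	ret = mock_phy_init_hw(phy);
	if (ret)
		return ret;  /* leave the PHY stopped if re-init failed */
	mock_phy_start(phy);
	return 0;
}
```

In the real thread the same three calls would be made on each
struct phy_device that the fixup handler recorded.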

Maybe you can put the problem detection code in the .read_status
callback, which sets an 'im_fubar' flag in the driver's private
structure. That gives some building blocks for other users of this PHY
who don't have a shared reset line, and can maybe solve the problem
within the driver.
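
A userspace C model of that split between detection and recovery, just
to show the shape. Everything named mock_* is hypothetical, and so are
the SSD_ERR_THRESHOLD value, the clear-on-read counter and the way the
downshift is detected; the real driver would read these from whatever
registers the PHY provides:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; no number is given in this thread. */
#define SSD_ERR_THRESHOLD 16

/* Stand-in for the driver's private structure. */
struct mock_priv {
	bool im_fubar;  /* set once the errata condition has been seen */
};

/* Stand-in for the PHY state a .read_status callback would look at. */
struct mock_phy {
	struct mock_priv priv;
	unsigned int ssd_errors;  /* models an SSD error counter register */
	int speed;                /* resolved link speed in Mb/s */
	bool gbe_negotiated;      /* both link partners advertised 1000 */
};

/* Models the detection half: only sets the flag, never resets anything. */
static int mock_read_status(struct mock_phy *phy)
{
	bool downshifted, noisy;

	assert(phy != NULL);

	/* An ADS event shows up as 100 Mb/s although GbE was negotiated. */
	downshifted = phy->gbe_negotiated && phy->speed == 100;
	/* With ADS disabled, the error counter itself is the indicator. */
	noisy = phy->ssd_errors >= SSD_ERR_THRESHOLD;

	if (downshifted || noisy)
		phy->priv.im_fubar = true;

	phy->ssd_errors = 0;  /* assumption: the counter clears on read */
	return 0;
}

/* What a polling thread, or the driver itself, would check. */
static bool mock_needs_recovery(const struct mock_phy *phy)
{
	return phy->priv.im_fubar;
}
```

Keeping the flag separate from the recovery means a board with a
dedicated reset line could act on it inside the driver, while a board
with a shared reset line leaves the decision to its own thread.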

Andrew