Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

From: Russell King - ARM Linux admin
Date: Wed Dec 30 2020 - 14:14:08 EST


On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote:
> On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote:
> > Hi Pali
> >
> > I have to agree with Russell here. I would rather have no diagnostics
> > than untrustable diagnostics.
>
> Ok!
>
> So should we completely skip hwmon_device_register_with_info() call
> if (i2c_block_size < 2) ?

I don't think that alone is sufficient - there's also the matter of
ethtool -m which will dump that information as well, and we don't want
to offer it to userspace in an unreliable form.
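
As a rough sketch of what gating both paths could look like: the helper
name, the struct and the local constants below are made up for
illustration and are not existing kernel code. The two lengths mirror
the values of ETH_MODULE_SFF_8079_LEN and ETH_MODULE_SFF_8472_LEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the ethtool EEPROM dump lengths (assumption: the
 * real code would use the uapi constants directly). */
#define SFF_8079_LEN 256	/* A0h identification page only */
#define SFF_8472_LEN 512	/* A0h plus the A2h diagnostics page */

/* Hypothetical result type: what we are willing to expose for a module. */
struct diag_policy {
	bool register_hwmon;		/* create the hwmon device? */
	size_t ethtool_eeprom_len;	/* bytes offered to "ethtool -m" */
};

/*
 * Hypothetical helper: decide what to expose based on the largest block
 * the module can return in one I2C read. With a block size below 2 the
 * 16-bit diagnostic fields cannot be read coherently, so both hwmon and
 * the A2h half of the ethtool dump are withheld.
 */
static struct diag_policy sfp_diag_policy(size_t i2c_block_size,
					  bool has_diagnostics)
{
	struct diag_policy p = {
		.register_hwmon = false,
		.ethtool_eeprom_len = SFF_8079_LEN,
	};

	assert(i2c_block_size > 0);

	if (has_diagnostics && i2c_block_size >= 2) {
		p.register_hwmon = true;
		p.ethtool_eeprom_len = SFF_8472_LEN;
	}

	return p;
}
```

The point of routing both decisions through one place is that hwmon and
ethtool cannot disagree about whether the diagnostics are trustworthy.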

For reference, here is what SFF-8472, which defines the diagnostics,
says about this:

  To guarantee coherency of the diagnostic monitoring data, the host is
  required to retrieve any multi-byte fields from the diagnostic
  monitoring data structure (IE: Rx Power MSB - byte 104 in A2h, Rx Power
  LSB - byte 105 in A2h) by the use of a single two-byte read sequence
  across the two-wire interface.

  The transceiver is required to ensure that any multi-byte fields which
  are updated with diagnostic monitoring data (e.g. Rx Power MSB - byte
  104 in A2h, Rx Power LSB - byte 105 in A2h) must have this update done
  in a fashion which guarantees coherency and consistency of the data. In
  other words, the update of a multi-byte field by the transceiver must
  not occur such that a partially updated multi-byte field can be
  transferred to the host. Also, the transceiver shall not update a
  multi-byte field within the structure during the transfer of that
  multi-byte field to the host, such that partially updated data would be
  transferred to the host.

The first paragraph is extremely definitive in how these fields shall
be read atomically - by a _single_ two-byte read sequence. From what
you are telling us, these modules do not support that. Therefore, by
definition, they do *not* support proper and reliable reporting of
diagnostic data, and are non-conformant with the SFP MSAs.

So, they are basically broken, and the diagnostics can't be used to
retrieve data that can be said to be useful.
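
To illustrate why byte-at-a-time access breaks this, here is a small
userspace model, not kernel code. Everything in it (the fake_module
struct, the function names, the rule that the module refreshes its
value after each bus transaction) is an assumption made purely for the
demonstration; only the byte offsets 104/105 come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define A2_RX_POWER_MSB 104	/* Rx Power MSB in A2h; LSB is byte 105 */

/* Hypothetical module model: the A2h page plus the value it will
 * publish at its next internal update. */
struct fake_module {
	uint8_t a2[256];
	uint16_t next_rx_power;
};

/* The module rewrites both bytes of the field in one go. */
static void module_update(struct fake_module *m)
{
	m->a2[A2_RX_POWER_MSB] = m->next_rx_power >> 8;
	m->a2[A2_RX_POWER_MSB + 1] = m->next_rx_power & 0xff;
}

/* Publish 'current' now and queue 'next' for the following update. */
static void module_init(struct fake_module *m, uint16_t current,
			uint16_t next)
{
	memset(m, 0, sizeof(*m));
	m->next_rx_power = current;
	module_update(m);
	m->next_rx_power = next;
}

/* One bus transaction. The model never updates in the middle of a
 * transfer, only after it has completed. */
static void bus_read(struct fake_module *m, uint8_t off, uint8_t *buf,
		     size_t len)
{
	assert((size_t)off + len <= sizeof(m->a2));
	for (size_t i = 0; i < len; i++)
		buf[i] = m->a2[off + i];
	module_update(m);
}

/* A single two-byte read sequence: both bytes come from one snapshot. */
static uint16_t read_rx_power_atomic(struct fake_module *m)
{
	uint8_t buf[2];

	bus_read(m, A2_RX_POWER_MSB, buf, 2);
	return (uint16_t)(buf[0] << 8 | buf[1]);
}

/* Two one-byte reads: the module may update between them. */
static uint16_t read_rx_power_bytewise(struct fake_module *m)
{
	uint8_t msb, lsb;

	bus_read(m, A2_RX_POWER_MSB, &msb, 1);
	bus_read(m, A2_RX_POWER_MSB + 1, &lsb, 1);
	return (uint16_t)(msb << 8 | lsb);
}
```

With module_init(&m, 0x00ff, 0x0100), the two-byte read returns 0x00ff,
whereas the byte-wise read pairs the old MSB (0x00) with the new LSB
(0x00) and returns 0x0000, a value the module never reported.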

> I do not think that manufacture says something. I think that they even
> do not know that their Realtek chips are completely broken.

Oh, they do know. I had a response from CarlitoxxPro passed to me, which
was:

  That is a behavior related on how your router/switch try to read the
  EEPROM, as described in the datasheet of the GPON ONU SFP, the EEPROM
  can be read in Sequential Single-Byte mode, not in Multi-byte mode as
  you router do, basically, your router is trying to read the full a0h
  table in a single call, and retrieve a null response. that is normal
  and not affect the operations of the GPON ONU SFP, because these
  values are informational only. so the Software for your router should
  be able to read in Single-Byte mode to read the content of the EEPROM
  in concordance to SFF-8431

which totally misses the point that it is /not/ up to the module to
choose whether multi-byte reads are supported or not. If they bothered
to gain a proper understanding of the MSAs, they would have noticed that
the device on 0xA0 is required to behave as an Atmel AT24Cxx EEPROM.
The following is from INF-8074i, which is the very first definition of
the SFP form factor modules:

  The SFP serial ID provides access to sophisticated identification
  information that describes the transceiver's capabilities, standard
  interfaces, manufacturer, and other information. The serial interface
  uses the 2-wire serial CMOS E2PROM protocol defined for the ATMEL
  AT24C01A/02/04 family of components.
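
For anyone unfamiliar with that protocol, the relevant behaviour is the
sequential read: the host positions the device's address counter, and
each byte it then acknowledges advances the counter, so a single
transaction can return many consecutive bytes. A toy model of that,
assuming a 256-byte AT24C02-style part (the struct and function names
are invented for this sketch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a 256-byte serial EEPROM. */
struct at24_model {
	uint8_t mem[256];
	uint8_t addr;	/* internal word address counter */
};

/* Dummy write: the host sends a word address to position the counter. */
static void at24_set_address(struct at24_model *e, uint8_t word_addr)
{
	e->addr = word_addr;
}

/*
 * Sequential read: every byte transferred advances the counter, so one
 * transaction yields 'len' consecutive bytes. The uint8_t counter wraps
 * from 255 back to 0, modelling the roll-over at the end of the array.
 */
static void at24_sequential_read(struct at24_model *e, uint8_t *buf,
				 size_t len)
{
	assert(buf != NULL);
	for (size_t i = 0; i < len; i++)
		buf[i] = e->mem[e->addr++];
}
```

A device that only answers one byte per transaction does not implement
this, which is exactly what the host relies on for multi-byte fields.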

As they took less than one working day to provide the above response, I
suspect they know full well that there's a problem - and it likely
affects other "routers" as well.

They're also confused about their SFF specifications. SFF-8431 is "SFP+
10 Gb/s and Low Speed Electrical Interface", which is not the correct
specification for a 1Gbps module.

> I can imagine that vendor just says: it is working in our branded boxes
> with SFP cages and if it does not work in your kernel then problem is
> with your custom kernel and we do not care about 3rd parties.

Which shows why it's pointless producing an EEPROM validation tool that
runs under Linux (as has been your suggestion). They won't use it,
since their testing only goes as far as "does it work in our product?"

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!