Re: [PATCH] ath9k: fix calibration data endianness

From: Álvaro Fernández Rojas
Date: Mon Apr 17 2023 - 01:33:52 EST


Hi Toke,

On Sun, 16 Apr 2023 at 23:49, Toke Høiland-Jørgensen
(<toke@xxxxxxx>) wrote:
>
> Christian Lamparter <chunkeey@xxxxxxxxx> writes:
>
> > On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
> >> Christian Lamparter <chunkeey@xxxxxxxxx> writes:
> >>
> >>> On 4/15/23 18:02, Christian Lamparter wrote:
> >>>> Hi,
> >>>>
> >>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
> >>>>> Álvaro Fernández Rojas <noltari@xxxxxxxxx> writes:
> >>>>>
> >>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
> >>>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
> >>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
> >>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
> >>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
> >>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
> >>>>>> ath9k 0000:00:01.0: Failed to initialize device
> >>>>>> ath9k: probe of 0000:00:01.0 failed with error -22
> >>>>>
> >>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
> >>>>> in the first place? Christian, care to comment on this?
> >>>>
> >>>> I knew this would come up. I've written what I know and remember in the
> >>>> pull-request/buglink.
> >>>>
> >>>> Maybe this can be added to the commit?
> >>>> Link: https://github.com/openwrt/openwrt/pull/12365
> >>>>
> >>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
> >>>>
> >>>> Since the existing request_firmware eeprom fetcher code set the flag,
> >>>> the nvmem code had to do it too.
> >>>>
> >>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
> >>>> I don't know if there are devices out there, which have a swapped magic (which is
> >>>> used to detect the endianness), but the caldata is in the correct endianness (or
> >>>> vice versa - Magic is correct, but data needs swapping).
> >>>>
> >>>> I can run tests with it on a Netgear WNDR3700v2 (AR7161+2xAR9220)
> >>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
> >>>> So I don't expect there to be a new issue there).
> >>>
> >>> Nope! This is a classic self-own!... Well at least, this now gets documented!
> >>>
> >>> Here are my findings. Please excuse the overlong lines.
> >>>
> >>> ## The good news / AVM FritzBox 7360v2 ##
> >>>
> >>> The good news: The AVM FritzBox 7360v2 worked the same as before.
> >>
> >> [...]
> >>
> >>> ## The not so good news / Netgear WNDR3700v2 ##
> >>>
> >>> But not the Netgear WNDR3700v2. One WiFi (the 2.4G, which now reports itself as the 5G @0000:00:11.0 -
> >>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
> >>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
> >>
> >> [...]
> >>
> >> Alright, so IIUC, we have a situation where some devices only work
> >> *with* the flag, and some devices only work *without* the flag? So we'll
> >> need some kind of platform-specific setting? Could we put this in the
> >> device trees, or is there a better solution?
> >
> > Depends. From what I gather, ath9k calls this "need_swap". Thing is,
> > the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
> > official documentation about the AR9170 Base EEPROM (has the same base
> > structure as AR5008 up to AR92xx) this is specified as:
> >
> > "Only bit 0 is defined as Big Endian. This bit should be written as 1
> > when the structure is interpreted in big Endian byte ordering. This bit
> > must be reviewed before any larger than byte parameters can be interpreted."
> >
> > It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
> > the caldata should be in "Big-Endian" too... so no swapping is necessary.
> >
> > Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
> > with this eepmisc flag:
> >
> > | if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> > | *swap_needed = true;
> > | ath_dbg(common, EEPROM,
> > | "Big Endian EEPROM detected according to EEPMISC register.\n");
> > | } else {
> > | *swap_needed = false;
> > | }
> >
> > This doesn't take into consideration that swapping is not needed if
> > the data is in big endian format on a big endian device. So, this
> > could be changed so that the *swap_needed is only true if the flag and
> > device endianness disagree?
> >
> > That said, Martin and Felix have written their reasons in the cover letter
> > and patches for why the code is what it is:
> > <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>
> >
> > Toke, What's your take on this? Having something similar like the
> > check_endian bool... but for OF? Or more logic that can somehow
> > figure out if it's big or little endian.
>
> Digging into that old thread, it seems we are re-hashing a lot of the
> old discussion when those patches went in. Basically, the code you
> quoted above is correct because the commit that introduced it sets all
> fields to be __le16 and __le32 types and reads them using the
> leXX_to_cpu() macros.
>
> The code *further up* in that function is what is enabled by Alvaro's
> patch. Which is a different type of swapping (where the whole eeprom is
> swab16()'ed, not just the actual multi-byte data fields in them).
> However, in OpenWrt the in-driver code to do this is not used; instead,
> a hotplug script applies the swapping before the device is seen by the
> driver, as described in this commit[0]. Martin indeed mentions that this
> is a device-specific thing, so the driver can't actually do the right
> thing without some outside feature flag[1]. The commit[0] also indicates
> that there used to exist a device-tree binding in the out-of-tree
> device trees used in OpenWrt to do the unconditional swab16().
>
> The code in [0] still exists in OpenWrt today, albeit in a somewhat
> modified form[2]. I guess the question then boils down to, Álvaro, can
> your issue be resolved by a pre-processing step similar to that which is
> done in [2]? Or do we need the device tree flag after all?

TBH, yes, it can be solved by a pre-processing step similar to what's
done in [2], but then having added nvmem support would make no sense
at all for those devices that need swapping, since it's unusable
without the flag.
So, in my opinion the flag should be added in order to be able to use
it without pre-processing the calibration data and to take advantage
of nvmem support.
I will send the v2 patch and even if it's not accepted I think I will
add it as a downstream patch on OpenWrt...
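
For reference, here is a minimal userspace sketch of the two kinds of
swapping discussed above. It is not the ath9k code: swab16_buffer() and
read_le16() are hypothetical helpers made up for illustration. The
first mimics a swab16() pass over the whole EEPROM image; the second is
a rough analogue of reading a single __le16 field with le16_to_cpu().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from ath9k): swap the two bytes of every
 * 16-bit word in place. This is what a swab16() pass over the whole
 * calibration image amounts to, regardless of field layout.
 */
static void swab16_buffer(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2) {
		uint8_t tmp = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = tmp;
	}
}

/*
 * Hypothetical helper (not from ath9k): decode one little-endian
 * 16-bit field from raw bytes. The result does not depend on the
 * host byte order, which is the point of __le16 + le16_to_cpu().
 */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

With the bytes 12 34 56 78, read_le16() on the first field gives
0x3412; after swab16_buffer() over the buffer the same call gives
0x1234. Note that the whole-buffer pass also moves single-byte fields
to the neighbouring offset, which is why it is not interchangeable with
the per-field conversion.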

>
> -Toke
>
> [0] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=afa37092663d00aa0abf8c61943d9a1b5558b144
> [1] https://narkive.com/2q5A6nu0.34
> [2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/lantiq/xway/base-files/etc/hotplug.d/firmware/12-ath9k-eeprom;h=98bb9af6947a298775ff7fa26ac6501c57df8378;hb=HEAD

Best regards,
Álvaro.