Re: [PATCH] net: phy: adin1100: Fix nullptr exception for phy interrupts

From: Andre Werner
Date: Fri Jan 19 2024 - 01:32:19 EST


On Thu, 18 Jan 2024, Heiner Kallweit wrote:

On 18.01.2024 21:09, Andrew Lunn wrote:
On Thu, Jan 18, 2024 at 06:36:16PM +0100, Heiner Kallweit wrote:
On 18.01.2024 17:53, Andrew Lunn wrote:
On Thu, Jan 18, 2024 at 11:43:41AM +0100, Andre Werner wrote:
If using ADIN1100 as an external phy, e.g. in combination with
"smsc95xx", we ran into nullptr exception by creating a link.

In our case the "smsc95xx" does not check for an available interrupt handler
on external phy driver to use poll instead of interrupts if no handler is
available. So we decide to implement a small handler in the phy driver
to support other MACs as well.

I update the driver to add an interrupt handler because libphy
does not check if their is a interrupt handler available either.

There are several interrupts maskable at the phy, but only link change interrupts
are handled by the driver yet.

We tested the combination "smsc95xx" and "adin1100" with Linux Kernel 6.6.9
and Linux Kernel 6.1.0, respectively.

Hi Andre

A few different things....

Please could you give more details of the null pointer
exception. phylib should test if the needed methods have been
implemented in the PHY driver, and not tried to use interrupts when
they are missing. It should of polled the PHY. So i would like to
understand what went wrong. Maybe we have a phylib core bug we should
be fixing. Or a bug in the smsc95xx driver.

Seems to be a bug in smsc95xx. The following is the relevant code piece.

ret = mdiobus_register(pdata->mdiobus);
if (ret) {
netdev_err(dev->net, "Could not register MDIO bus\n");
goto free_mdio;
}

pdata->phydev = phy_find_first(pdata->mdiobus);
if (!pdata->phydev) {
netdev_err(dev->net, "no PHY found\n");
ret = -ENODEV;
goto unregister_mdio;
}

pdata->phydev->irq = phy_irq;

The interrupt is set too late, after phy_probe(), where we have this:

if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
phydev->irq = PHY_POLL;

So we would have two steps:
1. Fix the smsc95xx bug (as the same issue could occur with another PHY type)

Its not so nice to fix.

Normally you would do something like:

pdata->mdiobus->priv = dev;
pdata->mdiobus->read = smsc95xx_mdiobus_read;
pdata->mdiobus->write = smsc95xx_mdiobus_write;
pdata->mdiobus->reset = smsc95xx_mdiobus_reset;
pdata->mdiobus->name = "smsc95xx-mdiobus";
pdata->mdiobus->parent = &dev->udev->dev;

snprintf(pdata->mdiobus->id, ARRAY_SIZE(pdata->mdiobus->id),
"usb-%03d:%03d", dev->udev->bus->busnum, dev->udev->devnum);

pdata->mdiobus->irq[X] = phy_irq;

ret = mdiobus_register(pdata->mdiobus);

By setting pdata->mdiobus->irq[X] before registering the PHY, the irq
number gets passed to the phydev->irq very early on. If everything is
O.K, interrupts are then used.

However, because of the use of phy_find_first(), we have no idea what
address on the bus the PHY is using. So we don't know which member of
irq[] to set. Its not ideal, but one solution is to set them all.

However, a better solution is to perform the validation again in
phy_attach_direct(). Add a second:

if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
phydev->irq = PHY_POLL;

I add this check into phy_device.c-> phy_attach_direct as a work around
for now. I will send a new patch set to net-next as suggested.


This would save us here, but can't prevent that phydev->irq may be set
even later. I think, ideally nobody should ever access phydev->irq directly.
There should be a setter which performs the needed checks.
But it may be a longer journey to make parts of struct phy_device private
to phylib.

which will force phydev->irq back to polling.

Andrew

Heiner


Regards,

Andre