Re: [PATCH net] net: phy: Warn about incorrect mdio_bus_phy_resume() state

From: Florian Fainelli
Date: Tue Aug 16 2022 - 22:28:40 EST




On 8/16/2022 6:20 AM, Geert Uytterhoeven wrote:
Hi Florian,

On Fri, Aug 12, 2022 at 6:39 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
On 8/12/22 04:19, Marek Szyprowski wrote:
On 02.08.2022 01:34, Florian Fainelli wrote:
Calling mdio_bus_phy_resume() with neither the PHY state machine set to
PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
that we can produce a race condition looking like this:

CPU0 CPU1
bcmgenet_resume
-> phy_resume
-> phy_init_hw
-> phy_start
-> phy_resume
phy_start_aneg()
mdio_bus_phy_resume
-> phy_resume
-> phy_write(..., BMCR_RESET)
-> usleep() -> phy_read()

with the phy_resume() function triggering a PHY behavior that might have
to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
brcm_fet_config_init()") for instance) that ultimately leads to an error
reading from the PHY.

Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>

This patch, as probably intended, triggers a warning during system
suspend/resume cycle in the SMSC911x driver. I've observed it on ARM
Juno R1 board on the kernel compiled from next-202208010:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323
mdio_bus_phy_resume+0x34/0xc8

I am seeing the same on the ape6evm and kzm9g development
boards with smsc911x Ethernet, and on various boards with Renesas
Ethernet (sh_eth or ravb) if Wake-on-LAN is disabled.

Yes this is catching an actual issue in the driver in that the PHY state
machine is still running while the system is trying to suspend. We could
go about fixing it in a different number of ways, though I believe this
one is probably correct enough to work and fix the warning:

--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev)
return ret;
}

+ /* Indicate that the MAC is responsible for managing PHY PM */
+ phydev->mac_managed_pm = true;
phy_attached_info(phydev);

phy_set_max_speed(phydev, SPEED_100);
@@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev)
if (netif_running(ndev)) {
netif_stop_queue(ndev);
netif_device_detach(ndev);
+ if (!device_may_wakeup(dev))
+ phy_suspend(dev->phydev);
}

/* enable wake on LAN, energy detection and the external PME
@@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev)
if (netif_running(ndev)) {
netif_device_attach(ndev);
netif_start_queue(ndev);
+ if (!device_may_wakeup(dev))
+ phy_resume(dev->phydev);
}

return 0;

Thanks for your patch, but unfortunately this does not work on ape6evm
and kzm9g, where the smsc911x device is connected to a power-managed
bus. It looks like the PHY registers are accessed while the device
is already suspended, causing a crash during system suspend:

Does it work better if you replace phy_suspend() with phy_stop() and phy_resume() with phy_start()?
--
Florian