Re: [PATCH net] net: phy: Warn about incorrect mdio_bus_phy_resume() state

From: Florian Fainelli
Date: Fri Aug 12 2022 - 12:32:26 EST


On 8/12/22 04:19, Marek Szyprowski wrote:
Hi All,

On 02.08.2022 01:34, Florian Fainelli wrote:
Calling mdio_bus_phy_resume() with neither the PHY state machine set to
PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
that we can produce a race condition looking like this:

CPU0 CPU1
bcmgenet_resume
-> phy_resume
-> phy_init_hw
-> phy_start
-> phy_resume
phy_start_aneg()
mdio_bus_phy_resume
-> phy_resume
-> phy_write(..., BMCR_RESET)
-> usleep() -> phy_read()

with the phy_resume() function triggering a PHY behavior that might have
to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
brcm_fet_config_init()") for instance) that ultimately leads to an error
reading from the PHY.

Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>

This patch, as probably intended, triggers a warning during system
suspend/resume cycle in the SMSC911x driver. I've observed it on ARM
Juno R1 board on the kernel compiled from next-202208010:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323
mdio_bus_phy_resume+0x34/0xc8
 Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative
crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x]
 CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940
 Hardware name: ARM Juno development board (r1) (DT)
 pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : mdio_bus_phy_resume+0x34/0xc8
 lr : dpm_run_callback+0x74/0x350
 ...
 Call trace:
  mdio_bus_phy_resume+0x34/0xc8
  dpm_run_callback+0x74/0x350
  device_resume+0xb8/0x258
  dpm_resume+0x120/0x4a8
  dpm_resume_end+0x14/0x28
  suspend_devices_and_enter+0x164/0xa60
  pm_suspend+0x25c/0x3a8
  state_store+0x84/0x108
  kobj_attr_store+0x14/0x28
  sysfs_kf_write+0x60/0x70
  kernfs_fop_write_iter+0x124/0x1a8
  new_sync_write+0xd0/0x190
  vfs_write+0x208/0x478
  ksys_write+0x64/0xf0
  __arm64_sys_write+0x14/0x20
  invoke_syscall+0x40/0xf8
  el0_svc_common.constprop.3+0x8c/0x120
  do_el0_svc+0x28/0xc8
  el0_svc+0x48/0xd0
  el0t_64_sync_handler+0x94/0xb8
  el0t_64_sync+0x15c/0x160
 irq event stamp: 24406
 hardirqs last  enabled at (24405): [<ffff8000090c4734>]
_raw_spin_unlock_irqrestore+0x8c/0x90
 hardirqs last disabled at (24406): [<ffff8000090b3164>] el1_dbg+0x24/0x88
 softirqs last  enabled at (24144): [<ffff800008010488>] _stext+0x488/0x5cc
 softirqs last disabled at (24139): [<ffff80000809bf98>]
irq_exit_rcu+0x168/0x1a8
 ---[ end trace 0000000000000000 ]---

I hope the above information will help fixing the driver.

Yes this is catching an actual issue in the driver in that the PHY state machine is still running while the system is trying to suspend. We could go about fixing it in a different number of ways, though I believe this one is probably correct enough to work and fix the warning:

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 3bf20211cceb..e9c0668a4dc0 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device *dev)
return ret;
}

+ /* Indicate that the MAC is responsible for managing PHY PM */
+ phydev->mac_managed_pm = true;
phy_attached_info(phydev);

phy_set_max_speed(phydev, SPEED_100);
@@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev)
if (netif_running(ndev)) {
netif_stop_queue(ndev);
netif_device_detach(ndev);
+ if (!device_may_wakeup(dev))
+ phy_suspend(dev->phydev);
}

/* enable wake on LAN, energy detection and the external PME
@@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev)
if (netif_running(ndev)) {
netif_device_attach(ndev);
netif_start_queue(ndev);
+ if (!device_may_wakeup(dev))
+ phy_resume(dev->phydev);
}

return 0;

--
Florian