Re: [RFC net] Revert "net: phy: Fix race condition on link status change"

From: Serge Semin
Date: Wed Aug 16 2023 - 16:04:25 EST


On Wed, Aug 16, 2023 at 09:35:57PM +0200, Andrew Lunn wrote:
> On Wed, Aug 16, 2023 at 09:09:40PM +0300, Serge Semin wrote:
> > Protecting the phy_driver.drv->handle_interrupt() callback invocation by
> > the phy_device.lock mutex causes all the IRQ-capable PHY drivers to lock
> > the mutex twice thus deadlocking on the next calls thread:
> > IRQ: phy_interrupt()
> > +-> mutex_lock(&phydev->lock); <-------------+
> > drv->handle_interrupt() | Deadlock due to the
> > +-> phy_error() + nested PHY-device
> > +-> phy_process_error() | mutex lock
> > +-> mutex_lock(&phydev->lock); <-+
> > phydev->state = PHY_ERROR;
> > mutex_unlock(&phydev->lock);
> > mutex_unlock(&phydev->lock);
> >
> > The problem can be easily reproduced just by calling phy_error() from the
> > any PHY-device interrupt handler.
>
> https://elixir.bootlin.com/linux/v6.5-rc6/source/drivers/net/phy/phy.c#L1201
>
> /**
> * phy_error - enter ERROR state for this PHY device
> * @phydev: target phy_device struct
> *
> * Moves the PHY to the ERROR state in response to a read
> * or write error, and tells the controller the link is down.
> * Must not be called from interrupt context, or while the
> * phydev->lock is held.
> */
> void phy_error(struct phy_device *phydev)
> {
> WARN_ON(1);
> phy_process_error(phydev);
> }
> EXPORT_SYMBOL(phy_error);
>
> It is clearly documented you should not do this.
>
> [Goes and looks]
>
> Ah, there are lots of examples of
>
> micrel.c- irq_status = phy_read(phydev, LAN8814_INTS);
> micrel.c- if (irq_status < 0) {
> micrel.c: phy_error(phydev);
> micrel.c- return IRQ_NONE;
> micrel.c- }

It's literally the only pattern where the phy_error() method is utilized but
it can be found in all IRQ-capable network PHY drivers. Seeing how
widespread the function usage is I didn't even dared to think that
the function doc could state it can't be utilized like that or when the
lock is held.

>
> I actually think phy_error() is broken here. The general pattern is
> that the mutex is locked before calling into the driver. So we
> actually want phy_error() to be safe to use with the lock already
> taken. The exceptions when the lock is not held is stuff outside of
> PHY operation, like HWMON, and suspend and resume, plus probe.
>
> So i suggest you change phy_process_error() to remove the lock.

This doable.

> Maybe
> add a test to ensure the lock is actually held, and do a phydev_err()
> if not.

This can't be done since phy_state_machine() calls phy_error_precise()
which calls phy_process_error() with no phy_device.lock held. Printing the
error in that case would mean an error in the Networking PHY subsystem
itself.

Do you suggest to take the lock before calling phy_error_precise() then?

>
> The comment about interrupt context is also probably bogus. phylib
> only uses threaded interrupts, and it is safe to block in this
> context.

Ok.

-Serge(y)

>
> Andrew