Re: [PATCH net] net: phy: Warn if phy is attached when removing

From: Russell King (Oracle)
Date: Fri Aug 19 2022 - 20:21:07 EST


On Fri, Aug 19, 2022 at 04:45:19PM -0700, Jakub Kicinski wrote:
> On Tue, 16 Aug 2022 12:37:01 -0400 Sean Anderson wrote:
> > netdevs using phylib can be oopsed from userspace in the following
> > manner:
> >
> > $ ip link set $iface up
> > $ echo $(basename $(readlink /sys/class/net/$iface/phydev)) > \
> > /sys/class/net/$iface/phydev/driver/unbind
> > $ ip link set $iface down
> >
> > However, the traceback provided is a bit too late, since it does not
> > capture the root of the problem (unbinding the driver). It's also
> > possible that the memory has been reallocated if sufficient time passes
> > between when the phy is detached and when the netdev touches the phy
> > (which could result in silent memory corruption). Add a warning at the
> > source of the problem. A future patch could make this more robust by
> > calling dev_close.
>
> FWIW, I think DaveM marked this patch as changes requested.
>
> I don't really know enough to have an opinion.
>
> PHY maintainers, anyone willing to cast a vote?

I don't think Linus is a fan of using WARN_ON() as an assert()
replacement, which this feels very much like that kind of thing.
I don't see much point in using WARN_ON() here as we'll soon get
a kernel oops anyway, and the backtrace WARN_ON() will produce isn't
useful - it'll be just noise.

So, I'd tone it down to something like:

if (phydev->attached_dev)
phydev_err(phydev, "Removing in-use PHY, expect an oops");

Maybe even introduce phydev_crit() just for this message.

Since we have bazillions of network drivers that hold a reference to
the phydev, I don't think we can do much better than this for phylib.
It would be a massive effort to go through all the network drivers
and try to work out how to fix them.


Addressing the PCS part of the patch posting and unrelated to what we
do for phylib...

However, I don't see "we'll do this for phylib, so we can do it for
PCS as well" as much of a sane argument - we don't have bazillions
of network drivers to fix to make this work for PCS. We currently
have no removable PCS (they don't appear with a driver so aren't
even bound to anything.) So we should add the appropriate handling
when we start to do this.

Phylink has the capability to take the link down when something goes
away, and if the PCS goes away, that's exactly what should happen,
rather than oopsing the kernel.

As MAC drivers hold a reference to the PCS instances, as they need to
select the appropriate one, how do MAC drivers get to know that the
PCS has gone away to drop their reference - and tell phylink that the
PCS has gone. That's the problem that needs solving to allow PCS to
be unbound if we're going to make them first class citizens of the
driver model.

I am no fan of "but XYZ doesn't care about it, so why should we care"
arguments. If I remember correctly, phylib pre-dates the device model,
and had the device model retrofitted, so it was a best-efforts
attempt - and suffered back then with the same problem of needing
lots of drivers to be changed in non-trivial ways.

We have the chance here to come up with something better - and I think
that chance should be used to full effect.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!