Re: [PATCH net-next 0/2] net: dsa: realtek: fix PHY register read corruption

From: Alvin Šipraga
Date: Thu Feb 17 2022 - 08:09:14 EST


Andrew Lunn <andrew@xxxxxxx> writes:

>> Thank you Andrew for the clear explanation.
>>
>> Somewhat unrelated to this series, but are you able to explain to me the
>> difference between:
>>
>> mutex_lock(&bus->mdio_lock);
>> and
>> mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
>>
>> While looking at other driver examples I noticed the latter form quite a
>> few times too.
>
> This is to do with the debug code for checking for deadlocks,
> CONFIG_PROVE_LOCKING. When that feature is enables, each lock/unlock
> of a mutex is tracked, and a list is made of what other locks are also
> taken, and the order. The code can find deadlocks where one thread
> takes A then B, while another thread takes B and then A. It can also
> detect when a thread takes lock A and then tries to take lock A again.
>
> Rather than track each individual mutex, it uses classes of mutex. So
> bus->mdio_lock is a class of mutex. The code simply tracks that a
> bus->mdio_lock has been taken, not a specific bus->mdio_lock. That is
> generally sufficient, but not always. The mv88e6xxx switch is like
> many switches, accessed over MDIO. But the mv88e6xxx switch offers an
> MDIO bus, and there is an MDIO bus driver inside the mv88e6xxx
> driver. So you have nested MDIO calls. So this debug code seems the
> same class of mutex being taken twice, and thinks it is a
> deadlock. You can tell it that nested MDIO calls are actually O.K, it
> won't deadlock.

Thanks for the explanation, the missing piece of the puzzle was the fact
that some switch drivers expose an additional MDIO bus. I can understand
the CONFIG_PROVE_LOCKING rationale.

If you have the patience to answer a few more questions:

1. You mentioned in an earlier mail that the mdio_lock is used mostly by
PHY drivers to synchronize their access to the MDIO bus, for a single
read or write. You also mentioned that for switches which have a more
involved access pattern (for instance to access switch management
registers), a higher lock is required. In realtek-mdio this is the case:
we do a couple of reads and writes over the MDIO bus to access the
switch registers. Moreover, the mdio_lock is held for the duration of
these MDIO bus reads/writes. Do you mean to say that one should rather
take a higher-level lock and only lock/unlock the mdio_lock on a
per-read or per-write basis? Put another way, should this:

static int realtek_mdio_read(void *ctx, u32 reg, u32 *val)
{
/* ... */

mutex_lock(&bus->mdio_lock);

bus->write(bus, priv->mdio_addr, ...);
bus->write(bus, priv->mdio_addr, ...);
bus->write(bus, priv->mdio_addr, ...);
bus->read(bus, priv->mdio_addr, ...);

/* ... */

mutex_unlock(&bus->mdio_lock);

return ret;
}

rather look like this?:

static int realtek_mdio_read(void *ctx, u32 reg, u32 *val)
{
/* ... */

mutex_lock(&my_realtek_driver_lock); /* synchronize concurrent realtek_mdio_{read,write} */

mdiobus_write(bus, priv->mdio_addr, ...); /* mdio_lock locked/unlocked here */
mdiobus_write(bus, priv->mdio_addr, ...); /* ditto */
mdiobus_write(bus, priv->mdio_addr, ...); /* ditto */
mdiobus_read(bus, priv->mdio_addr, ...); /* ditto */

/* ... */

mutex_unlock(&my_realtek_driver_lock);

return ret;
}


2. Is the nested locking only relevant for DSA switches which offer
another MDIO bus? Or should all switch drivers do this, on the basis
that, feasibly, one could connect my Realtek switch to the MDIO bus of a
mv88e6xxx switch? In that case, and assuming the latter form of
raeltek_mdio_read above, should one use the mdiobus_{read,write}_nested
functions instead?

Kind regards,
Alvin