Re: [RFC PATCH net-next 1/5] net: dsa: realtek-smi: fix mdio_free bug on module unload

From: Alvin Šipraga
Date: Sun Aug 22 2021 - 18:42:49 EST


Hi Vladimir,

On 8/22/21 11:54 PM, Vladimir Oltean wrote:
> On Sun, Aug 22, 2021 at 09:31:39PM +0200, Alvin Šipraga wrote:
>> From: Alvin Šipraga <alsi@xxxxxxxxxxxxxxx>
>>
>> realtek-smi-core fails to unregister the slave MII bus on module unload,
>> raising the following BUG warning:
>>
>> mdio_bus.c:650: BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
>>
>> kernel BUG at drivers/net/phy/mdio_bus.c:650!
>> Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
>> Call trace:
>> mdiobus_free+0x4c/0x50
>> devm_mdiobus_free+0x18/0x20
>> release_nodes.isra.0+0x1c0/0x2b0
>> devres_release_all+0x38/0x58
>> device_release_driver_internal+0x124/0x1e8
>> driver_detach+0x54/0xe0
>> bus_remove_driver+0x60/0xd8
>> driver_unregister+0x34/0x60
>> platform_driver_unregister+0x18/0x20
>> realtek_smi_driver_exit+0x14/0x1c [realtek_smi]
>>
>> Fix this by duly unregistering the slave MII bus with
>> mdiobus_unregister. We do this in the DSA teardown path, since
>> registration is performed in the DSA setup path.
>>
>> Cc: Linus Walleij <linus.walleij@xxxxxxxxxx>
>> Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver")
>> Signed-off-by: Alvin Šipraga <alsi@xxxxxxxxxxxxxxx>
>> ---
>> drivers/net/dsa/realtek-smi-core.c | 6 ++++++
>> drivers/net/dsa/realtek-smi-core.h | 1 +
>> drivers/net/dsa/rtl8366rb.c | 8 ++++++++
>> 3 files changed, 15 insertions(+)
>>
>> diff --git a/drivers/net/dsa/realtek-smi-core.c b/drivers/net/dsa/realtek-smi-core.c
>> index 8e49d4f85d48..6992b6b31db6 100644
>> --- a/drivers/net/dsa/realtek-smi-core.c
>> +++ b/drivers/net/dsa/realtek-smi-core.c
>> @@ -383,6 +383,12 @@ int realtek_smi_setup_mdio(struct realtek_smi *smi)
>> return ret;
>> }
>>
>> +void realtek_smi_teardown_mdio(struct realtek_smi *smi)
>> +{
>> + if (smi->slave_mii_bus)
>> + mdiobus_unregister(smi->slave_mii_bus);
>> +}
>> +
>> static int realtek_smi_probe(struct platform_device *pdev)
>> {
>> const struct realtek_smi_variant *var;
>> diff --git a/drivers/net/dsa/realtek-smi-core.h b/drivers/net/dsa/realtek-smi-core.h
>> index fcf465f7f922..6cfa5f2df7ea 100644
>> --- a/drivers/net/dsa/realtek-smi-core.h
>> +++ b/drivers/net/dsa/realtek-smi-core.h
>> @@ -119,6 +119,7 @@ struct realtek_smi_variant {
>> int realtek_smi_write_reg_noack(struct realtek_smi *smi, u32 addr,
>> u32 data);
>> int realtek_smi_setup_mdio(struct realtek_smi *smi);
>> +void realtek_smi_teardown_mdio(struct realtek_smi *smi);
>>
>> /* RTL8366 library helpers */
>> int rtl8366_mc_is_used(struct realtek_smi *smi, int mc_index, int *used);
>> diff --git a/drivers/net/dsa/rtl8366rb.c b/drivers/net/dsa/rtl8366rb.c
>> index a89093bc6c6a..6537fac7aba4 100644
>> --- a/drivers/net/dsa/rtl8366rb.c
>> +++ b/drivers/net/dsa/rtl8366rb.c
>> @@ -982,6 +982,13 @@ static int rtl8366rb_setup(struct dsa_switch *ds)
>> return 0;
>> }
>>
>> +static void rtl8366rb_teardown(struct dsa_switch *ds)
>> +{
>> + struct realtek_smi *smi = ds->priv;
>> +
>> + realtek_smi_teardown_mdio(smi);
>> +}
>> +
>
> Objection: dsa_switch_teardown has:
>
> if (ds->slave_mii_bus && ds->ops->phy_read)
> mdiobus_unregister(ds->slave_mii_bus);

This is unregistering an mdiobus registered in dsa_switch_setup:

if (!ds->slave_mii_bus && ds->ops->phy_read) {
ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
if (!ds->slave_mii_bus) {
err = -ENOMEM;
goto teardown;
}

dsa_slave_mii_bus_init(ds);

err = mdiobus_register(ds->slave_mii_bus);
if (err < 0)
goto teardown;
}

However, we don't enter this codepath because:

- ds->slave_mii_bus is already set in the call to ds->ops->setup()
before the code snippet above;
- ds->ops->phy_read is not set.

We don't want to either, since we want to use of_mdiobus_register().

>
> The realtek_smi_setup_mdio function does:
>
> smi->ds->slave_mii_bus = smi->slave_mii_bus;
>
> so I would expect that this would result in a double unregister on some
> systems.
>
> I haven't went through your new driver, but I wonder whether you have
> the phy_read and phy_write methods implemented? Maybe that is the
> difference?

Right, phy_read/phy_write are not set in the dsa_switch_ops of
rtl8365mb. So we should be safe.

It did get me thinking that it would be nice if dsa_register_switch()
could call of_mdiobus_register() when necessary, since the snippet above
(and its call to dsa_slave_mii_bus_init()) is almost same as
realtek_smi_setup_mdio(). It would simplify some logic in realtek-smi
drivers and obviate the need for this patch. I am not sure what the
right approach to this would be but with some pointers I can give it a shot.

>
>> static enum dsa_tag_protocol rtl8366_get_tag_protocol(struct dsa_switch *ds,
>> int port,
>> enum dsa_tag_protocol mp)
>> @@ -1505,6 +1512,7 @@ static int rtl8366rb_detect(struct realtek_smi *smi)
>> static const struct dsa_switch_ops rtl8366rb_switch_ops = {
>> .get_tag_protocol = rtl8366_get_tag_protocol,
>> .setup = rtl8366rb_setup,
>> + .teardown = rtl8366rb_teardown,
>> .phylink_mac_link_up = rtl8366rb_mac_link_up,
>> .phylink_mac_link_down = rtl8366rb_mac_link_down,
>> .get_strings = rtl8366_get_strings,
>> --
>> 2.32.0
>>