Re: [PATCH net-next v4] net: stmmac: fix system hang when setting up tag_8021q VLAN for DSA ports

From: Vladimir Oltean
Date: Thu Apr 13 2023 - 13:41:41 EST


On Thu, Apr 13, 2023 at 10:15:55AM -0700, Florian Fainelli wrote:
> On 4/13/23 10:07, Jacob Keller wrote:
> > On 4/13/2023 8:06 AM, Yan Wang wrote:
> > > The system hangs because of dsa_tag_8021q_port_setup() ->
> > > stmmac_vlan_rx_add_vid().
> > >
> > > I found that in stmmac_drv_probe(), calling pm_runtime_put()
> > > disables the clock.
> > >
> > > First, when the kernel is compiled with CONFIG_PM=y, stmmac's
> > > resume/suspend is active.
> > >
> > > Secondly, with stmmac as the DSA master, dsa_tag_8021q_port_setup()
> > > calls back into stmmac_vlan_rx_add_vid() when the DSA driver starts.
> > > However, the system hangs because stmmac_vlan_rx_add_vid() accesses
> > > its registers after stmmac's clock has been disabled.
> > >
> > > I would suggest adding pm_runtime_resume_and_get() to
> > > stmmac_vlan_rx_add_vid(). This guarantees that the clock is resumed
> > > while the registers are in use.
> > >
> > > Signed-off-by: Yan Wang <rk.code@xxxxxxxxxxx>
> >
> > This looks identical to the net fix you posted at [1]. I don't think we
> > need both?
> >
> > [1]:
> > https://lore.kernel.org/netdev/KL1PR01MB5448020DE191340AE64530B0E6989@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> Unfortunately both still lack a proper Fixes: tag, and this is a bug fix.
> --
> Florian
>

I guess that would be:

Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver")

although in this case that would be only part of the story. That commit
split the runtime PM handling between stmmac_vlan_rx_add_vid() and
stmmac_vlan_rx_kill_vid() in a strange way: adding a VLAN RX filter
takes a reference on the device, and deleting one drops that reference.

That is... strange?! But it worked, in a way, I guess.
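To make that pairing concrete, here is a minimal userspace model, assuming only the usage-count semantics of runtime PM. struct model_dev, model_pm_get(), model_pm_put(), model_write_vlan_filter(), asym_add_vid() and asym_kill_vid() are hypothetical names, not kernel or stmmac APIs. The model also suspends synchronously, which the real pm_runtime_put() does not. It sketches the scheme as described above (add keeps a reference, kill drops it), not the actual driver code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device under runtime PM (not kernel code). */
struct model_dev {
	int usage_count;  /* runtime PM usage counter */
	bool clk_enabled; /* true while the device is resumed */
	int num_vids;     /* VLAN RX filters currently installed */
};

/* Mimics pm_runtime_resume_and_get(): resume on the 0 -> 1 transition. */
static void model_pm_get(struct model_dev *d)
{
	if (d->usage_count++ == 0)
		d->clk_enabled = true;
}

/* Mimics pm_runtime_put(), but suspends immediately on 1 -> 0. */
static void model_pm_put(struct model_dev *d)
{
	assert(d->usage_count > 0); /* an unbalanced put is a bug */
	if (--d->usage_count == 0)
		d->clk_enabled = false;
}

/* Real hardware hangs on register access with the clock gated;
 * the model just reports failure instead. */
static bool model_write_vlan_filter(struct model_dev *d)
{
	return d->clk_enabled;
}

/* Asymmetric scheme: add takes a reference and keeps it... */
static bool asym_add_vid(struct model_dev *d)
{
	model_pm_get(d); /* held until the matching kill */
	if (!model_write_vlan_filter(d))
		return false;
	d->num_vids++;
	return true;
}

/* ...and kill drops the reference taken by the matching add. */
static bool asym_kill_vid(struct model_dev *d)
{
	bool ok = model_write_vlan_filter(d);

	d->num_vids--;
	model_pm_put(d);
	return ok;
}
```

With this pairing the clock stays enabled for as long as at least one filter is installed, which is why it happened to work. A kill without a matching add would unbalance the count, which the model catches with its assert.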

Then commit b3dcb3127786 ("net: stmmac: correct clocks enabled in
stmmac_vlan_rx_kill_vid()") came a few months later and blamed that
oddity on a bad merge conflict resolution... ?! Basically, from what I
can tell, it's this later commit that broke things, by using runtime PM
only in stmmac_vlan_rx_kill_vid() but not in stmmac_vlan_rx_add_vid().
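The resulting breakage, and one balanced way to apply the suggested pm_runtime_resume_and_get(), can be sketched with a hypothetical userspace model. model_dev, model_pm_get()/model_pm_put(), broken_add_vid(), fixed_add_vid() and fixed_kill_vid() are illustrative stand-ins, not the stmmac code. Pairing the get with a put on exit is an assumption of this sketch, and a failed register write stands in for the hang:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device under runtime PM (not kernel code). */
struct model_dev {
	int usage_count;  /* runtime PM usage counter */
	bool clk_enabled; /* true while the device is resumed */
	int num_vids;     /* VLAN RX filters currently installed */
};

/* Mimics pm_runtime_resume_and_get(): resume on the 0 -> 1 transition. */
static void model_pm_get(struct model_dev *d)
{
	if (d->usage_count++ == 0)
		d->clk_enabled = true;
}

/* Mimics pm_runtime_put(), but suspends immediately on 1 -> 0. */
static void model_pm_put(struct model_dev *d)
{
	assert(d->usage_count > 0); /* an unbalanced put is a bug */
	if (--d->usage_count == 0)
		d->clk_enabled = false;
}

/* Stand-in for register access: fails (real HW: hangs) if clock is off. */
static bool model_write_vlan_filter(struct model_dev *d)
{
	return d->clk_enabled;
}

/* As described: add_vid does no runtime PM, so after probe's put the
 * clock is gated and the register access fails. */
static bool broken_add_vid(struct model_dev *d)
{
	if (!model_write_vlan_filter(d))
		return false;
	d->num_vids++;
	return true;
}

/* Balanced: resume only for the duration of the register access. */
static bool fixed_add_vid(struct model_dev *d)
{
	bool ok;

	model_pm_get(d);
	ok = model_write_vlan_filter(d);
	if (ok)
		d->num_vids++;
	model_pm_put(d);
	return ok;
}

/* kill_vid brackets its register access the same way. */
static bool fixed_kill_vid(struct model_dev *d)
{
	bool ok;

	model_pm_get(d);
	ok = model_write_vlan_filter(d);
	if (ok)
		d->num_vids--;
	model_pm_put(d);
	return ok;
}
```

Unlike the asymmetric scheme, each call here is self-balancing, so the usage count returns to zero after every add or kill, regardless of how many filters are installed.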