RE: [PATCH net] iavf: Do not restart Tx queues after reset task failure

From: Keller, Jacob E
Date: Wed Nov 09 2022 - 15:12:05 EST




> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Wednesday, November 9, 2022 10:21 AM
> To: ivecera <ivecera@xxxxxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; sassmann@xxxxxxxxxx; Keller, Jacob E
> <jacob.e.keller@xxxxxxxxx>; Piotrowski, Patryk <patryk.piotrowski@xxxxxxxxx>;
> SlawomirX Laba <slawomirx.laba@xxxxxxxxx>; Brandeburg, Jesse
> <jesse.brandeburg@xxxxxxxxx>; Nguyen, Anthony L
> <anthony.l.nguyen@xxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Eric
> Dumazet <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo
> Abeni <pabeni@xxxxxxxxxx>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@xxxxxxxxxxxxxxxx>; open list <linux-kernel@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
>
> On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> > After commit aa626da947e9 ("iavf: Detach device during reset task")
> > the device is detached during reset task and re-attached at its end.
> > The problem occurs when reset task fails because Tx queues are
> > restarted during device re-attach and this leads later to a crash.
>
> <...>
>
> > + if (netif_running(netdev)) {
> > + /* Close device to ensure that Tx queues will not be started
> > + * during netif_device_attach() at the end of the reset task.
> > + */
> > + rtnl_lock();
> > + dev_close(netdev);
> > + rtnl_unlock();
> > + }
>
> Sorry for my naive question, I see this pattern a lot (including RDMA),
> so curious. Everyone checks netif_running() outside of rtnl_lock, while
> dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock() be
> placed before netif_running()?

Yes, I think you're right. A lot of people check it without the lock, but I think that's not strictly safe. Is dev_close() safe to call when netif_running() is false? If so, why not just remove the check and always call dev_close()?
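
As a rough sketch of the ordering being discussed, here is a small userspace model, not kernel code. Every model_* name is hypothetical and only stands in for the corresponding kernel concept (rtnl_lock(), netif_running(), dev_close()). The model also assumes that closing an already-down device is a no-op, which is exactly the open question above. The point it illustrates is that the state check and the close happen inside one critical section, so the state cannot change between them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static bool model_rtnl_held;	/* models "RTNL is currently held" */

struct model_netdev {
	bool running;		/* stands in for __LINK_STATE_START */
	int close_calls;	/* how many times the close path really ran */
};

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

static bool model_netif_running(const struct model_netdev *dev)
{
	return dev->running;
}

/* Must be called with the model lock held.
 * ASSUMPTION: closing a device that is already down does nothing.
 */
static void model_dev_close(struct model_netdev *dev)
{
	assert(model_rtnl_held);
	if (!dev->running)
		return;
	dev->running = false;
	dev->close_calls++;
}

/* Take the lock first, then check and close in the same critical section. */
static void model_close_if_running(struct model_netdev *dev)
{
	model_rtnl_lock();
	if (model_netif_running(dev))
		model_dev_close(dev);
	model_rtnl_unlock();
}
```

If the assumption about closing a down device holds for the real dev_close(), the check inside the lock becomes redundant and the helper reduces to lock, close, unlock.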

Thanks,
Jake

>
> Thanks
>
> > +
> > dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
> > reset_finish:
> > rtnl_lock();
> > --
> > 2.37.4
> >