Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure

From: Leon Romanovsky
Date: Thu Nov 10 2022 - 04:17:22 EST


On Wed, Nov 09, 2022 at 08:11:55PM +0000, Keller, Jacob E wrote:
>
>
> > -----Original Message-----
> > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > Sent: Wednesday, November 9, 2022 10:21 AM
> > To: ivecera <ivecera@xxxxxxxxxx>
> > Cc: netdev@xxxxxxxxxxxxxxx; sassmann@xxxxxxxxxx; Keller, Jacob E
> > <jacob.e.keller@xxxxxxxxx>; Piotrowski, Patryk <patryk.piotrowski@xxxxxxxxx>;
> > SlawomirX Laba <slawomirx.laba@xxxxxxxxx>; Brandeburg, Jesse
> > <jesse.brandeburg@xxxxxxxxx>; Nguyen, Anthony L
> > <anthony.l.nguyen@xxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Eric
> > Dumazet <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo
> > Abeni <pabeni@xxxxxxxxxx>; moderated list:INTEL ETHERNET DRIVERS <intel-
> > wired-lan@xxxxxxxxxxxxxxxx>; open list <linux-kernel@xxxxxxxxxxxxxxx>
> > Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
> >
> > On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> > > After commit aa626da947e9 ("iavf: Detach device during reset task")
> > > the device is detached during reset task and re-attached at its end.
> > > The problem occurs when reset task fails because Tx queues are
> > > restarted during device re-attach and this leads later to a crash.
> >
> > <...>
> >
> > > +	if (netif_running(netdev)) {
> > > +		/* Close device to ensure that Tx queues will not be started
> > > +		 * during netif_device_attach() at the end of the reset task.
> > > +		 */
> > > +		rtnl_lock();
> > > +		dev_close(netdev);
> > > +		rtnl_unlock();
> > > +	}
> >
> > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock() be
> > placed before netif_running()?
>
> Yes, I think you're right. A ton of people check it without the lock, but I think that's not strictly safe. Is dev_close() safe to call when netif_running() is false? Why not just remove the check and always call dev_close() then?

I honestly don't know.
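
For illustration only, the ordering asked about above can be sketched as a
small userspace model. Nothing here is kernel code: the model_* names are
hypothetical, a pthread mutex stands in for rtnl_lock(), and a plain bool
stands in for the __LINK_STATE_START bit. The model's close routine is
written to do nothing on a device that is already down; whether the real
dev_close() behaves that way is exactly the open question, so treat that
part as an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins only; none of these are kernel APIs. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;
static bool model_rtnl_held;

struct model_netdev {
	bool running;		/* stands in for __LINK_STATE_START */
	int close_calls;	/* number of real running -> closed transitions */
};

static void model_rtnl_lock(void)
{
	pthread_mutex_lock(&model_rtnl);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	model_rtnl_held = false;
	pthread_mutex_unlock(&model_rtnl);
}

static bool model_netif_running(const struct model_netdev *dev)
{
	return dev->running;
}

/* Assumption of this model: closing a device that is already down is a no-op. */
static void model_dev_close(struct model_netdev *dev)
{
	assert(model_rtnl_held);	/* caller must hold the lock */
	if (!dev->running)
		return;
	dev->running = false;
	dev->close_calls++;
}

/*
 * Take the lock first, then test the running state, so the state cannot
 * change between the check and the close. Returns true if the device was
 * running and has been closed.
 */
static bool model_close_if_running(struct model_netdev *dev)
{
	bool was_running;

	model_rtnl_lock();
	was_running = model_netif_running(dev);
	if (was_running)
		model_dev_close(dev);
	model_rtnl_unlock();

	return was_running;
}
```

With the check inside the locked region, a second call on the same device
finds it down and does nothing, which is the behaviour the unlocked check
cannot guarantee.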

To remove any doubt, this patch looks good to me.

Thanks,
Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxx>