Re: [PATCH net-next] devlink: Require devlink lock during device reload

From: Jiri Pirko
Date: Tue Nov 09 2021 - 11:29:34 EST


Tue, Nov 09, 2021 at 03:12:33PM CET, leon@xxxxxxxxxx wrote:
>On Mon, Nov 08, 2021 at 03:31:26PM -0800, Jakub Kicinski wrote:
>> On Mon, 8 Nov 2021 21:58:36 +0200 Leon Romanovsky wrote:
>> > > > > nfp will benefit from the simplified locking as well, and so will bnxt,
>> > > > > although I'm not sure the maintainers will opt for using devlink framework
>> > > > > due to the downstream requirements.
>> > > >
>> > > > Exactly why devlink should be fixed first.
>> > >
>> > > If by "fixed first" you mean it needs 5 locks to be added and to remove
>> > > any guarantees on sub-object lifetime then no thanks.
>> >
>> > How do you plan to fix pernet_ops_rwsem lock? By exposing devlink state
>> > to the drivers? By providing unlocked version of unregister_netdevice_notifier?
>> >
>> > This simple scenario has deadlocks:
>> > sudo ip netns add n1
>> > sudo devlink dev reload pci/0000:00:09.0 netns n1
>> > sudo ip netns del n1
>>
>> Okay - I'm not sure why you're asking me this. This is not related to
>> devlink locking as far as I can tell. Neither are you fixing this
>> problem in your own RFC.
>
>I asked you because you clearly showed me that things that make
>sense to me don't make sense to you, and vice versa.
>
>I don't want to do work that will be thrown away.
>
>>
>> You'd need to tell me more about what the notifier is used for (I see
>> RoCE in the call trace). I don't understand why you need to re-register
>> a global (i.e. not per netns) notifier when devlink is switching name
>> spaces.
>
>The RDMA subsystem supports two net-namespace-aware scenarios.
>
>We need a global netdev_notifier for shared mode. This is the legacy mode,
>where we listen to all namespaces. We must support this mode; otherwise we
>break the whole RDMA world.
>
>See commit below:
>de641d74fb00 ("Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion"")

If a per-ns notifier is not possible for whatever reason, you probably
have to register the global notifier only once, in init, and have some
sort of mechanism to ignore the events while you are in the middle of
the re-init. I don't see any other way.
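
To illustrate, here is a minimal userspace sketch of that idea. It is not
kernel code and not a proposed patch: every name below is made up for
illustration, the "notifier" is just a plain function, and the register
step is modeled by a flag. It only shows the shape: register once in init,
then gate the callback on a "re-init in progress" flag instead of
unregistering and re-registering on reload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; this models the pattern in userspace. */

atomic_bool reinit_in_progress;	/* true while the device is being re-initialized */
bool notifier_registered;	/* enforces the register-once rule */
int events_handled;
int events_ignored;

/* Stand-in for the global netdev notifier callback. */
int demo_netdev_event(unsigned long event)
{
	(void)event;

	/* Drop events that arrive in the middle of the re-init. */
	if (atomic_load(&reinit_in_progress)) {
		events_ignored++;
		return 0;
	}

	events_handled++;
	return 0;
}

/* Called once from init; the reload path never touches registration. */
void demo_init(void)
{
	assert(!notifier_registered);
	notifier_registered = true;
}

/* Reload only flips the flag around the teardown/bring-up sequence. */
void demo_reload_down(void)
{
	atomic_store(&reinit_in_progress, true);
}

void demo_reload_up(void)
{
	atomic_store(&reinit_in_progress, false);
}
```

A real driver would also have to deal with a callback that is already
running when the flag flips, and with any events it needs to replay after
the re-init; the flag alone does not cover either.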