Re: [PATCH net-next 0/5] Move devlink_register to be near devlink_reload_enable

From: Leon Romanovsky
Date: Thu Aug 12 2021 - 00:10:23 EST


On Wed, Aug 11, 2021 at 07:18:17AM -0700, Jakub Kicinski wrote:
> On Wed, 11 Aug 2021 17:01:20 +0300 Leon Romanovsky wrote:
> > > > Not really, they will register but won't be accessible from the user space.
> > > > The only difference is the location of "[dev,new] ..." notification.
> > >
> > > Is that because of mlx5's use of auxdev, or locking? I don't see
> > > anything that should prevent the port notification from coming out.
> >
> > And it is ok, kernel can (and does) send notifications, because we left
> > devlink_ops assignment to be in devlink_alloc(). It ensures that all
> > flows that worked before will continue to work without too much changes.
> >
> > > I think the notifications need to get straightened out, we can't notify
> > > about sub-objects until the object is registered, since they are
> > > inaccessible.
> >
> > I'm not sure about that. You present the case where kernel and user
> > space races against each other and historically kernel doesn't protect
> > from such flows.
> >
> > For example, you can randomly remove and add kernel modules. At some
> > point of time, you will get "missing symbols errors", just because
> > one module tries to load and it depends on already removed one.
>
> Sure. But there is a difference between an error because another
> actor did something conflicting, asynchronously, and API which by design
> sends notifications which can't be acted upon until later point in time,
> because kernel sent them too early.
>
> > We must protect kernel and this is what I do. User shouldn't access
> > devlink instance before he sees "dev name" notification.
>
> Which is a new rule, and therefore a uAPI change..
>
> > Of course, we can move various iterators to devlink_register(), but it
> > will make code much complex, because we have objects that can be
> > registered at any time (IMHO. trap is one of them) and I will need to
> > implement notification logic that separate objects that were created
> > before devlink_register and after.
>
> I appreciate it's a PITA but it is the downside of a solution where
> registration of co-dependent objects exposed via devlink is reordered
> in the kernel.

I thought about it more and realized what we can make registration
monitor notifications behave as before, we can't do it for unregister
path.

For register, we can buffer all notifications till devlink_register
comes, use it as a marker and release everything that was accumulated
till that point. Everything that will come later will be delivered
immediately.

It will give "dev name ..." print at the beginning as you want.

For unregister, this trick won't work because we don't know if any other
devlink unregister API is used after devlink_unregister. So we can't
delay notifications.

Even if we can, it will be even worse from user perspective, because
in such case devlink_unregister() will close netlink access without
notifying user and he won't understand why ports don't work (as an
example).

Jakub, you are over engineering here and solve non-existing problem.

> Which is a new rule, and therefore a uAPI change..

AFAIR, netlink can be out-of-order, because it is UDP, but it is just
impractical to see it in the real-life. So no, it is not new rule.

Thanks