RE: [PATCH net-next 3/6] devlink: Count struct devlink consumers

From: Keller, Jacob E
Date: Wed Aug 18 2021 - 13:50:23 EST

> -----Original Message-----
> From: Leon Romanovsky <leon@xxxxxxxxxx>
> Sent: Wednesday, August 18, 2021 1:12 AM
> To: Keller, Jacob E <jacob.e.keller@xxxxxxxxx>
> Cc: Jakub Kicinski <kuba@xxxxxxxxxx>; David S . Miller <davem@xxxxxxxxxxxxx>;
> Guangbin Huang <huangguangbin2@xxxxxxxxxx>; Jiri Pirko <jiri@xxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Salil Mehta
> <salil.mehta@xxxxxxxxxx>; Shannon Nelson <snelson@xxxxxxxxxxx>; Yisen
> Zhuang <yisen.zhuang@xxxxxxxxxx>; Yufeng Mo <moyufeng@xxxxxxxxxx>
> Subject: Re: [PATCH net-next 3/6] devlink: Count struct devlink consumers
>
> On Mon, Aug 16, 2021 at 09:32:17PM +0000, Keller, Jacob E wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jakub Kicinski <kuba@xxxxxxxxxx>
> > > Sent: Monday, August 16, 2021 9:07 AM
> > > To: Leon Romanovsky <leon@xxxxxxxxxx>
> > > Cc: David S . Miller <davem@xxxxxxxxxxxxx>; Guangbin Huang
> > > <huangguangbin2@xxxxxxxxxx>; Keller, Jacob E <jacob.e.keller@xxxxxxxxx>;
> > > Jiri Pirko <jiri@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> > > netdev@xxxxxxxxxxxxxxx; Salil Mehta <salil.mehta@xxxxxxxxxx>;
> > > Shannon Nelson <snelson@xxxxxxxxxxx>; Yisen Zhuang
> > > <yisen.zhuang@xxxxxxxxxx>; Yufeng Mo <moyufeng@xxxxxxxxxx>
> > > Subject: Re: [PATCH net-next 3/6] devlink: Count struct devlink consumers
> > >
> > > On Mon, 16 Aug 2021 18:53:45 +0300 Leon Romanovsky wrote:
> > > > On Mon, Aug 16, 2021 at 08:47:41AM -0700, Jakub Kicinski wrote:
> > > > > On Sat, 14 Aug 2021 12:57:28 +0300 Leon Romanovsky wrote:
> > > > > > From: Leon Romanovsky <leonro@xxxxxxxxxx>
> > > > > >
> > > > > > The struct devlink itself is protected by internal lock and doesn't
> > > > > > need global lock during operation. That global lock is used to protect
> > > > > > addition/removal new devlink instances from the global list in use by
> > > > > > all devlink consumers in the system.
> > > > > >
> > > > > > The future conversion of linked list to be xarray will allow us to
> > > > > > actually delete that lock, but first we need to count all struct devlink
> > > > > > users.
> > > > >
> > > > > Not a problem with this set but to state the obvious the global devlink
> > > > > lock also protects from concurrent execution of all the ops which don't
> > > > > > take the instance lock (DEVLINK_NL_FLAG_NO_LOCK). You most likely
> > > > > > know this but I thought I'd comment on an off chance it helps.
> > > >
> > > > The end goal will be something like that:
> > > > 1. Delete devlink lock
> > > > 2. Rely on xa_lock() while grabbing devlink instance (past devlink_try_get)
> > > > 3. Convert devlink->lock to be read/write lock to make sure that we can run
> > > > get query in parallel.
> > > > 4. Open devlink netlink to parallel ops, ".parallel_ops = true".
> > >
> > > IIUC that'd mean setting eswitch mode would hold write lock on
> > > the dl instance. What locks does e.g. registering a dl port take
> > > then?
> >
> > Also that I think we have some cases where we want to allow the driver to
> > allocate new devlink objects in response to adding a port, but still want to block
> > other global operations from running?
>
> I don't see the flow where operations on devlink_A should block devlink_B.
> Only in such flows we will need global lock like we have now - devlink->lock.
> In all other flows, write lock of devlink instance will protect from
> parallel execution.
>
> Thanks


But how do we handle what is essentially recursion?

If we add a port on devlink A:

userspace sends PORT_ADD for devlink A
driver responds by creating a port
adding a port causes driver to add a region, or other devlink object

In the current design, if I understand correctly, we hold the global lock but *not* the instance lock. We can't hold the instance lock while adding a port without breaking a bunch of drivers that add many devlink objects in response to port creation, because they'll deadlock when they go to add the sub-objects.

But if we don't hold the global lock, then in theory another userspace program could attempt to do something in between PORT_ADD starting and finishing, which might not be desirable. (Remember, we had to drop the instance lock; otherwise drivers get stuck when trying to add many sub-objects.)
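
The window can be modelled in the same hedged way. This is a single-threaded userspace sketch with hypothetical names (fake_core, fake_other_request, fake_core_port_add), not devlink code. The "other" request is called from the middle of PORT_ADD purely to force the interleaving deterministically, and it uses trylock so the model reports EBUSY where a real handler would block:

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of the devlink core; not the kernel structures. */
struct fake_core {
	pthread_mutex_t global;	/* stand-in for the global devlink lock */
	int use_global;		/* 1: handlers take the global lock */
	int port_exists;	/* port is visible to other requests */
	int port_ready;		/* driver has finished adding sub-objects */
	int saw_half_built;	/* a request saw exists && !ready */
};

int fake_core_init(struct fake_core *c, int use_global)
{
	c->use_global = use_global;
	c->port_exists = 0;
	c->port_ready = 0;
	c->saw_half_built = 0;
	return pthread_mutex_init(&c->global, NULL);
}

/* A second userspace request arriving while PORT_ADD is in flight. */
int fake_other_request(struct fake_core *c)
{
	if (c->use_global) {
		/* trylock: report EBUSY instead of blocking forever */
		int err = pthread_mutex_trylock(&c->global);

		if (err)
			return err;
	}
	if (c->port_exists && !c->port_ready)
		c->saw_half_built = 1;
	if (c->use_global)
		pthread_mutex_unlock(&c->global);
	return 0;
}

/* PORT_ADD with the instance lock already dropped for the driver. */
int fake_core_port_add(struct fake_core *c)
{
	int other;

	if (c->use_global)
		pthread_mutex_lock(&c->global);
	c->port_exists = 1;		/* driver created the port ... */
	other = fake_other_request(c);	/* forced interleaving */
	c->port_ready = 1;		/* ... and only now added its sub-objects */
	if (c->use_global)
		pthread_mutex_unlock(&c->global);
	return other;
}
```

With use_global set, the second request is turned away (EBUSY) until PORT_ADD finishes. Without it, the request runs and observes the half-built port, which is the case I'm worried about.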

Thanks,
Jake