Re: [PATCH net-next] devlink: Require devlink lock during device reload

From: Jason Gunthorpe
Date: Tue Nov 09 2021 - 10:33:47 EST


On Tue, Nov 09, 2021 at 07:07:02AM -0800, Jakub Kicinski wrote:
> On Tue, 9 Nov 2021 10:43:58 -0400 Jason Gunthorpe wrote:
> > This becomes all entangled in the aux device stuff we did before.
>
> So entangled in fact that neither of you is willing to elucidate
> the exact need ;)

I haven't been paying attention to this thread, Leon just asked me to
elaborate on why roce is in these traces.

What I know is mlx5 testing has shown an alarming number of crashers
and bugs connected to devlink and Leon has been grinding them
away. mlx5 is quite heavily invested in devlink and mlx5 really needs
it to work robustly.

> The main use case for reload for netns is placing a VF in a namespace,
> for a container to use. Is that right? I've not seen use cases
> requiring the PF to be moved, are there any?

Sure, it can be useful. I wouldn't call it essential, but it is there.

> > I once sketched out fixing this by removing the need to hold the
> > per_net_rwsem just for list iteration, which in turn avoids holding it
> > over the devlink reload paths. It seemed like a reasonable step toward
> > finer grained locking.
>
> Seems to me the locking is just a symptom.

My fear is this reload during net ns destruction is devlink uAPI now
and, yes it may be only a symptom, but the root cause may be unfixable
uAPI constraints. Jiri, what do you think?

Speaking generally, I greatly prefer devlink move toward a design where
the aux bus operations that must be connected with devlink reload are
not invoked under any global locks at all. Not per_net_rwsem, rtnl, or
devlink BKLs.

We know holding global locks in open-ended contexts, like driver core
device_register/unregister, is an dangerous pattern.

Concretely, now that we are pushing virtualization use cases into
using devlink as a control plane mlx5 has pod/VM launch issuing
devlink steps. Some of this stuff is necessarily slow, like attaching
a roce aux driver takes a while. Holding a BKL while doing that means
all VM launches are serialized.

IMHO these needs demand a fine grained locking approach, not a BKL
design.

Jason