Re: [PATCH V3 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

From: Greg Kroah-Hartman
Date: Wed Nov 29 2023 - 04:20:43 EST


On Wed, Nov 29, 2023 at 01:08:39AM -0800, Saeed Mahameed wrote:
> On 27 Nov 18:59, Greg Kroah-Hartman wrote:
> > On Mon, Nov 20, 2023 at 11:06:16PM -0800, Saeed Mahameed wrote:
> > > +struct mlx5ctl_dev {
> > > + struct mlx5_core_dev *mdev;
> > > + struct miscdevice miscdev;
> > > + struct auxiliary_device *adev;
> > > + struct list_head fd_list;
> > > + spinlock_t fd_list_lock; /* protect list add/del */
> > > + struct rw_semaphore rw_lock;
> > > + struct kref refcount;
> >
> > You now have 2 different things that control the lifespan of this
> > structure. We really need some way to automatically check this so that
> > people don't keep making this same mistake, it happens all the time :(
> >
> > Please pick one structure (miscdevice) or the other (kref) to control
> > the lifespan, having 2 will just not work.
> >
>
> miscdevice doesn't handle the lifespan, open files will remain open even
> after the miscdevice was unregistered, hence we use the kref to defer the
> kfree until the last open file is closed.

miscdevice has a reference counter and a lifecycle, you can not have two
reference counted objects in the same structure and expect things to
work well.

> > Also, why a rw_semaphore? Only use those if you can prove with a
> > benchmark that it is actually faster, otherwise it's simpler to just use
> > a normal mutex (hint, you are changing the fields in the structure with
> > the read lock held, which feels very wrong, and so needs a LOT of
> > documentation, or just use a normal mutex...)
> >
>
> It is needed so we can protect against underlaying device unloading while
> miscdevice is active, we use rw semaphore since we don't care about
> synchronization between any of the fops, but we want to protect current
> active ioctls and fops from sudden mlx5ctl_remove (auxiliary_driver.remove)
> which can happen dynamically due to underlaying device removal.

Then use a normal mutex. Only use a rw lock if you can prove the
performance needs it as usually a rw lock is slower and more complex as
you then have to document stuff like:

> So here is the locking scheme:
>
> write_lock() : only on mlx5_ctl remove and mark the device is down
> via assigning NULL to mcdev->dev, to let all new readers abort and to wait
> for current readers to finish their task.
>
> read_lock(): used in all fops and ioctls, to make sure underlaying
> mlx5_core device is still active, and to prevent open files to access the
> device when miscdevice was already unregistered.
>
> I agree, this should've been documented in the code, I will add
> documentation.

Just make it simple and use a normal mutex please.

And fix up the reference counting, it shouldn't be this complex, it's
just a "simple" misc device driver :)

But before you do that, please see my other email about why not using
devlink for all of this instead.

thanks,

greg k-h