Re: [PATCH V3 2/5] misc: mlx5ctl: Add mlx5ctl misc driver

From: Jason Gunthorpe
Date: Wed Nov 29 2023 - 08:02:10 EST


On Wed, Nov 29, 2023 at 09:20:32AM +0000, Greg Kroah-Hartman wrote:
> On Wed, Nov 29, 2023 at 01:08:39AM -0800, Saeed Mahameed wrote:
> > On 27 Nov 18:59, Greg Kroah-Hartman wrote:
> > > On Mon, Nov 20, 2023 at 11:06:16PM -0800, Saeed Mahameed wrote:
> > > > +struct mlx5ctl_dev {
> > > > + struct mlx5_core_dev *mdev;
> > > > + struct miscdevice miscdev;
> > > > + struct auxiliary_device *adev;
> > > > + struct list_head fd_list;
> > > > + spinlock_t fd_list_lock; /* protect list add/del */
> > > > + struct rw_semaphore rw_lock;
> > > > + struct kref refcount;
> > >
> > > You now have 2 different things that control the lifespan of this
> > > structure. We really need some way to automatically check this so that
> > > people don't keep making this same mistake, it happens all the time :(
> > >
> > > Please pick one structure (miscdevice) or the other (kref) to control
> > > the lifespan, having 2 will just not work.
> > >
> >
> > miscdevice doesn't handle the lifespan, open files will remain open even
> > after the miscdevice was unregistered, hence we use the kref to defer the
> > kfree until the last open file is closed.
>
> miscdevice has a reference counter and a lifecycle, you can not have two
> reference counted objects in the same structure and expect things to
> work well.

This second refcount is hidden well:

struct miscdevice {
int minor;
const char *name;
const struct file_operations *fops;
struct list_head list;
struct device *parent;
struct device *this_device;
const struct attribute_group **groups;
const char *nodename;
umode_t mode;
};

You have slammed us in the other thread for doing a poor review job
here, but none of us see what you are seeing. Can you explain better?

> > write_lock() : only on mlx5_ctl remove and mark the device is down
> > via assigning NULL to mcdev->dev, to let all new readers abort and to wait
> > for current readers to finish their task.
> >
> > read_lock(): used in all fops and ioctls, to make sure underlaying
> > mlx5_core device is still active, and to prevent open files to access the
> > device when miscdevice was already unregistered.
> >
> > I agree, this should've been documented in the code, I will add
> > documentation.
>
> Just make it simple and use a normal mutex please.

A normal mutex would make the entire ioctl interface single threaded,
this is not desirable.

> But before you do that, please see my other email about why not using
> devlink for all of this instead.

We've been over this already, the devlink discussion is about some
configuration stuff. It has never been suggested to cover the debug
interface. This series is primarily about debug, the devlink thing is
a distraction to main point.

Jason