Re: [PATCH v3 2/3] zram: fix deadlock with sysfs attribute usage and driver removal

From: Luis Chamberlain
Date: Tue Jun 22 2021 - 15:57:09 EST


On Tue, Jun 22, 2021 at 08:05:12PM +0200, Greg KH wrote:
> On Tue, Jun 22, 2021 at 10:27:12AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 22, 2021 at 07:16:34PM +0200, Greg KH wrote:
> > > On Tue, Jun 22, 2021 at 09:32:08AM -0700, Luis Chamberlain wrote:
> > > > On Tue, Jun 22, 2021 at 09:45:39AM +0200, Greg KH wrote:
> > > > > On Mon, Jun 21, 2021 at 04:36:34PM -0700, Luis Chamberlain wrote:
> > > > > > @@ -2048,13 +2048,19 @@ static ssize_t hot_add_show(struct class *class,
> > > > > > {
> > > > > > int ret;
> > > > > >
> > > > > > + if (!try_module_get(THIS_MODULE))
> > > > > > + return -ENODEV;
> > > > > > +
> > > > >
> > > > > You can not increment/decrement your own module's reference count and
> > > > > expect it to work properly, as it is still a race.
> > > >
> > > > The goal here is to prevent an rmmod call if this succeeds. If it
> > > > succeeds then any subsequent rmmod will fail. Can you explain how
> > > > this is still racy?
> > >
> > > {sigh}
> > >
> > > What happens if the driver core is just about to call hot_add_show() and
> > > the module is removed from the system. It then calls to the memory
> > > location that hot_add_show() was previously at, but now that is not a
> > > valid pointer to code, and boom.
> >
> > The new kobject_get() on patch 3/3 ensures that the device will be up
> > throughout the entire life of the store call, and thus prevent the
> > code being executed being removed, no?
>
> I do not know, I no longer remember what is in that patch at the moment
> as it is long-gone from my queue.

It was the changes *you* recommended, a generic way to ensure the
lifetime of the derefernce is valid. I had used bdgrab()/bdget() and you
suggested we generalize it with the kobject_get() for the device and a
bus get. With that change, I confirm that the device will still be
present during the lifetime of the sysfs knobs call.

> Also, if the device will be "up" for the whole lifetime, why do you need
> to increment the module reference count?

The goal is to prevent a deadlock. The lifetime of the device is not
an issue in this deadlock case, the issue is a race with module removal
and that code path using a lock which is also used on a sysfs knob.

Luis