Re: [PATCH v3] gpiolib: fix reference leaks when removing GPIO chips still in use

From: Bartosz Golaszewski
Date: Tue Aug 15 2023 - 14:37:31 EST


On Tue, Aug 15, 2023 at 4:43 PM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote:
> > On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko
> > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote:
> > > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
> > > > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:
>
> ...
>
> > > > > > + module_put(desc->gdev->owner);
> > > > > > + gpio_device_put(desc->gdev);
> > > > >
> > > > > So, if gdev can be NULL, you will get an Oops with new code.
> > > >
> > > > I read it such that gdev->chip can be NULL, but not gdev,
> > > > and desc->gdev->owner is fine to reference?
> > >
> > > Basically the Q is
> > > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?"
> >
> > gdev->desc is assigned in one single spot, which is in
> > gpiochip_add_data_with_key():
> >
> > for (i = 0; i < gc->ngpio; i++)
> > gdev->descs[i].gdev = gdev;
> >
> > It is never assigned anywhere else, so I guess yes.
> >
> > We may also ask if it is ever invalid (i.e. if desc->gdev can point to
> > junk).
> >
> > A gdev turns to junk when its reference count goes down to zero
> > and gpiodev_release() is called effectively calling kfree() on the
> > struct gpio_device *.
> >
> > But that can only happen as a result of module_put() getting
> > called, pulling the references down to zero. Which is what we
> > are discussing. The line after module_put(), desc->gdev
> > *could* be NULL.
>
> Yes.
>
> > But then we just call gpio_device_put(desc->gdev) which is
> > just a call to device_put(), which is NULL-tolerant.
>
> But gpio_device_put() does not NULL tolerant.
> So, oops in this line then.
>

No. struct gpio_device is reference counted and as long as we get the
reference counting right - the descriptor is guaranteed to hold it and
only put it when it itself is being destroyed. In other words:
desc->gdev cannot be NULL here and cannot be junk. If it is, then it's
a programming bug on our side and we do want to crash and burn so that
we can catch and fix it.

If you think more comments are needed here, feel free to add them or
I'll do it at a later time. I don't want to delay this patch any
longer as it's one of the issues we need to fix to make driver unbind
great again. Unless you see an issue with its logic, I want to queue
it tomorrow so that it gets built in next and send it upstream by the
end of this week.

Bart