Re: [PATCH] gpiolib: cdev: fix NULL-pointer dereferences

From: Kent Gibson
Date: Fri Nov 25 2022 - 19:39:45 EST


On Fri, Nov 25, 2022 at 10:03:06PM +0100, Bartosz Golaszewski wrote:
> On Fri, Nov 25, 2022 at 6:56 PM Andy Shevchenko
> <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Nov 25, 2022 at 05:48:02PM +0100, Bartosz Golaszewski wrote:
> > > On Fri, Nov 25, 2022 at 5:24 PM Kent Gibson <warthog618@xxxxxxxxx> wrote:
> >
> > ...
> >
> > > Then at the subsystem level, the GPIO device struct would need a lock
> > > that would be taken by every user-space operation AND the code
> > > unregistering the device so that we don't do what you described (i.e.
> > > if there's a thread doing a read(), then let's wait until it returns
> > > before we drop the device).
> >
> > It's called a reference counting, basically you need to get device and then
> > put when it makes sense.
> >
>
> Andy: I am aware of struct device reference counting but this isn't
> it. You can count references all you want, but when I disconnect my
> CP2112, the USB bus calls gpiochip_remove(), struct gpio_chip * inside
> struct gpio_device is set to NULL and while the underlying struct
> device itself is still alive, the GPIO chip is no longer usable.
>
> Reference counting won't help because the device is no longer there,
> so this behavior is correct but there's an issue with user-space still
> being able to hold certain resources and we need to make sure that
> when it tries to use them, we return an error instead of crashing.
>
> I think that a good solution is to make sure, we cannot set gdev->gc
> to NULL as long as there are user-space operations in progress. After
> all, it's better to try to send a USB request to an unplugged device
> than to dereference a NULL pointer. To that end, we could have a
> user-space lock that would also be taken by gpiochip_remove().
>

This is basically the answer I was hoping for - that there is some
barrier in place to prevent chip removal while an ioctl is active.
Then the check makes total sense - it is ensuring that the chip wasn't
removed before the ioctl began and the barrier went up.

On the other end, the caller of gpiochip_remove() needs to be prepared
to gracefully fail calls on the chip until gpiochip_remove() returns.
You would hope that is already the case...

> But this is still a per-subsystem solution. Most other subsystems
> suffer from the same issue.
>

Does that prevent us addressing the problem in gpio until a more general
solution comes along?

Anyway, I'm basically ok with your patch as a first step, as it greatly
reduces the chances of triggering the fault, but it is only a band-aid
over a larger issue and a more complete solution would be preferable.
Without that, highlight in the checkin comment that it is not a complete
fix.

Cheers,
Kent.