Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Bartosz Golaszewski
Date: Tue Aug 29 2023 - 08:25:40 EST


On Tue, Aug 29, 2023 at 11:11 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, Aug 29 2023 at 08:26, Bartosz Golaszewski wrote:
> > On Mon, Aug 28, 2023 at 11:44 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> That's the module which provides the interrupt domain and hid-whatever
> >> is the one which requests the interrupt, no?
> >>
> > Not at all! This is what I said: "we have the hid-cp2112 module which
> > drives a GPIO-and-I2C-expander-on-a-USB-stick". Meaning: the
> > hid-cp2112 module registers a USB driver for a GPIO expander on a
> > stick. This GPIO chip is the interrupt controller here. It's the USB
> > stick that provides interrupts for whatever else needs them (in real
> > life: it can be an IIO device on the I2C bus which signals some events
> > over the GPIOs). The user can get the interrupt number using the
> > gpiod_to_irq() function. It can be unplugged at any moment and module
> > references will not stop the USB bus from unbinding it.
>
> Sorry for my confusion, but this all is confusing at best.
>
> So what you are saying is that the GPIO driver, which creates the
> interrupt domain is unbound and that unbind destroys the interrupt
> domain, right? IOW, the wonderful world of plug and pray.
>
> Let's look at the full picture again.
>
> USB -> USB-device
> |----------- GPIO
> |------------I2C ---------- I2C-device
> (hid-cp2112 driver) (i2c-xx-driver)
>
> i2x-xx-driver is the one which requests the interrupt from
> hid-cp2112-GPIO, right?
>

Yes! Sorry if I was not being clear about it.

> So when the USB device disconnects, then something needs to tell the
> i2c-xx-driver that the I2C interface is not longer available, right?
>
> IOW, the unbind operation needs the following notification and teardown
> order:
>
> 1) USB core notifies hid-cp2112
>
> 2) hid-cp2112 notifies i2c-xx-driver
>
> 3) i2c-xx-driver mops up and invokes free_irq()
>
> 4) hid-cp2112 removes the interrupt domain
>
> But it seems that you end up with a situation where the notification of
> the i2c-xx-driver is either not happening or the driver in question is
> simply failing to mop up and free the requested interrupt.
>

Yes, there's no notification of any kind. It's a common problem
unfortunately across different subsystems. We have hot-unpluggable
consumers using resources that don't support it (like interrupts in
this example).

> As a consequence you want to work around it by mopping up the requested
> interrupts somewhere else.
>

The approach I'm proposing - and that we implement in GPIO - is
treating the "handle" to the resource as what's often called in
programming - a weak reference. The resource itself is released not by
the consumer, but the provider. The consumer in turn can get the weak
reference from the provider and has to have some way of converting it
to a strong one for the duration of any of the API calls. It can be
implemented internally with a mutex, spinlock, an RCU read section or
otherwise (in GPIO we're using rw_semaphores but I'm working on
migrating to SRCU in order to protect the functions called from
interrupt context too which is missing ATM). If for any reason the
provider vanishes, then the next API call will fail. If it vanishes
during a call, then we'll wait for the call to exit before freeing the
resources, even if the underlying HW is already gone (the call in
progress may fail, that's alright).

For interrupts it would mean that when the consumer calls
request_irq(), the number it gets is a weak reference to the irq_desc.
For any management operation we lock irq_desc. If the domain is
destroyed, irq_descs get destroyed with it (after all users leave the
critical section). Next call to any of the functions looks up the irq
number and sees it's gone. It fails or silently returns depending on
the function (e.g. irq_free() would have to ignore the missing
lookup).

But I'm just floating ideas here.

> It's not clear to me what you are trying to achieve here.
>
> If this is meant as a hardening effort to catch such issues and let
> the kernel gracefully "survive", then I'm all ears.
>

IMO if more subsystems adapted the above approach then we'd have less
surprises when their resources are used in an unexpected way. I'd call
it "part of a broader hardening effort". :)

Bart

> If this is just meant to paper over the shortcomings of subsystems or
> driver implemtations then I'm not interested at all.
>
> Thanks,
>
> tglx
>
>
>
>