Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Bartosz Golaszewski
Date: Mon Aug 28 2023 - 15:04:08 EST


On Mon, Aug 28, 2023 at 2:41 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Mon, Aug 28 2023 at 12:06, Bartosz Golaszewski wrote:
> > On Sat, Aug 26, 2023 at 5:08 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >>
> >> How do you address this with slapping some NULL checks around? The only
> >> way to clean this up is invoking free_irq().
> >>
> >
> > This is not what I meant, let me rephrase the question:
> >
> > Is there any reason why whatever operations irq_free() performs could
> > not be done by the irqchip when it knows it will be destroyed with
> > irqs still in use? And then any new management operation that would be
> > called by the now orphaned user would just bail out? Maybe not, I'm
> > asking this question because I don't know and it's not obvious from
> > the code.
>
> The irqchip can do nothing and it is not directly involved in freeing
> the interrupt descriptor. The entity, which allocated the interrupt
> descriptors is responsible for that. That's in most modern cases the
> interrupt domain.
>
> It might be possible to free the actions in a teardown operation, but
> that needs a lot of thoughts vs. concurrency and life time rules. A
> simple 'let's invoke free_irq()' does not cut it.
>
> >> The proper solution to this is to take a reference count on the module
> >> which provides the interrupt chip and allocates the interrupt domain.
> >> The core code has a mechanism for that. See request/free_irq().
> >
> > I guess you're referring to irq_alloc_descs()? Anyway, here's a
> > real-life example: we have the hid-cp2112 module which drives a
> > GPIO-and-I2C-expander-on-a-USB-stick. I plug it in and have a driver
> > that requests one of its GPIOs as interrupt. Now I unplug it. How has
> > taking the reference to the hid-cp2112 module protected me from
> > freeing an irq domain with interrupts in use?
>
> request_irq() does not care which module request the interrupt. It
> always takes a refcount on irq_desc::owner. That points to the module
> which created the interrupt domain and/or allocated the descriptors.
>
> IOW, this needs a mechanism to store the module which creates the
> interrupt domain somewhere in the domain itself and use it when
> allocating interrupt descriptors. So in your case this would take a
> refcount on the GPIO module.
>

This is still not complete. In the above example, the USB bus can
still unbind the GPIO device that created the domain on hot-unplug,
triggering its cleanup routines (.remove(), devres chain) and
destroying the domain and keeping the reference to the hid-cp2112
module will not help it. This is why I suggested tracking the irq
requests and freeing them in said cleanup path.

Bartosz