Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Bartosz Golaszewski
Date: Mon Aug 28 2023 - 06:07:32 EST


On Sat, Aug 26, 2023 at 5:08 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Fri, Aug 25 2023 at 22:07, Bartosz Golaszewski wrote:
> > On Fri, Aug 25, 2023 at 7:08 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> The point is that something frees an in-use interrupt descriptor, which
> >> is obviously wrong to begin with.
> >>
> > It happens in irq_dispose_mapping() when the domain is not
> > hierarchical and irq_free_desc() is called in the if branch.
>
> Again. You are explaining what the callchain is. That's not interesting
> at all (I can find the places which invoke irq_free_descs() with grep).
>
> The question is WHY is irq_dispose_mapping() called when the interrupt
> in question _IS_ in use, i.e. requested and not yet released via
> free_irq().
>
> That's the underlying problem which needs to be addressed.
>
> > Now for irqs, the consumer-facing "handle" is the interrupt number. I
> > don't know what the rationale is for that as it forces us to convert
> > it to a descriptor internally everytime we use it (tree lookup!) but
>
> Perhaps because operating on an integer is not really giving you access
> to the underlying mechanisms. The tree lookup is really not an issue and
> it's a mechanism which is used all over the place in the kernel to
> translate a cookie or identifier to the internal data representation of
> a subsystem.
>
> > As I explained before in this thread - it's perfectly normal for the
> > provider of most resources in the kernel to be gone with users still
> > holding the respective handles.
>
> No. It's not most. It's only the case when the subsystem explicitely
> documents that it can handle it, which the interrupt subsystem does not.
>
> > Maybe functions in linux/interrupt.h could use some audit in order to
> > make sure they can handle this.
>
> Maybe you stop claiming that it's perfectly legit to free resources
> which are in use. It's not and the interrupt subsystem never was making
> this guarantee and never was designed for it.
>
> > It seems to me that the current infrastructure is ready as all it
> > takes is checking if irq_to_desc() returns a non-NULL value. Or I may
> > be getting it wrong and it's much more complex than that.
>
> Again:
>
> It's not just NULL checks, which exist already. It's not just a stale
> procfs file which needs to be removed. Did you actually look what
> free_irq() does?
>
> Obviously not, otherwise you might have noticed that removing the
> resources leaks:
>
> 1) The interrupt action itself
> 2) Any interrupt threads associated with the action
> 3) The procfs entry
> 4) Any resource which was allocated during request_irq()
> 5) Interrupt descriptor pointers stored separately for low level
> interrupt handling code
>
> Further it might:
>
> 1) leave hardware in an undefined state
> 2) race against an interrupt being processed concurrently
>
> How do you address this with slapping some NULL checks around? The only
> way to clean this up is invoking free_irq().
>

This is not what I meant, let me rephrase the question:

Is there any reason why whatever operations irq_free() performs could
not be done by the irqchip when it knows it will be destroyed with
irqs still in use? And then any new management operation that would be
called by the now orphaned user would just bail out? Maybe not, I'm
asking this question because I don't know and it's not obvious from
the code.

> >> So just claiming that it's fine to free in-use interrupts and everything
> >> is greate by removing the procfs entries is just wishful thinking.
> >>
> >> You simply cannot free an in-use interrupt descriptor and I'm going to
> >> add a big fat warning into that code to that effect.
> >>
> > With the above example, if our GPIO desc is analogous to the interrupt
> > number in the irq world - I'm not really sure what the role of the
> > irq_desc is. Should it be the *handle* that users get when they
> > request an irq? Maybe it is what the GPIO device is for us? Should it
> > be reference counted then?
>
> It's an internal data structure which is not accessible outside of the
> interrupt core and architecture low level code. The number is the
> identifier for the consumers to interact with the core code. That
> concept is called abstraction. What's so hard about that?
>

No need to be snarky. I do know what abstraction is. I also know that
a pointer to an opaque structure fulfills the same role without the
translation step and this is what irq_desc could be. But I get it, the
numbers have been in use for years, it's hard to change it, I don't
have an issue with that. Let's not continue this.

> >> So if it turns out that this is a general problem, then we have to sit
> >> down and solve it from ground up.
> >
> > It may very well be. I guess it will require a more detailed
> > investigation. I'd still say this patch is correct though, as the
> > self-contained function removing a procfs hierarchy should remove all
> > sub-directories and not just leak them. They don't hold any irq
> > resources anyway.
>
> This patch is not correct at all. Once again:
>
> The interrupt subsystem is not designed to have its underlying
> resources be freed when an interrupt is in-use.
>
> It's that simple. And no, your patch is not changing this. It tries to
> paper over the violation of the rule of this subsystem.
>
> The below is going to be applied ASAP to make this entirely clear. And
> yes, that's going to leak a bit more than just the procfs entry.
>
> The proper solution to this is to take a reference count on the module
> which provides the interrupt chip and allocates the interrupt domain.
> The core code has a mechanism for that. See request/free_irq().

I guess you're referring to irq_alloc_descs()? Anyway, here's a
real-life example: we have the hid-cp2112 module which drives a
GPIO-and-I2C-expander-on-a-USB-stick. I plug it in and have a driver
that requests one of its GPIOs as interrupt. Now I unplug it. How has
taking the reference to the hid-cp2112 module protected me from
freeing an irq domain with interrupts in use?

Let me focus on a possible solution then, because as it is, we have
the GPIO subsystem which claims it can survive a hot-unplug with
active users (it's not fully done yet for all corner-cases but that's
a different story...) but a GPIO chip can also act as an interrupt
controller whereas the caller can bypass most of the core GPIO code.
And now it's clear that the interrupt subsystem does not support hot
unplugging at all. And please, don't yell at me for this as this
design predates my involvement with the subsystem.

I'm thinking about two possible solutions that could be contained
within the GPIO core.

One would be to keep track of requested irqs in the GPIO-to-interrupt
glue code (where the domain is managed) and then when the GPIO chip
device is unbound, we could just go over the list of active ones and
call irq_free() on all of them. Does it even make any sense? If so: is
there any way at the moment to be notified about an irq being
requested/freed for given domain? I don't think map() and unmap()
callbacks are the right place as they are called once per mapping.
Would you be ok with adding this functionality?

Second possibility would be to keep the domain in refcounted struct
gpio_device and not struct gpio_chip where it can be freed on device
unbind. Except unlike requesting a GPIO descriptor, requesting the
interrupt directly does not increase the gpio_device's reference count
so we're back to the former. We could move the domain to the
refcounted struct and with a proper on-request notification, could
increase its refcount just like we do with the GPIO device on GPIO
request.

Do you maybe have any other suggestions by chance?

Bartosz

>
> Unfortunately it is not properly utilized by the irqdomain
> infrastructure today. But that's a fixable problem.
>
> Thanks,
>
> tglx
> ---
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 27ca1c866f29..5382fd4e7588 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -466,6 +466,9 @@ static void free_desc(unsigned int irq)
> {
> struct irq_desc *desc = irq_to_desc(irq);
>
> + if (WARN_ON_ONCE(desc->action))
> + return;
> +
> irq_remove_debugfs_entry(desc);
> unregister_irq_proc(irq, desc);
>
>