Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Bartosz Golaszewski
Date: Fri Aug 25 2023 - 16:08:44 EST


On Fri, Aug 25, 2023 at 7:08 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Fri, Aug 25 2023 at 13:01, Bartosz Golaszewski wrote:
> > On Fri, Aug 25, 2023 at 10:11 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> On Fri, Aug 25 2023 at 00:36, brgl@xxxxxxxx wrote:
> >> > On Thu, 24 Aug 2023 22:12:41 +0200, Thomas Gleixner <tglx@xxxxxxxxxxxxx> said:
> >> > Here a GPIO device - that is also an irq chip - is unbound (this is a testing
> >> > module unbound during a test-case but it can be anything else, like an I2C
> >> > expander for which the driver is unloaded) while some users called
> >> > request_irq() on its interrupts (this is orthogonal to gpiod_get() and doesn't
> >> > take a reference to the module, so nothing is stopping us from
> >> > unloading it)
> >>
> >> You just described the real problem in this sentence. So why are you
> >> trying to cure a symptom?
> >
> > Honestly I'm not following. Even if we did have a way of taking the
> > reference to the underlying GPIO module (I'm 99% percent sure, it's
> > not possible: we're not using any of the GPIO interfaces for that,
>
> The point is that something frees an in-use interrupt descriptor, which
> is obviously wrong to begin with.
>

It happens in irq_dispose_mapping() when the domain is not
hierarchical and irq_free_desc() is called in the if branch.

> Now you go and cure the symptom of a stale procfs file, but as I said
> before this is the least of the worries.
>
> Care to look at what free_irq() does and you might figure out why this
> stale file is just an easy to observe symptom, but obviously not the
> real problem.
>
> This leaves an activated interrupt around with stale pointers to the
> descriptor. The interrupt could be on the flight. The associated thread
> could be running. There can be resources claimed via request_irq() which
> will not be freed either. There can be management operations on the
> interrupt in parallel.
>
> A driver which successfully requests an interrupt can rightfully expect
> that any management operation on the interrupt, e.g. disable(),
> enable(), set_*() is valid until the point it invokes free_irq().
>

Whether by "valid" you mean "always succeeds" or merely just "doesn't
crash the kernel", it basically means that the consumer that requested
the resource (irq, GPIO or otherwise) can always call any of the
consumer-facing calls and the handle it got from the provider (GPIO
descriptor, irq number) remains valid even after the device backing
that handle is gone - in which case the call either returns 0 and acts
like nothing happened or returns an appropriate error code (most
likely -ENODEV).

Or even a combination of both, like we do in the GPIO subsystem where
we emit a warning to kernel log but we still return 0.

> IOW, the descriptor including the associated interrupt chips (software
> representation) in the hierarchy must be at least in a consistent state
> and accessible. If the underlying hardware vanished, e.g. USB
> disconnect, then still the software side must be intact. Of course an
> disable_irq() won't reach the hardware anymore, but that's something the
> relevant driver has to handle correctly.
>

Absolutely agree! This is precisely what we do in the GPIO subsystem.
Except that we migrated from using a GPIO numberspace similar to the
irq numbers in favor of using GPIO descriptors which are simple
structs containing some meta-data about the GPIO request and - most
importantly - a reference to the GPIO device backing the descriptor.

That GPIO device (still talking about the software representation) has
two layers and is composed of a struct that embeds struct device and
is reference counted via the kobject refs inside struct device as well
as a second struct called the GPIO chip which on the other hand can be
destroyed at any time (most likely when the provider device is
unbound). struct gpio_device is guaranteed to exist for as long as
anyone references it even after struct gpio_chip was freed. Whenever
the user calls any of the consumer-facing APIs, we dereference the
descriptor, take the gpio_device pointer and see if the chip is still
there. If it's not - we return 0 as mentioned above.

Now for irqs, the consumer-facing "handle" is the interrupt number. I
don't know what the rationale is for that as it forces us to convert
it to a descriptor internally everytime we use it (tree lookup!) but
from what you're saying, if the domain backing it is destroyed and
with it, the interrupt descriptors, then management calls may end up
triggering use-after-free bugs. It seems about right as not all of
them check if the mapping can be found. I may have been lucky to never
trigger it as I was only using request_irq() and free_irq().

As I explained before in this thread - it's perfectly normal for the
provider of most resources in the kernel to be gone with users still
holding the respective handles. Maybe functions in linux/interrupt.h
could use some audit in order to make sure they can handle this. It
seems to me that the current infrastructure is ready as all it takes
is checking if irq_to_desc() returns a non-NULL value. Or I may be
getting it wrong and it's much more complex than that.

> So just claiming that it's fine to free in-use interrupts and everything
> is greate by removing the procfs entries is just wishful thinking.
>
> You simply cannot free an in-use interrupt descriptor and I'm going to
> add a big fat warning into that code to that effect.
>

With the above example, if our GPIO desc is analogous to the interrupt
number in the irq world - I'm not really sure what the role of the
irq_desc is. Should it be the *handle* that users get when they
request an irq? Maybe it is what the GPIO device is for us? Should it
be reference counted then?

> So if it turns out that this is a general problem, then we have to sit
> down and solve it from ground up.
>

It may very well be. I guess it will require a more detailed
investigation. I'd still say this patch is correct though, as the
self-contained function removing a procfs hierarchy should remove all
sub-directories and not just leak them. They don't hold any irq
resources anyway.

Are you fine with the first, cleanup patch in this series BTW?

Bartosz

> Thanks,
>
> tglx