Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Thomas Gleixner
Date: Fri Aug 25 2023 - 13:09:15 EST


On Fri, Aug 25 2023 at 13:01, Bartosz Golaszewski wrote:
> On Fri, Aug 25, 2023 at 10:11 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Fri, Aug 25 2023 at 00:36, brgl@xxxxxxxx wrote:
>> > On Thu, 24 Aug 2023 22:12:41 +0200, Thomas Gleixner <tglx@xxxxxxxxxxxxx> said:
>> > Here a GPIO device - that is also an irq chip - is unbound (this is a testing
>> > module unbound during a test-case but it can be anything else, like an I2C
>> > expander for which the driver is unloaded) while some users called
>> > request_irq() on its interrupts (this is orthogonal to gpiod_get() and doesn't
>> > take a reference to the module, so nothing is stopping us from
>> > unloading it)
>>
>> You just described the real problem in this sentence. So why are you
>> trying to cure a symptom?
>
> Honestly I'm not following. Even if we did have a way of taking the
> reference to the underlying GPIO module (I'm 99% percent sure, it's
> not possible: we're not using any of the GPIO interfaces for that,

The point is that something frees an in-use interrupt descriptor, which
is obviously wrong to begin with.

Now you go and cure the symptom of a stale procfs file, but as I said
before this is the least of the worries.

Care to look at what free_irq() does and you might figure out why this
stale file is just an easy to observe symptom, but obviously not the
real problem.

This leaves an activated interrupt around with stale pointers to the
descriptor. The interrupt could be on the flight. The associated thread
could be running. There can be resources claimed via request_irq() which
will not be freed either. There can be management operations on the
interrupt in parallel.

A driver which successfully requests an interrupt can rightfully expect
that any management operation on the interrupt, e.g. disable(),
enable(), set_*() is valid until the point it invokes free_irq().

IOW, the descriptor including the associated interrupt chips (software
representation) in the hierarchy must be at least in a consistent state
and accessible. If the underlying hardware vanished, e.g. USB
disconnect, then still the software side must be intact. Of course an
disable_irq() won't reach the hardware anymore, but that's something the
relevant driver has to handle correctly.

So just claiming that it's fine to free in-use interrupts and everything
is greate by removing the procfs entries is just wishful thinking.

You simply cannot free an in-use interrupt descriptor and I'm going to
add a big fat warning into that code to that effect.

So if it turns out that this is a general problem, then we have to sit
down and solve it from ground up.

Thanks,

tglx