Re: [PATCH 2/2] genirq: proc: fix a procfs entry leak

From: Thomas Gleixner
Date: Tue Aug 29 2023 - 18:30:33 EST


On Tue, Aug 29 2023 at 14:24, Bartosz Golaszewski wrote:
> On Tue, Aug 29, 2023 at 11:11 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> > On Mon, Aug 28, 2023 at 11:44 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> So when the USB device disconnects, then something needs to tell the
>> i2c-xx-driver that the I2C interface is not longer available, right?
>>
>> IOW, the unbind operation needs the following notification and teardown
>> order:
>>
>> 1) USB core notifies hid-cp2112
>>
>> 2) hid-cp2112 notifies i2c-xx-driver
>>
>> 3) i2c-xx-driver mops up and invokes free_irq()
>>
>> 4) hid-cp2112 removes the interrupt domain
>>
>> But it seems that you end up with a situation where the notification of
>> the i2c-xx-driver is either not happening or the driver in question is
>> simply failing to mop up and free the requested interrupt.
>>
>
> Yes, there's no notification of any kind.

I'm not buying that.

usb disconnect
...
cp2112_remove()
i2c_del_adapter()
i2c_unregister_device(client)
...
device_unregister()
device_del()
bus_notify() // Mechanism #1
i2c_device_remove()
if (dev->remove)
dev->remove()
...
device_unbind_cleanup()
devres_release_all() // Mechanism #2

gpiochip_remove()

There are very well notifications to the drivers about unplug of a
device. Otherwise this would end up in a complete disaster and a lot
more stale data and state than just a procfs file or a requested
interrupt.

So the mechanisms are there, no?

If this is just about the problem that some device driver writers fail
to implement them correctly, then yes it makes sense to have a last
resort fallback which cleans them up and emits a big fat warning.

Making this a programming model would be beyond wrong.

> It's a common problem unfortunately across different subsystems. We
> have hot-unpluggable consumers using resources that don't support it
> (like interrupts in this example).

All hotpluggable consumers have at least one mechanism to mop up the
resources they allocated. There are a lot of resources in the kernel
which do not clean themself up magically.

So what's so special about interrupts? They are not any different from a
pointer which is registered at some entity and the device driver writer
forgets to unregister it, but the underlying resource is freed. That's
even worse than the leaked interrupt and cannot be magically undone at
all.

Whatever you try, you can't turn driver programming into a task which
can be accomplished w/o brains.

> For interrupts it would mean that when the consumer calls
> request_irq(), the number it gets is a weak reference to the irq_desc.

Interrupt numbers are weak references by definition. request_irq() does
not return an interrupt number, it returns success or fail. The
interrupt number is handed to request_irq(), no?

The entities which hand out the interrupt number are a complete
different story. But that number is from their perspective a weak
reference too.

> For any management operation we lock irq_desc.

That's required anyway, but irq_desc::lock is not a sufficient
protection against a teardown race.

> If the domain is destroyed, irq_descs get destroyed with it

Interrupts consist of more than just an interrupt descriptor. If you
care to look at the internals, then the descriptor is the last entity
which goes away simply because all other related resources hang off the
interrupt descriptor.

So they obviously need to be mopped up first and trying to mop up
requested interrupts at the point where the interrupt descriptor is
freed is way too late.

In fact they need to mopped up first, which is obvious from the setup
ordering as everything underneath must be in place already before
request_irq() can succeed. The only sensible ordering for teardown is
obviously the reverse of setup, but that's not specific to interrupts at
all.

> (after all users leave the critical section).

Which critical section? Interrupts have more than just the
irq_desc::lock critical section which needs to be left.

> Next call to any of the functions looks up the irq number and sees
> it's gone. It fails or silently returns depending on the function
> (e.g. irq_free() would have to ignore the missing lookup).

That's what happens today already.

The missing bit is the magic function which mops up when the driver
writer blew it up. But that's far from a 'put a trivial free_irq() call
somewhere'.

First of all we have too many interrupt mechanisms ranging from the
linux 0.1 model with static interrupt spaces to hierarchical interrupt
domains.

- Almost everything which is interrupt domain based (hierarchical or
not) can probably be "addressed" by some definition of "addressed"
because there are only a few places in the core code which are
involved in releasing individual or domain wide resources.

But see below.

- The incarnations which do not use interrupt domains are way harder
because the teardown of the interrupt resources is not happening in
the core code. It's happening at the driver side and the core is
only involved to release the interrupt descriptor. But that's too
late.

The good news about those is that probably the vast majority of the
instances is built in, mostly legacy and not pluggable.

So lets assume that anything which is not interrupt domain based is not
relevant, which reduces the problem space significantly. But the
remaining space is still non-trivial.

1) Life-time

Interrupt descriptors are RCU freed but that's mostly for external
usage like /proc/interrupts

Still most of the core code relies on the proper setup/teardown
order, so there would be quite some work to do vs. validating that
an interrupt descriptor is consistent at all hierarchy levels.

That's going to be interesting vs. interfaces which can be invoked
from pretty much any context (except NMI).

irq_desc::lock is not the panacea here because it obviously can
only be held for real short truly atomic sections. But quite some
of the setup/teardown at all levels requires sleepable context.

2) Concurrency

Protection against concurrent interrupt handler execution is
covered in free_irq() as it fully synchronizes.

That's the least of my worries.

Whatever the invalidation mechanism might be, pretty much every
usage site of irq_to_desc() and any usage site of interrupt
descriptors which are stored outside of the maple_tree for
performance reasons need to be audited.

The validation has to take into account whether that's an requested
descriptor or an unused one.

So no, neither removing some procfs entry at the wrong point nor
slapping some variant of free_irq() into random places is going to cut
it.

Your idea of tracking request_irq()/free_irq() at some subsystem level
is not going to work either simply because it requires that such muck is
sprinkled all over the place.

I completely agree that the interrupt core code does not have enough
places which emit a prominent warning when the rules are violated.

So sure, in some way or the other a "fix" by some definition of "fix"
could be implemented, but I'm really not convinced that's the right way
to waste^Wspend our time with.

Especially not because interrupt handling is a hot-path and any
life-time/conncurrency validation mechanism will introduce overhead no
matter what.

Thanks,

tglx