Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()

From: Marc Zyngier
Date: Tue Nov 24 2020 - 11:52:40 EST


On 2020-11-23 15:45, John Garry wrote:

Hi John,

>>> But it looks like there is more to it than that, which I'm worried is
>>> far from trivial. For example, just calling irq_dispose_mapping()
>>> for removal and then platform_get_irq()->acpi_get_irq() a second time
>>> fails, as it looks like more tidy-up is needed for removal...

>> Most probably. I could imagine things failing if there is any trace
>> of an existing translation in the ITS or in the platform-MSI layer,
>> for example, or if the interrupt is still active...

> So this looks to be a problem I have. So if I hack the code to skip
> the check in acpi_get_irq() for the irq already being init'ed, I run
> into a use-after-free in the gic-v3-its driver. I may be skipping
> something with this hack, but I'll ask anyway.
>
> So initially in the msi_prepare method we set up the its dev - this is
> from the mbigen probe. Then when all the irqs are unmapped later for
> end device driver removal, we release this its device in
> its_irq_domain_free(). But I don't see anything to set it up again. Is
> it improper to have released the its device in this scenario?
> Commenting out the release makes things "good" again.

Huh, that's ugly. The issue is that the device that deals with the
interrupts isn't the device that the ITS knows about (there isn't a
1:1 mapping between mbigen and the endpoint).

The mbigen is responsible for the creation of the corresponding
irqdomain, and crucially for the "prepare" phase, which results
in storing the its_dev pointer in info->scratchpad[0].

As we free all the interrupts associated with the endpoint, we
free the its_dev (nothing else needs it at this point). On the
next allocation, we reuse the damn its_dev pointer, and we're SOL.
This is wrong, because we haven't removed the mbigen, only the
device *connected* to the mbigen. And since the mbigen can be shared
across endpoints, we can't reliably tear it down at all. Boo.

The only thing to do is to convey that by marking the its_dev as
shared so that it isn't deleted when no LPIs are being used. After
all, it isn't like the mbigen is going anywhere.
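
To make the lifetime problem concrete, here is a minimal userspace
sketch of the idea. It is a toy model, not the gic-v3-its code: every
name in it (toy_its_dev, toy_msi_info, toy_prepare() and friends) is
made up for illustration, and the "shared" flag is only an assumption
about what such a fix could look like. The one deliberate difference
from the real bug is that the toy clears the cached pointer on free,
so the stale state can be observed safely instead of being
dereferenced.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for the ITS device; NOT the kernel's struct its_device. */
struct toy_its_dev {
	unsigned int lpis_in_use;	/* interrupts currently allocated */
	bool shared;			/* hypothetical: keep alive at zero users */
};

/* Toy stand-in for the per-domain info that caches the device pointer. */
struct toy_msi_info {
	struct toy_its_dev *scratchpad_dev;
};

/* "prepare" phase: runs once, when the (toy) mbigen is probed. */
static int toy_prepare(struct toy_msi_info *info, bool shared)
{
	struct toy_its_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return -1;
	dev->shared = shared;
	info->scratchpad_dev = dev;
	return 0;
}

/* Allocate one interrupt; fails if the cached device has gone away. */
static int toy_alloc_irq(struct toy_msi_info *info)
{
	if (!info->scratchpad_dev)
		return -1;
	info->scratchpad_dev->lpis_in_use++;
	return 0;
}

/*
 * Free one interrupt. Returns true if the device itself was released.
 * A non-shared device is torn down when its last interrupt goes away;
 * a shared one survives so that a later allocation can reuse it.
 */
static bool toy_free_irq(struct toy_msi_info *info)
{
	struct toy_its_dev *dev = info->scratchpad_dev;

	if (!dev || dev->lpis_in_use == 0)
		return false;
	dev->lpis_in_use--;
	if (dev->lpis_in_use == 0 && !dev->shared) {
		free(dev);
		/* Toy only: clear the cache so the stale state is visible. */
		info->scratchpad_dev = NULL;
		return true;
	}
	return false;
}
```

With shared == false, an alloc/free/alloc sequence fails on the second
allocation because nothing runs the prepare step again. With
shared == true, the device outlives the last free and the second
allocation succeeds.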

It is just that passing that information down isn't a simple affair,
as msi_alloc_info_t isn't a generic type... Let me have a think.

M.
--
Jazz is not dead. It just smells funny...