Re: [PATCH] cdx: add MSI support for CDX bus

From: Nipun Gupta
Date: Fri May 12 2023 - 10:22:39 EST




On 5/11/2023 3:59 AM, Thomas Gleixner wrote:

Nipun!

On Wed, May 10 2023 at 19:34, Nipun Gupta wrote:
On 5/10/2023 3:31 AM, Thomas Gleixner wrote:
I'm not insisting on that, but you could at least have had the courtesy
of responding to my review reply and explain to me why you want to solve
it differently and why my suggestion is not the right solution.

Alternatively you could have added that information in the changelog or
cover letter.

So in summary you ignored _all_ review comments I made, went off and did
something different and provided a slightly different useless changelog
with the extra add on of a broken Signed-off-by chain.

Feel free to ignore my reviews and the documentation which we put out
there to make collaboration feasible for both sides, but please don't be
upset when I ignore you and your patches in return.

Sincere apologies for not responding to the earlier comments. The
intention was never to ignore the review comments. I appreciate your
extensive MSI changes; the patch series you shared took time to
understand (alongside other work), and by then it was quite late to
reply. I understand that even so I should at least have explained
this in the cover letter.

Fair enough. All settled.

IMHO, the use case for MSI in the CDX subsystem is a bit different from
a per-device MSI domain. Here we are trying to create a domain per CDX
controller which is attached to an MSI controller, and all devices on a
particular CDX controller will have the same mechanism for writing the
MSI message.

That was exactly the same assumption which PCI/MSI and other MSI
implementations made. It turned out to be the wrong abstraction.

CDX is not any different than PCI. The actual "interrupt chip" is not
part of the bus, it's part of the device and pretending that it is a bus
specific thing is just running into the same cul-de-sac sooner rather
than later.

I understand your viewpoint, but would state that the CDX bus is
somewhat different from PCI in the sense that firmware is the
controller for all the devices and their configuration. The CDX bus
controller sends all the write_msi_msg commands to the firmware running
on the RPU over RPMsg, and it is the firmware which interfaces with the
actual devices to pass this information to them in a way agreed between
firmware and device. The only way to pass MSI information to a device
is via the firmware, and the CDX bus controller is the only entity
which can communicate with the firmware for this.


Also, the current CDX controller that we have added has a different
mechanism for MSI prepare (it gets requester ID from firmware).

That's not an argument, that's just an implementation detail.

In your opinion is there any advantage in moving to a per device domain
for CDX devices? We can definitely rethink the implementation of MSI in
CDX subsystem.

See above.

While talking about implementation and design. I actually got curious
and looked at CDX because I was amazed about the gazillion indirections
in that msi_write_msg() callback.

So this ends up doing:

  cdx->ops->dev_configure(cdx, ...)
    cdx_configure_device()
      cdx_mcdi_write_msi()
        cdx_mcdi_rpc_async()
          kmalloc()                        <- FAIL #1
          cdx_mcdi_rpc_async_internal()
            queue_work()                   <- FAIL #2

#1) That kmalloc() uses GFP_ATOMIC, but this is invoked deep in the guts
of interrupt handling with locks held and interrupts disabled.

Aside of the fact that this breaks on PREEMPT_RT, such allocations
are generally frowned upon. As a consequence the kref_put()s in the
error paths of cdx_mcdi_rpc_async_internal() will blow up on RT
too.

I know that Xilinx stated publicly that they don't support RT, but
RT is not that far out to be supported in mainline and aside of that
I know for sure that quite a lot of Xilinx customers use PREEMPT_RT
nevertheless.

#2) That's actually the worse part of it and completely broken versus
device setup

  probe()
    cdx_msi_domain_alloc_irqs()
    ...
    request_irq() {
      ...
      irq_activate()
        irq_chip_write_msi_msg()
          ...
          queue_work()
      ...
    }

    enable_irq_in_device()

    <- device raises interrupt and eventually uses an uninitialized
       MSI message because the scheduled work has not yet completed.

That's going to be a nightmare to debug and it's going to happen
once in a blue moon out in the field.

The interrupt subsystem already can handle update mechanisms which
require sleepable context:

irq_bus_lock() and irq_bus_sync_unlock() irqchip callbacks

They were initially implemented to deal with interrupt chips which are
configured via I2C, SPI etc.

How does that work?

On entry to interrupt management functions the sequence is:

  if (desc->irq_data.chip->irq_bus_lock)
          desc->irq_data.chip->irq_bus_lock(...)
  raw_spin_lock_irq(&desc->lock);

and on exit:

  raw_spin_unlock_irq(&desc->lock);
  if (desc->irq_data.chip->irq_bus_sync_unlock)
          desc->irq_data.chip->irq_bus_sync_unlock(...)

irq_bus_lock() usually just acquires a mutex.

The other irqchip callbacks just cache the relevant information, but do
not execute the bus transaction because that is not possible with
desc->lock held.

In the irq_bus_sync_unlock() they execute the bus transaction with the
cached information before dropping the mutex.

So you can solve #1 and #2 with that. Your msi_write_msg() callback will
just save the message and set some internal flag that it needs to be
written out in the irq_bus_sync_unlock() callback.

See?
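
The cache-then-flush scheme described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not CDX driver code: every struct and function name below is hypothetical, a plain flag stands in for the mutex, and fw_write_msi_sync() stands in for a blocking firmware transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the fields of an MSI message that matter for this sketch. */
struct msi_msg_model {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical per-device state; these names are not from the CDX driver. */
struct cdx_dev_model {
	bool bus_locked;              /* stands in for the mutex taken in irq_bus_lock() */
	struct msi_msg_model cached;  /* message saved by the write_msi_msg step */
	bool msg_dirty;               /* 'cached' still has to be sent to firmware */
	struct msi_msg_model hw;      /* what the simulated firmware has programmed */
	unsigned int fw_transactions; /* number of simulated firmware round trips */
};

static void model_init(struct cdx_dev_model *d)
{
	memset(d, 0, sizeof(*d));
}

/* Stand-in for a blocking firmware call, which may sleep in a real driver. */
static void fw_write_msi_sync(struct cdx_dev_model *d,
			      const struct msi_msg_model *m)
{
	d->hw = *m;
	d->fw_transactions++;
}

/* Sleepable context: runs before desc->lock would be taken. */
static void model_irq_bus_lock(struct cdx_dev_model *d)
{
	assert(!d->bus_locked);
	d->bus_locked = true;
}

/* Atomic context in the kernel: only cache, no allocation, no bus traffic. */
static void model_write_msi_msg(struct cdx_dev_model *d,
				const struct msi_msg_model *m)
{
	d->cached = *m;
	d->msg_dirty = true;
}

/* Sleepable context again: flush the cached message, then drop the lock. */
static void model_irq_bus_sync_unlock(struct cdx_dev_model *d)
{
	if (d->msg_dirty) {
		fw_write_msi_sync(d, &d->cached);
		d->msg_dirty = false;
	}
	d->bus_locked = false;
}

/* Models one interrupt management call such as request_irq(). */
static void model_setup_irq(struct cdx_dev_model *d,
			    const struct msi_msg_model *m)
{
	model_irq_bus_lock(d);
	/* raw_spin_lock_irq(&desc->lock) would be taken here */
	model_write_msi_msg(d, m);
	/* raw_spin_unlock_irq(&desc->lock) would be dropped here */
	model_irq_bus_sync_unlock(d);
}
```

In this model the message has reached the simulated firmware by the time model_setup_irq() returns, so a later enable_irq_in_device() step could not observe an unwritten message, and nothing is allocated or queued while the descriptor lock would be held.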

IIRC, there is a gap vs. interrupt affinity setting from user space,
which is irrelevant for I2C, SPI etc. configured interrupt chips as they
raise interrupt via an SoC interrupt pin and that's the entity which
does the affinity management w/o requiring I2C/SPI. IIRC I posted a
patch snippet to that effect in one of those lengthy PCI/MSI/IMS threads
because that is also required for MSI storage which happens to be in
queue memory and needs to be synchronized via some command channel. But
I can't be bothered to search for it as it's a no-brainer to fix that
up.

Thanks for this analysis and for pointing out the hidden, crucial
issues with the implementation. These need to be fixed.

As per your suggestion, we can add the firmware interaction code in the
irq_bus_sync_xx APIs. Another option is to change the
cdx_mcdi_rpc_async() API to an atomic synchronous API. We are
evaluating both solutions and will update the implementation
accordingly.

Thanks,
Nipun


Thanks,

tglx