Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing through 2x GPUs that share same pci switch via vfio

From: Matthew Ruffell
Date: Tue Oct 05 2021 - 01:03:08 EST

Hi Alex,

Have you had an opportunity to look into this more deeply?

On 16/09/21 4:32 am, Alex Williamson wrote:
>
> Adding debugging to the vfio-pci interrupt handler, it's correctly
> deferring the interrupt as the GPU device is not identifying itself as
> the source of the interrupt via the status register. In fact, setting
> the disable INTx bit in the GPU command register while the interrupt
> storm occurs does not stop the interrupts.
>
> The interrupt storm does seem to be related to the bus resets, but I
> can't figure out yet how multiple devices per switch factors into the
> issue. Serializing all bus resets via a mutex doesn't seem to change
> the behavior.
>
> I'm still investigating, but if anyone knows how to get access to the
> Broadcom datasheet or errata for this switch, please let me know.

We have managed to obtain a recent copy of the errata for this switch,
and it doesn't mention any interrupt storms with nested switches. What
should I be looking for in the errata? I cannot share our copy, sorry.

Is there anything that we can do to help?

Thanks,

Matthew