Re: common_interrupt: No irq handler for vector

From: Shuah Khan
Date: Mon Dec 14 2020 - 17:42:44 EST


On 12/14/20 3:28 PM, Thomas Gleixner wrote:
Shuah,

On Mon, Dec 14 2020 at 13:57, Shuah Khan wrote:
On 12/14/20 1:41 PM, Thomas Gleixner wrote:
Here is the processor and BIOS info:
AMD Ryzen 7 4700G with Radeon Graphics
LENOVO ThinkCentre Embedded Controller -[O4ZCT12A-1.12]-
LENOVO ThinkCentre BIOS Boot Block Revision 1.1C


I am bisecting to isolate. Same issue on all stables 5.4, 4.19 and
so on. If it is a BIOS problem I would expect to see it on 5.10-rc7
and wouldn't have expected to start seeing it on 5.9.9.

Can you provide some more details, e.g. dmesg please?


__common_interrupt: 1.55 No irq handler for vector
__common_interrupt: 2.55 No irq handler for vector
__common_interrupt: 3.55 No irq handler for vector
__common_interrupt: 4.55 No irq handler for vector
__common_interrupt: 5.55 No irq handler for vector
__common_interrupt: 6.55 No irq handler for vector
__common_interrupt: 7.55 No irq handler for vector
__common_interrupt: 8.55 No irq handler for vector
__common_interrupt: 9.55 No irq handler for vector
__common_interrupt: 10.55 No irq handler for vector

This _IS_ the AGESA BIOS bug.

No. It's perfectly correct in the MSI code. See further down.

	if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
		this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);


I am asking about the inconsistency between the comment and the actual
message: the comment implies that if the vector is in VECTOR_UNUSED
state, this message won't be triggered in common_interrupt. Based on
that, my read is that the comment might be wrong if the code is correct,
as you are saying.

The comment says:

>> * anyway. If the vector is unused, then it is marked so it won't
>> * trigger the 'No irq handler for vector' warning in
>> * common_interrupt().

If the vector is unused, then it is _marked_ so ....

See the messages above.

This code has absolutely nothing to do with these messages and this code
marks the vector RETRIGGERED so the warning cannot happen if the MSI
migration causes this spurious vector to be emitted. That marking is
there _because_ the migration triggered the warning occasionally which
is unavoidable due to the silliness of hardware.
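
To make the marking logic concrete, here is a rough user-space model of
it. This is a sketch, not kernel code: the enum, the table and the
function names mark_retriggered() and deliver() are hypothetical, and
the behaviour of the interrupt path is modelled on my reading of
common_interrupt(), so treat it as an assumption. It models one CPU's
vector table only.

```c
#include <assert.h>
#include <stdio.h>

#define NR_VECTORS 256

/* Illustrative slot states, loosely modelled on the kernel's VECTOR_*
 * markers. The names and values are assumptions made for this sketch. */
enum slot_state { SLOT_UNUSED, SLOT_RETRIGGERED, SLOT_HANDLER };

/* One CPU's vector table; zero-initialised, so every slot is SLOT_UNUSED. */
static enum slot_state vector_table[NR_VECTORS];

/* Models the MSI migration path: a slot without a real handler is marked
 * so that one spurious interrupt on it is swallowed silently. */
void mark_retriggered(unsigned int vector)
{
	assert(vector < NR_VECTORS);
	if (vector_table[vector] != SLOT_HANDLER)
		vector_table[vector] = SLOT_RETRIGGERED;
}

/* Models the interrupt entry path. Returns 1 if the warning is printed. */
int deliver(unsigned int cpu, unsigned int vector)
{
	assert(vector < NR_VECTORS);
	switch (vector_table[vector]) {
	case SLOT_HANDLER:
		return 0;	/* normal dispatch to the handler */
	case SLOT_RETRIGGERED:
		/* Expected spurious interrupt: reset the slot, stay quiet. */
		vector_table[vector] = SLOT_UNUSED;
		return 0;
	default:
		printf("__common_interrupt: %u.%u No irq handler for vector\n",
		       cpu, vector);
		return 1;
	}
}
```

In this model a vector that nobody marked, like the BIOS-induced
vector 55 here, goes straight to the warning branch, while a marked
slot absorbs exactly one spurious interrupt and then reverts to unused.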

The problem is that the buggy BIOS causes vector 55 which is the legacy
X86 interrupt 7 to be sent to the secondary CPUs 1-10 when they come up
the first time during boot. This has been reported to death already and
AMD confirmed that it is an AGESA BIOS bug and that it is fixed with
AGESA BIOS version 1.1.8.0.
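
For reference, the vector to legacy interrupt mapping can be checked
with a few lines of user-space C. This is a sketch: the two macros
mirror what I believe arch/x86/include/asm/irq_vectors.h defines, so
treat their values as an assumption, and vector_to_isa_irq() and
parse_warning() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed to match the x86 definitions in irq_vectors.h. */
#define FIRST_EXTERNAL_VECTOR	0x20
#define ISA_IRQ_VECTOR(irq)	(((FIRST_EXTERNAL_VECTOR + 16) & ~15) + (irq))
#define NR_LEGACY_IRQS		16

/* Returns the legacy ISA IRQ number for a vector, or -1 if the vector
 * is outside the 16-entry legacy range. */
int vector_to_isa_irq(int vector)
{
	int base = ISA_IRQ_VECTOR(0);	/* 0x30 with the values above */

	if (vector < base || vector >= base + NR_LEGACY_IRQS)
		return -1;
	return vector - base;
}

/* Extracts CPU and vector from a "<cpu>.<vector>" warning line as shown
 * in the dmesg output. Returns 1 on success, 0 otherwise. */
int parse_warning(const char *line, int *cpu, int *vector)
{
	assert(line && cpu && vector);
	return sscanf(line, "__common_interrupt: %d.%d", cpu, vector) == 2;
}
```

With these definitions the legacy range starts at vector 0x30, so
vector 55 (0x37) is legacy interrupt 7 on every CPU that printed the
message.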

The reason why it shows up now might be timing related, nothing else.


Thank you for confirming. I will save myself the bisect time and look
for a BIOS update.

thanks,
-- Shuah