Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

From: Hans de Goede
Date: Thu Mar 07 2019 - 06:21:03 EST


Hi,

On 06-03-19 11:14, Thomas Gleixner wrote:
> Hans,
>
> On Wed, 6 Mar 2019, Hans de Goede wrote:
> > On 05-03-19 20:54, Borislav Petkov wrote:
> > > On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
> > > > Finger pointing at the firmware if there are multiple vendors involved
> > > > is really not going to help here. Esp. since most OEMs will just respond
> > > > with "the machine works fine with Windows"
> > >
> > > Yes, because windoze simply doesn't report that spurious IRQ, most
> > > likely.
> >
> > So maybe we need to lower the priority of the do_IRQ error from pr_emerg
> > to pr_err then ? That will stop throwing the errors in the users face each
> > boot on distros which have chosen to set the quiet loglevel to such a level
> > that pr_err messages are not shown on the console (*).
>
> Well, we rather try to understand and fix the issue.
>
> So if Tom's theory holds, then the patch below should cure it.

Thank you for the patch, unfortunately the messages still happen
with a kernel with the patch applied:

[ 0.741479] smp: Bringing up secondary CPUs ...
[ 0.741654] x86: Booting SMP configuration:
[ 0.741655] .... node #0, CPUs: #1
[ 0.742231] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.742231] Measured 3346474670 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.742231] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[ 0.321639] do_IRQ: 1.55 No irq handler for vector
[ 0.743371] #2
[ 0.321639] do_IRQ: 2.55 No irq handler for vector
[ 0.743598] #3
[ 0.321639] do_IRQ: 3.55 No irq handler for vector
[ 0.744306] #4
[ 0.321639] do_IRQ: 4.55 No irq handler for vector
[ 0.744531] #5
[ 0.321639] do_IRQ: 5.55 No irq handler for vector
[ 0.745241] #6
[ 0.321639] do_IRQ: 6.55 No irq handler for vector
[ 0.745467] #7
[ 0.321639] do_IRQ: 7.55 No irq handler for vector
[ 0.745627] smp: Brought up 1 node, 8 CPUs
[ 0.745627] smpboot: Max logical packages: 2
[ 0.745627] smpboot: Total of 8 processors activated (35133.37 BogoMIPS)

I also tried suspend/resume. In that case no extra
"No irq handler for vector" messages are printed; this seems to
trigger only once per CPU, at boot.

I do get these messages during resume, but I guess these are unrelated:

[ 167.034247] ACPI: Low-level resume complete
[ 167.034247] ACPI: EC: EC started
[ 167.034247] PM: Restoring platform NVS memory
[ 167.034247] Enabling non-boot CPUs ...
[ 167.034247] x86: Booting SMP configuration:
[ 167.034247] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 167.034247] cache: parent cpu1 should not be sleeping
[ 167.034281] microcode: CPU1: patch_level=0x08101007
[ 167.034542] CPU1 is up
[ 167.034583] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 167.035347] cache: parent cpu2 should not be sleeping
[ 167.035484] microcode: CPU2: patch_level=0x08101007
[ 167.035690] CPU2 is up
[ 167.035703] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 167.036447] cache: parent cpu3 should not be sleeping
[ 167.036580] microcode: CPU3: patch_level=0x08101007
[ 167.036819] CPU3 is up
[ 167.036843] smpboot: Booting Node 0 Processor 4 APIC 0x4
[ 167.038227] cache: parent cpu4 should not be sleeping
[ 167.038384] microcode: CPU4: patch_level=0x08101007
etc.

Regards,

Hans


8<---------------------

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1642,6 +1642,7 @@ static void end_local_APIC_setup(void)
  */
 void apic_ap_setup(void)
 {
+	clear_local_APIC();
 	setup_local_APIC();
 	end_local_APIC_setup();
 }