Re: do_IRQ: 0.165 No irq handler for vector (irq -1)

From: Torsten Kaiser
Date: Tue Feb 02 2010 - 14:57:01 EST


On Tue, Feb 2, 2010 at 7:40 PM, Suresh Siddha <suresh.b.siddha@xxxxxxxxx> wrote:
> On Mon, 2010-02-01 at 20:53 -0800, Eric W. Biederman wrote:
>> > It might be that the silicon implements MSI incorrectly and ends up
>> > sending out invalid MSI packets under certain circumstances.  The
>> > silicon hasn't changed for quite some time now and back when it came
>> > out MSI wasn't too popular and I don't think SIMG's proprietary
>> > drivers use it, so it's quite possible that the feature simply is
>> > broken.  Is there any specific reason why you want to enable MSI
>> > support?  It's not like MSI brings any actual benefit when the
>> > compatibility hardware is already there.

19: 34618 3 2 4862 IO-APIC-fasteoi sata_sil24, bttv0, Bt87x audio
[ 6.038918] IRQ 19/bttv0: IRQF_DISABLED is not guaranteed on shared IRQs

The interrupt that the sata_sil24 is currently using is shared, so I
thought that switching this to MSI might be a good idea.
And I wanted to test a new feature. ;-)

>> It also seems possible that some of the recent irq handling changes
>> missed something.
>
> No Eric. This particular report is with 2.6.33-rc kernels and also only
> when MSI support for sata_sil24 is enabled. Recent irq handling changes
> are all in -tip tree and getting tested. So this sounds like a different
> problem specific to this HW's MSI capabilities.

Just to repeat this so the information does not get lost:
MSI seems to work on this system.
The drivers radeon (X300), HDA intel (onboard sound from the MCP55
chipset) and tg3 (two BCM5754) all work without any problems.

>> Usually the message "No irq handler for vector (irq -1)" means that the irq
>> was delivered to a cpu that was not ready for it.  I see that vector 165
>> is being delivered on all of the different cpus with vector 165,
>> and that you are getting interrupts delivered most of the time.
>
> Also I see this in the original report:
>
> On Sun, 2010-01-31 at 05:02 -0800, Torsten Kaiser wrote:
>> What is really strange: The vector 165 is stable. It never changed
>> even if I deactivate all other drivers in the kernel config (that
>> changes the MSI IRQ for sata_sil24 from 29 to 28!) or if I switch off
>> CONFIG_SPARSE_IRQ. In the kernel with the reduced number of drivers
>> the maximum vector that gets used in __assign_irq_vector is only 137.
>
> It looks like the HW under certain conditions is generating interrupts
> with wrong vector (165), especially when the __assign_irq_vector() never
> allocated the vector 165 (and hence we never setup the vector to irq
> mapping for this vector on any cpu). Torsten, can you please apply the
> appended patch and boot with "apic_phys" boot parameter and see if it
> makes any difference?

I tried the patch and the message from do_IRQ is gone, but reading the
file still fails with the same errors from libata.
(Earlier tests with writing a large file to this disk also failed with
timeouts, but never triggered the do_IRQ error.)
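
For context, the do_IRQ message appears when a vector arrives on a CPU
whose vector-to-irq table has no entry for it. The following is only a
minimal user-space model of that lookup, not the kernel code: the table
layout, the helper names (init_vector_tables, assign_vector,
lookup_vector, handle_vector) and the pairing of vector 137 with irq 28
in the usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS     4      /* the machine in this report has 4 processors */
#define NR_VECTORS  256    /* x86 has 256 interrupt vectors */

/* Hypothetical stand-in for the kernel's per-cpu vector-to-irq table. */
static int vector_irq[NR_CPUS][NR_VECTORS];

/* Mark every vector on every CPU as unassigned (-1). */
static void init_vector_tables(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        for (int v = 0; v < NR_VECTORS; v++)
            vector_irq[cpu][v] = -1;
}

/* Record that 'vector' on 'cpu' belongs to 'irq'. */
static void assign_vector(int cpu, int vector, int irq)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(vector >= 0 && vector < NR_VECTORS);
    vector_irq[cpu][vector] = irq;
}

/* Return the irq mapped to 'vector' on 'cpu', or -1 if none was set up. */
static int lookup_vector(int cpu, int vector)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(vector >= 0 && vector < NR_VECTORS);
    return vector_irq[cpu][vector];
}

/* Model of the interrupt entry path: complain if no irq is mapped. */
static int handle_vector(int cpu, int vector)
{
    int irq = lookup_vector(cpu, vector);

    if (irq < 0)
        printf("do_IRQ: %d.%d No irq handler for vector (irq %d)\n",
               cpu, vector, irq);
    return irq;
}
```

After init_vector_tables() and assign_vector(0, 137, 28), a call to
handle_vector(0, 137) returns 28 silently, while handle_vector(0, 165)
returns -1 and prints the same "0.165" line as in the subject of this
thread. In this model the message only says that nothing was ever set up
for that vector on that CPU; it does not say where the interrupt came
from.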

I appended a diff between the dmesg of the test run with your patch and
that of the previous run at the end of this mail.

>> This smells like the initialization problems I was seeing in another
>> thread.  Suresh?
>
> No. Initialization problems in another thread happens in a small window
> during cpu online (in logical flat mode, we are setting up vector to irq
> mappings for the AP a little late after we have enabled interrupts).
> Here the problem is not actually triggered during cpu on-lining.

FWIW: # CONFIG_HOTPLUG_CPU is not set

I don't use suspend/resume on that system, so I never enabled CPU
hotplug in the .config.

Thanks for looking at this.

Torsten


The changes in dmesg from your patch:
1,2c1,2
< x Linux version 2.6.33-rc6 (root@treogen) (gcc version 4.4.2 (Gentoo 4.4.2 p1.0) ) #1 SMP Sat Jan 30 10:38:39 CET 2010
< x Command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug
---
> x Linux version 2.6.33-rc6 (root@treogen) (gcc version 4.4.2 (Gentoo 4.4.2 p1.0) ) #2 SMP Tue Feb 2 20:22:21 CET 2010
> x Command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug apic_phys
61a62
> x Setting APIC routing to physical flat.
130a132
> x Setting APIC routing to physical flat.
159c161
< x Kernel command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug
---
> x Kernel command line: root=/dev/sdc1 console=ttyS0,115200 console=tty1 sata_sil24.msi=1 radeon.modeset=1 raid=noautodetect apic=debug apic_phys
163,164c165,166
< x Node 0: aperture @ a7f2000000 size 32 MB
< x Aperture beyond 4GB. Ignoring.
---
> x Node 0: aperture @ 20000000 size 32 MB
> x Aperture pointing to e820 RAM. Ignoring.
202c204
< x Setting APIC routing to flat
---
> x Setting APIC routing to physical flat
234,235c236,237
< x ... lapic delta = 1249998
< x ... PM-Timer delta = 357954
---
> x ... lapic delta = 1249989
> x ... PM-Timer delta = 357951
237,241c239,243
< x ..... delta 1249998
< x ..... mult: 53687005
< x ..... calibration result: 1999996
< x ..... CPU clock speed is 2599.9959 MHz.
< x ..... host bus clock speed is 199.9996 MHz.
---
> x ..... delta 1249989
> x ..... mult: 53686618
> x ..... calibration result: 1999982
> x ..... CPU clock speed is 2599.9751 MHz.
> x ..... host bus clock speed is 199.9982 MHz.
248c250
< x Total of 4 processors activated (20800.14 BogoMIPS).
---
> x Total of 4 processors activated (20799.96 BogoMIPS).
430,431c432,433
< x ... APIC ICR: 000008fd
< x ... APIC ICR2: 08000000
---
> x ... APIC ICR: 000000fd
> x ... APIC ICR2: 03000000
437,438c439,440
< x ... APIC TMICT: 0001e847
< x ... APIC TMCCT: 000174b3
---
> x ... APIC TMICT: 0001e846
> x ... APIC TMCCT: 000185ee
462,476c464,478
< x 01 00F 0 0 0 0 0 1 1 31
< x 02 00F 0 0 0 0 0 1 1 30
< x 03 00F 0 0 0 0 0 1 1 33
< x 04 00F 0 0 0 0 0 1 1 34
< x 05 00F 1 0 0 0 0 1 1 35
< x 06 00F 0 0 0 0 0 1 1 36
< x 07 00F 0 0 0 0 0 1 1 37
< x 08 00F 0 0 0 0 0 1 1 38
< x 09 00F 0 1 0 0 0 1 1 39
< x 0a 00F 1 0 0 0 0 1 1 3A
< x 0b 00F 1 0 0 0 0 1 1 3B
< x 0c 00F 0 0 0 0 0 1 1 3C
< x 0d 00F 0 0 0 0 0 1 1 3D
< x 0e 00F 0 0 0 0 0 1 1 3E
< x 0f 00F 0 0 0 0 0 1 1 3F
---
> x 01 000 0 0 0 0 0 0 0 31
> x 02 000 0 0 0 0 0 0 0 30
> x 03 000 0 0 0 0 0 0 0 33
> x 04 000 0 0 0 0 0 0 0 34
> x 05 000 1 0 0 0 0 0 0 35
> x 06 000 0 0 0 0 0 0 0 36
> x 07 000 0 0 0 0 0 0 0 37
> x 08 000 0 0 0 0 0 0 0 38
> x 09 000 0 1 0 0 0 0 0 39
> x 0a 000 1 0 0 0 0 0 0 3A
> x 0b 000 1 0 0 0 0 0 0 3B
> x 0c 000 0 0 0 0 0 0 0 3C
> x 0d 000 0 0 0 0 0 0 0 3D
> x 0e 000 0 0 0 0 0 0 0 3E
> x 0f 000 0 0 0 0 0 0 0 3F
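
The APIC ICR/ICR2 lines in the diff above reflect the switch from
logical flat to physical flat routing. The sketch below decodes the two
pairs of values, assuming the standard xAPIC Interrupt Command Register
layout (vector in bits 7:0, delivery mode in bits 10:8, destination mode
in bit 11, destination in bits 31:24 of the high word). The struct and
function names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Decoded ICR fields (illustrative names, not kernel definitions). */
struct icr_fields {
    unsigned vector;        /* ICR bits 7:0 */
    unsigned delivery_mode; /* ICR bits 10:8, 0 = fixed */
    unsigned logical_dest;  /* ICR bit 11: 1 = logical, 0 = physical */
    unsigned dest;          /* ICR2 bits 31:24 */
};

/* Split the raw low (icr) and high (icr2) register words into fields. */
static struct icr_fields decode_icr(uint32_t icr, uint32_t icr2)
{
    struct icr_fields f;

    f.vector        = icr & 0xff;
    f.delivery_mode = (icr >> 8) & 0x7;
    f.logical_dest  = (icr >> 11) & 0x1;
    f.dest          = (icr2 >> 24) & 0xff;
    return f;
}

/* Print one decoded register pair in a human-readable form. */
static void print_icr(uint32_t icr, uint32_t icr2)
{
    struct icr_fields f = decode_icr(icr, icr2);

    printf("vector 0x%02x, delivery mode %u, %s destination 0x%02x\n",
           f.vector, f.delivery_mode,
           f.logical_dest ? "logical" : "physical", f.dest);
}
```

With the values from the diff, decode_icr(0x000008fd, 0x08000000) gives
vector 0xfd with logical destination 0x08, and
decode_icr(0x000000fd, 0x03000000) gives the same vector with physical
destination 0x03. If the logical destination is read as a bitmask with
one bit per CPU (an assumption about how flat logical mode is set up
here), 0x08 selects the fourth CPU, which would match APIC ID 3 in the
physical case.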