Re: 2.6.24-rc8 hangs at mfgpt-timer

From: Jordan Crouse
Date: Thu Jan 17 2008 - 20:17:23 EST


On 18/01/08 00:39 +0100, Arnd Hannemann wrote:
> Jordan Crouse schrieb:
> > On 17/01/08 23:52 +0100, Arnd Hannemann wrote:
> >
> > <snip>
> >
> >>> Hmmm - not sure whats happening here. I wonder if we're stuck in an
> >>> interrupt storm of some sort as soon as you register the interrupt handler.
> >>> But I would think that whatever was causing the interrupt storm would be
> >>> running well before we hit setup_irq(), and you would be recording "nobody
> >>> cared" interrupts left and right.
> >> Interesting thing is that it hangs not in setup_irq() but later, right
> >> after printing the newline of the printk.
> >
> > THat makes me think interrupt storm even more.
> >
> >>> The thing that scares me is that the TinyBIOS seems to know that we want
> >>> to use the MFGPT timers, and I wonder if they did anything behind the scenes
> >>> to "help us out" even though we didn't ask for it.
> >>>
> >>> I don't know how easy it would be for you - but can you try reading
> >>> MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
> >>> you can use rdmsr from here:
> >>>
> >>> http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
> >> MSR register 0x51400020 => b7:ef:5f:f4:bf:d1:95:68
> >> MSR register 0x51400021 => b7:fd:1f:f4:bf:cf:5a:d8
> >> MSR register 0x51400022 => b7:f3:bf:f4:bf:f5:fb:a8
> >> MSR register 0x51400023 => b7:fb:9f:f4:bf:fd:d9:f8
> >
> > Hmmm - those look wrong. Is /dev/cpu/0/msr there? The applet on the
> > wiki has a bug that doesn't check for it.
> I'm sorry, I should have checked: I didn't execute rdmsr as root.
> The correct ones:
>
> MSR register 0x51400020 => 00:00:00:00:00:00:0f:00
> MSR register 0x51400021 => 00:00:00:00:04:00:00:00
> MSR register 0x51400022 => 00:00:00:00:00:00:00:00
> MSR register 0x51400023 => 00:00:00:00:00:0c:ba:90

Okay - those are sane. Those are the IRQ routing MSRs - each nibble is an
IRQ for something in the southbridge - since 7 doesn't appear, a rogue
interrupt isn't likely. Also, [0:31] in 0x51400022 are the MFGPT interrupts
specifically, and they are 0. The timer tick code would set the first nibble
to 7 (in a sane system).

So, everything _seems_ okay from the hardware side. The next step would be
to comment out the setup_irq() function and write a C app or something to
verify that all the MFGPT registers are set up like we expect them to be.
However, I think that maybe we can accomplish the same thing with the
forthcoming watchdog timer, since it will prove the MFGPT is behaving
well (though it doesn't use the interrupt wrapper, but one step at a time).

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/