Re: Serial related oops

From: Michael K. Edwards
Date: Mon Feb 19 2007 - 19:04:58 EST


On 2/19/07, Russell King <rmk+lkml@xxxxxxxxxxxxxxxx> wrote:
I think something else is going on here. I think you're getting
an interrupt for the UART, and another interrupt is also pending.

Correct. An interrupt for the other UART on the same IRQ.

When the UART interrupt is handled, it is masked at the interrupt
controller, and the CPU mask is dropped.

Correct.

The second interrupt comes in, and when you go to disable that
source, you inadvertently re-enable the UART interrupt, despite it
still being serviced.

Incorrect. An attempt has been made to service the interrupt using
the only ISR currently in the chain for that IRQ -- the ISR for the
first UART. That attempt was not successful, and when __do_irq
unmasks the interrupt source preparatory to exiting interrupt context,
__irq_svc is dispatched anew.

This leads to the UART interrupt again triggering an IRQ.

Right. The _second_ UART's interrupt. There's another problem with
these UARTs having to do with the implementor's inability to read and
follow a bog-standard twenty-year-old spec without asking software to
fix up corner cases, but that's another backtrace for another day.

Please show your interrupt controller (mask, unmask, mask_ack)
handling functions corresponding with the interrupt which your
UART is connected to.

Don't have 'em handy; I'll be happy to post them when I do, perhaps
later today. I would hope they're pretty generic, though; it's a
Feroceon core pretending to be an ARM926EJ-S, hooked to the usual
half-assed Marvell imitation of an ARM licensed functional block.
Trust me for the moment, it's the same IRQ line.

This shows that you don't actually have an understanding of the Linux
kernel boot, especially in respect of serial devices. At boot, devices
are detected and initialised to a safe state, where they will not
spuriously generate interrupts.

Sorry, 'taint so. Not unless the chip support droid has put the right
stuff in arch/arm/mach-foo. LKML is littered with the fall-out of the
decision to trust whoever jumped to main() to have left the hardware
in a sane state. If you don't enjoy this sort of forensics (which I
for one do not, especially not when there is a project deadline
looming and a Heisenbug starts firing 9 times out of 10), you might
consider systematically installing ISRs that know how to shut
everything up before turning on any interrupt sources at all.

As I said, this is not going to happen overnight, and is not even
particularly in the economic interest of people who get paid by the
hour to wear bringup wizard hats. That category currently includes
me, but I am intensely bored with this game and aspire to greater
things.

When a userspace program opens a serial port, which can only happen
once the kernel boot has completed (ergo, devices have been initialised
and placed in a safe state) the interrupts are claimed, and enabled
at the source.

As you can see from the console dump I posted (which begins with
"Freeing init memory: 92K" and ends with do_exit -> init -> sys_open,
which is obviously sys_open("/dev/console")), this happens long before
userspace comes into the picture. Our 8250.c has some nasty hacks in
it but otherwise this call chain is from a very nearly vanilla
2.6.16.recent.

We've already worked around this on our board, and the whole kit and
kaboodle will eventually be posted to linux-arm-kernel in tidy patches
when my client lets me spend billable hours on it (immediately after
the damn thing passes its first functional test, long before it
ships). I'm not asking for anyone's help except in the
let's-all-help-one-another spirit. I'm trying to help with root cause
analysis of Frederik's (Jose's?) fandango on core. If it's not
relevant, my apologies; and although it goes without saying, I salute
you for both the serial driver and the ARM port.

Now please take a second look at the backtrace before toasting me
lightly again. Mmm'kay? Oh, and by the way -- is there an Alt-SysRq
equivalent on an ARM serial console?

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/