Re: Serial related oops

From: Michael K. Edwards
Date: Mon Feb 19 2007 - 21:18:07 EST


On 2/19/07, Russell King <rmk+lkml@xxxxxxxxxxxxxxxx> wrote:
This can't happen because when __do_irq unmasks the interrupt source,
the CPU mask is set, thereby preventing any further interrupt exceptions
being taken. This is done precisely to prevent this situation happening.

If you are seeing recursion for the same interrupt (two or more stack
frames containing asm_do_IRQ for that very same IRQ) then your interrupt
handling is buggy, plain and simple.

Imaginable. I'll look at the mask/unmask code. Thanks.

I don't doubt that it is on the same IRQ line - I have such setups here
and it works perfectly - multiple 8250 UARTs connected to a single
level-triggered interrupt input which also happens to be shared with
a SCSI host chip as well. Absolutely no problems.

Can you do me a favor? In the sys_open("/dev/console") path, turn on
the right bits in that second uart's IER, then insert a sleep in
request_irq or something (wherever seems best based on that
backtrace), and feed enough characters into the second UART during
that sleep to generate an IRQ. Do you not get the same soft lockup?

I still say that your understanding is completely flawed. Moreover,
you haven't read what I've said about the ordering of initialisation,
the stress on when we disable interrupts for the ports, etc.

Well, all I can say is that that's a real backtrace and it shouldn't
be hard to reproduce if it's anything other than a broken interrupt
controller or broken code called by the __do_irq postamble. I don't
see any platform-provided unmask routines in that backtrace, but maybe
it got inlined; I'll go back and check.

You're actually *not* helping. You're causing utter confusion through
misunderstanding, but it seems you're not open to the possibility that
your understanding is flawed.

Still open, though it's a pity you're more interested in my flawed
understanding that in the possibility that the kernel could be
systematically made more robust against hardware bugs and coding
errors by the simple expedient of putting all the ISRs in before
turning on any IRQ that might be shared. Or are you telling me that's
already been done? (Yes, I am aware that this interacts
entertainingly with hot-plug PCI. Yes, I am aware that there is a
limit to how much software can fix stupid hardware. But surely there
is room for an emergency IRQ suppressor to let chip initialization
code kick in and force the hardware to a known state.)

I'm offering to look through your code and point you at the source of
your issue for free. Please don't throw that offer away without first
considering that maybe I have a clue about what's going on here.

I appreciate that offer, and I hope to take advantage of it as soon as
I have the source code at my fingertips (not just the chat log where I
recorded the backtrace).

... which showed the port being opened well after system initialisation
of devices, including all serial ports - including disabling of their
interrupt source at the IER, has been completed.

Now that you mention it, the backtrace I sent is the
serial8250_startup one, not the serial8250_init one. Sorry, this
one's probably an artifact of brain damage specific to this UART. I
need to dig through a different account to find the init-path example;
but in either case, we're getting a new interrupt during the __do_irq
postamble. If you're telling me that that shouldn't happen, what
should the backtrace for a soft lockup due to a stuck level-triggered
IRQ look like on ARM?

Yes, and it's the same for any serial console with functioning break
support. You'll find it in Documentation/sysrq.txt, though it does
misleadingly say "PC style standard serial ports only" whereas the
reality is "where possible".

Thank you very much; this will help me get to the bottom of some other
chip-support nastiness on this device.

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/