RE: Common IRQ pitfall results in lockup.

Raj, Ashok (ashok.raj@intel.com)
Fri, 5 Nov 1999 08:29:48 -0800


Another one i would recommend if this is making its way to the base linux
kernel is that we should ask the irq handlers to return some value so that
the kernel could know who took this interrupt. so if 2 devices are sharing
an
interrupt, and if the kernel calls the first registered handler, and it knew
its
device interrupted, then on returning from the irq handler if we return 1 to

indicate this interrupt is consumer, we possibly dont need to send this to
all of the
registered handlers?

just a thought.
ashokr

> -----Original Message-----
> From: R.E.Wolff@BitWizard.nl [mailto:R.E.Wolff@BitWizard.nl]
> Sent: Friday, November 05, 1999 4:24 AM
> To: linux-kernel@vger.rutgers.edu; torvalds@transmeta.com;
> tytso@mit.edu; alan@lxorguk.ukuu.org.uk
> Subject: Common IRQ pitfall results in lockup.
>
>
>
> Hi Alan, Linus,
>
> Could you include the following patch that explains a common (see the
> list of affected drivers in the patch) IRQ pitfall. It also provides a
> platform for driver writers to "publish" pitfalls they encountered
> which may be important for others.
>
> Todays pitfall results in that a complete lockup can occur if a PCI
> interrupt is shared between two devices, and one of them decides to
> unload or just give up the interrupt.
>
> Alan, I'd like to discuss with you and Ted what we'll do about the 2.2
> serial driver: Leave as is (please, no!), upgrade to Teds newest
> driver with PCI serial card support (with fix), or just a minimal fix.
>
>
> Although the current 2.2. driver doesn't automatically detect any PCI
> serial cards, you can use setserial to tell it that a serial port is
> present. This can result in the serial driver triggering the bug (in
> combination with one of the other drivers that has the problem).
>
> A client reports that this problem occurred (and that my fix fixed
> it), when his SCSI card was sharing the interrupt. The cards driver
> however isn't listed as "misbehaving", so I'm not so sure that the
> problem ONLY occurs when both drivers are "misbehaving". If that's the
> case, the bug isn't (in genetic terms) "recessive" (i.e. happens only
> when BOTH drivers misbehave) as my analysis would make you believe,
> but "dominant" (i.e. can happen when either driver misbehaves).
>
> Regards,
>
> Roger Wolff.
>
>
> --------------------------------------------------------------
> -----------
>
> --- linux-2.2.13.clean/Documentation/driver_writers_readme
> Fri Nov 5 11:20:55 1999
> +++ linux/Documentation/driver_writers_readme Fri Nov 5 11:20:16 1999
> @@ -0,0 +1,93 @@
> +
> +What's this file?
> +-----------------
> +
> +This file tries to explain some of the pitfalls that driver writers
> +may encounter. All you driver writers, feel free to contribute by
> +adding stuff that you encountered, and that you think might be
> +interesting to prevent others from making the same mistake again.
> +
> +I think it's best that this is not officially "maintained" by me, as
> +only additions/corrections are acceptable, and it would be easiest to
> +just send Linus patches.
> +
> +
> +The IRQ pitfall.
> +----------------
> +
> +Older Linux kernels called the interrupt handler with the IRQ
> +number. The driver then contained an IRQ number -> devinfo table.
> +
> +The newer kernels provide a void* argument to the function where you
> +can put the devinfo pointer, so that interrupts can be shared and
> +things like that. Passing "NULL" there was done to many drivers as a
> +compatibility measure. This is however a VERY temporary (*) measure
> +that should be eradicated ASAP:
> +
> +Interrupts can be shared, the dev pointer is used as an "identifier".
> +Suppose two drivers with the old "NULL" are using the same
> (PCI, level
> +triggered) interrupt. First driver allocates interrupt, second driver
> +allocates interrupt. All is fine now. Interrupt comes, both interrupt
> +routines get called, handlers tickle their device to make the
> +interrupt stop and control is returned to whatever was interrupted.
> +
> +Then the second driver is unloaded/doesn't need the interrupt
> +anymore. It then tries to free its own interrupt routine, but the
> +kernel frees the first matching interrupt on the IRQ chain for that
> +interrupt. This happens to be the first drivers interrupt
> routine. Now
> +when some activity happens on the first device, it will
> interrupt, but
> +the second driver's interrupt routine, still on the chain will get
> +called. It cannot cause the first device to stop interrupting, so an
> +interrupt storm occurs. As soon as interrupts are re-enabled, the CPU
> +is interrupted again. Clients report this as a complete lockup.
> +
> +At the moment, the following drivers request an interrupt
> with NULL as
> +the last argument:
> +
> +net: hp.c e2100.c
> +hamradio: scc.c yam.c
> +block: hd.c xd.c amiflop.c
> +char: busmouse.c msbusmouse.c tpqic02.c wdt.c stallion.c rtc.c
> + riscom8.c h8.c qmouse.c amikeyb.c dn_keyb.c isicom.c
> +scsi: aha1542.c NCR53c406a.c ultrastor.c pas16.c
> ncr5380.c aha1740.c
> + t128.c 53c7,8xx.c in2000.c g_NCR5380.c eata_pio.c wd7000.c
> + qlogicfas.c tmscim.c psi240i.c atp870u.c
> ini9100u.c inia100.c
> + sym53c416.c
> +sound: maui.c uart6850.c
> +cdrom: cdu31a.c cm206 mcd mcdx
> +isdn: act2000_isa
> +ap1000: ddv
> +macintosh:mediabay
> +nubus: nubus?
> +acorn: cumana_1 ecoscsi oak mfmhd keyb_ps2
> +usb: mouse?
> +
> +Some of the listed drivers may only support ISA devices (act2000_isa
> +comes to mind). ISA normally only supports edge triggered
> interrupts.
> +I recommend the policy: "Don't count on it". (e.g. PNP devices have
> +a setup option to use level or edge triggered interrupts. I
> don't know
> +wether that's for real... )
> +
> +The best way to correct this is to pass along the "dev"
> structure that
> +describes your device. If (like the serial driver) you don't request
> +the IRQ multiple times for multiple devices hanging on one interrupt,
> +I recommend that you pass IRQ_table + IRQnum. This is an address
> +inside your drivers "address space", and should be very
> unique. If all
> +else fails (e.g. your driver may support just one of your
> devices) you
> +could pass in the address of your interrupt routine or something like
> +that.
> +
> +(*) Ok, it's been going on for several years now, but lets treat it
> +with some urgency, before things really start blowing up in
> our faces:
> +with more and more devices in computers people are going to
> see shared
> +interrupts more and more...
> +
> +-------------------------------------------------------------
> -------------
> +
> +Created on november 5th 1999 by R.E.Wolff@BitWizard.
> +Added: What's this file? -- REW
> +Added: IRQ request hints. -- REW
> +
> +Changed: ....
> +Added: .....
> +
>
>
>
> --
> ** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ **
> +31-15-2137555 **
> *-- BitWizard writes Linux device drivers for any device you
> may have! --*
> "I didn't say it was your fault. I said I was going to blame
> it on you."
>
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/