Re: Device hang when offlining a CPU due to IRQ misrouting

From: Darrick J. Wong
Date: Tue Jun 05 2007 - 13:35:27 EST


On Tue, Jun 05, 2007 at 10:23:10AM -0700, Siddha, Suresh B wrote:

> Darrick, I see a kernel bug in this area(which is already filled with bugs,
> and I am looking into ways to fix them). Are you making sure that
> between step-1 and step-2, that interrupts actually started arriving at cpu1?
>
> i.e., do step-1 and wait till the irq's start hitting at cpu1. At this point
> do step-2 and let us know if you still hit this bug?

Yes, the bug only happens after CPU1 begins to receive interrupts.

> > There exists a similar scenario. Set the IRQ affinity to a bunch of
> > CPUs, watch /proc/interrupts to see which CPU is actually servicing the
> > interrupts, then offline that CPU. The kernel does not reroute the IRQ
> > to any of the other CPUs and the device also hangs.
>
> Is this a theory or did you observe this problem happening?

Nope, I've observed this situation too.

--D

Attachment: signature.asc
Description: Digital signature