"impossible" spinlock "wrong CPU" problem with custom device driver

From: Timm Korte
Date: Wed Jul 08 2009 - 18:55:26 EST


I'm trying to understand a spinlock bug in a kernel module (device driver).
I have a spinlock that is used both in the actual hardware interrupt
handler and in a separate kernel thread that does the real work, woken
through a wait queue. The interrupt handler takes the spinlock with
spin_lock() and spin_unlock(), while the thread uses spin_lock_irqsave()
and spin_unlock_irqrestore().
On rare occasions (I can't reproduce it on purpose), I get a spinlock
debug message about the wrong CPU in _raw_spin_unlock when it is called
from the kernel thread.

This is the source of the kernel thread that runs into the problem:

static int my_irqthread_function(void *ptr)
{
	struct my_dev *mydev = ptr;

	daemonize(MY_NAME "%02x", mydev->mynum);
	allow_signal(SIGTERM);

	/* sleep until the IRQ handler reports pending work; a signal
	 * (SIGTERM) makes wait_event_interruptible() return nonzero
	 * and ends the loop */
	while (!wait_event_interruptible(mydev->irqthread_wait,
			atomic_read(&mydev->irqthread_pending_count))) {
		do {
			uint8_t my_irq_pending = 0;
			unsigned long iflags;

			/* fetch and clear the pending bits under the lock */
			spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
			my_irq_pending = mydev->irq_pending;
			mydev->irq_pending = 0;
			spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);

			// handle irqs
			if (my_irq_pending & INT_IPAC1) {
				my_handle_interrupt(&mydev->mydev[IPAC1]);
			}
			...
		// continue if the pending count still is != 0 after decrementing
		} while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
	}

	mydev->irqthread = 0;
	complete_and_exit(&mydev->irqthread_exit, 0);
}

The error (a SPIN_BUG, which causes a kernel panic on my SMP box)
happens on "spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);",
but I really can't figure out how the thread could be moved to another
CPU while holding the lock and doing only two assignments.

The only thing I can think of is that it might have something to do
with the enabled SIGTERM signal, even though the module wasn't being
unloaded when the bug occurred.

The system is FC4-based with a 2.6.17 kernel (which I can't change).

So I'm out of ideas and hope someone here has an idea of what might
have gone wrong.

Timm