Re: contention on long-held spinlock

From: Bryan Donlan
Date: Fri Aug 19 2011 - 15:26:14 EST


2011/8/19 Ortwin Glück <odi@xxxxxx>:
> Hi,
>
> I have observed a bad behaviour that is likely caused by spinlocks in the
> qla2xxx driver. This is a QLogic Fibre Channel storage driver.

Please CC the relevant maintainers when reporting driver bugs (I'm
adding them in this reply); it will help make sure the right people
notice. Maintainer addresses can be found in the MAINTAINERS file at
the root of the Linux source tree.

What version of the kernel are you using? It would also help to
provide dmesg output from when the problem is occurring, if anything
out of the ordinary can be found there. If you've already rebooted,
check /var/log/kern.log (or wherever your distribution puts the
kernel log).

> Somehow the attached SAN had a problem and became unresponsive. Many
> processes queued up waiting to write to the device. The processes were doing
> nothing but wait, but system load increased to insane values (40 and above
> on a 4 core machine). The system was very sluggish and unresponsive, making
> it very hard and slow to see what actually was the problem.
>
> I didn't run an in-depth analysis, but this is my guess: I see that qla2xxx
> uses spinlocks to guard the HW against concurrent access. So if the HW
> becomes unresponsive all waiters would busy spin and burn resources, right?
> Those spinlocks are superfast as long as the HW responds well, but become a
> CPU burner once the HW becomes slow.
>
> I wonder if spinlocks could be made aware of such a situation and relax.
> Something like if spinning for more than 1000 times, perform a simple
> backoff and sleep. A spinlock should never spin busy for several seconds,
> right?

That's what mutexes are for: a task that fails to get a mutex goes to
sleep instead of spinning. Note, however, that interrupt handlers
cannot use mutexes, because they cannot sleep; for the same reason
they cannot wait for lock holders which may themselves sleep.
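
To make the "spin for a while, then back off" idea concrete, here is a
minimal userspace sketch using C11 atomics. It is not kernel code and
not how qla2xxx or the kernel's spinlocks work; the names
backoff_lock, backoff_trylock, backoff_lock_acquire, backoff_unlock
and the SPIN_LIMIT value are made up for illustration. The
sched_yield() call is the step that is unavailable in interrupt
context, which is why this cannot simply be folded into a kernel
spinlock.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace illustration; none of these names are kernel APIs. */
#define SPIN_LIMIT 1000UL /* the "1000 times" threshold from the proposal */

struct backoff_lock {
    atomic_flag held;
};

static void backoff_lock_init(struct backoff_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Try once; returns true if the lock was taken. */
static bool backoff_trylock(struct backoff_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/*
 * Busy-spin for the first SPIN_LIMIT failed attempts, then give up the
 * CPU between further attempts. Returns the number of failed attempts.
 */
static unsigned long backoff_lock_acquire(struct backoff_lock *l)
{
    unsigned long failed = 0;

    while (!backoff_trylock(l)) {
        failed++;
        if (failed >= SPIN_LIMIT)
            sched_yield(); /* back off: let other tasks run */
    }
    return failed;
}

static void backoff_unlock(struct backoff_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Even in userspace this only reduces the CPU burned while waiting; if
the lock holder is stuck on unresponsive hardware, the waiters still
make no progress.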

Also note that holding spinlocks for a long time is more likely to
result in lockups than a slowdown - a CPU attempting to grab a
spinlock disables migration and preemption, so it can run nothing
else while it spins. On your four-CPU system, four processes waiting
on spinlocks are enough to completely lock up the system (unless
you're using the real-time branch's kernel, which converts most
spinlocks to mutexes).