contention on long-held spinlock

From: Ortwin Glück
Date: Fri Aug 19 2011 - 05:26:15 EST


Hi,

I have observed bad behaviour that is likely caused by spinlocks in qla2xxx, the QLogic Fibre Channel storage driver.

Somehow the attached SAN had a problem and became unresponsive. Many processes queued up waiting to write to the device. The processes were doing nothing but waiting, yet system load increased to insane values (40 and above on a 4-core machine). The system was very sluggish and unresponsive, which made it hard and slow to see what the problem actually was.

I didn't run an in-depth analysis, but this is my guess: I see that qla2xxx uses spinlocks to guard the HW against concurrent access. So if the HW becomes unresponsive, all waiters would busy-spin and burn resources, right? Those spinlocks are superfast as long as the HW responds well, but they become a CPU burner once the HW becomes slow.

I wonder if spinlocks could be made aware of such a situation and relax. Something like: after spinning more than 1000 times, perform a simple backoff and sleep. A spinlock should never busy-spin for several seconds, right?
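
To make the idea concrete, here is a rough userspace sketch in C11. It is not kernel code and it is not how qla2xxx or the kernel's spinlocks work; it only illustrates "spin a bounded number of times, then back off and sleep". The names backoff_lock, SPIN_LIMIT and BACKOFF_MAX_NS are made up for illustration, and the 1000-iteration threshold is just the number mentioned above.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define SPIN_LIMIT     1000      /* hypothetical: spins before backing off */
#define BACKOFF_MAX_NS 1000000L  /* hypothetical: cap each sleep at 1 ms */

struct backoff_lock {
    atomic_flag held;
};

#define BACKOFF_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; true on success. */
static bool backoff_trylock(struct backoff_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/*
 * Take the lock. Spin up to SPIN_LIMIT times; if that fails, sleep
 * with an exponentially growing (capped) delay and try again.
 * Returns how many times the caller had to sleep.
 */
static unsigned long backoff_lock_acquire(struct backoff_lock *l)
{
    unsigned long sleeps = 0;
    long delay_ns = 1000;  /* start with a 1 us sleep */

    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (backoff_trylock(l))
                return sleeps;
        }

        /* Spinning did not help: give up the CPU for a while. */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };
        nanosleep(&ts, NULL);
        sleeps++;

        delay_ns *= 2;
        if (delay_ns > BACKOFF_MAX_NS)
            delay_ns = BACKOFF_MAX_NS;
    }
}

/* Release the lock. */
static void backoff_unlock(struct backoff_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would wrap its critical section in backoff_lock_acquire() and backoff_unlock(). The return value counts how often a waiter gave up spinning, which could serve as a cheap hint that the lock is being held for too long.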

Thanks,

Ortwin