printk: Cleanup and softlockup avoidance

From: Jan Kara
Date: Tue Dec 17 2013 - 09:48:46 EST


Hello,

this is another installment of the printk softlockup saga. Let me first
recap the problem:

Currently, console_unlock() prints messages from the kernel printk buffer
to the console for as long as the buffer is non-empty. When a serial
console is attached, printing is slow, so other CPUs in the system have
plenty of time to append new messages to the buffer while one CPU is
printing. Thus the CPU can spend an unbounded amount of time printing in
console_unlock(). This is especially serious since vprintk_emit() calls
console_unlock() with interrupts disabled.
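
To make the argument above concrete, here is a minimal userspace sketch of
a "print while the buffer is non-empty" loop. It is a toy model, not kernel
code: simulate_drain(), struct sim_result and all parameters are
hypothetical names invented for illustration, and time is counted in
abstract ticks.

```c
#include <assert.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct sim_result {
	unsigned long printed;	/* messages emitted by the printing CPU */
	unsigned long ticks;	/* simulated time spent inside the loop */
};

/*
 * backlog:        messages already queued when printing starts
 * print_cost:     ticks needed to push one message to the slow console
 * arrival_period: other CPUs append one message every this many ticks
 * burst:          total number of messages the other CPUs will append
 */
struct sim_result simulate_drain(unsigned long backlog,
				 unsigned long print_cost,
				 unsigned long arrival_period,
				 unsigned long burst)
{
	struct sim_result r = { 0, 0 };
	unsigned long pending = backlog;
	unsigned long appended = 0;

	assert(print_cost > 0 && arrival_period > 0);

	/* Same shape as the problem: loop while the buffer is non-empty. */
	while (pending > 0) {
		unsigned long due;

		/* Print one message; this is the slow part. */
		pending--;
		r.printed++;
		r.ticks += print_cost;

		/* Meanwhile, other CPUs kept appending to the buffer. */
		due = r.ticks / arrival_period;
		if (due > burst)
			due = burst;
		pending += due - appended;
		appended = due;
	}
	return r;
}
```

With a backlog of 10, a print cost of 10 ticks and no other writers, the
loop exits after 10 messages. If other CPUs append a message every 5 ticks,
the same call only returns once their whole burst has been printed too, so
the time spent in the loop is set by the other CPUs rather than by the
caller's own messages.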

In practice, users have observed that a CPU can spend tens of seconds
printing in console_unlock() (usually during boot, when hundreds of SCSI
devices are discovered). This results in RCU stalls (the CPU doing the
printing doesn't reach a quiescent state for a long time), softlockup
reports (IPIs for the printing CPU don't get served, so other CPUs spin
waiting for the printing CPU to process them), and eventually machine
death (as messages from the stalls and lockups are appended to the printk
buffer faster than we are able to print them). So these machines are
unable to boot with a serial console attached. Also, during artificial
stress testing, a SATA disk disappears from the system because its
interrupts aren't served for too long.
---

Since my previous attempts to fix softlockups in printk under heavy load met
some resistance, I've decided to try a different approach: do not let the
CPU out of the console_unlock() loop until there's someone else to take over
the printing. However, that approach will need some changes in how
smp_call_function_any() works (as I need to make sure someone will come
sooner rather than later to rescue the printing CPU), and I don't want to
make an already contentious series even worse. So, to move things forward
at least a bit, let me post the following three patches, which improve the
situation by not calling console_unlock() with interrupts disabled. I think
they make sense on their own (the third patch is actually an independent
cleanup).

What do you guys think?

Honza