[RFC][PATCHv6 02/12] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Mon Dec 04 2017 - 08:48:59 EST


printk() is quite complex internally and, basically, it does two
slightly independent things:
a) adds a new message to a kernel log buffer (log_store())
b) prints kernel log messages to serial consoles (console_unlock())

while (a) is guaranteed to be executed by printk(), (b) is not, for a
variety of reasons, and, unlike log_store(), it comes at a price:

1) console_unlock() attempts to flush all pending kernel log messages
to the console and it can loop indefinitely.

2) while console_unlock() is executed on one particular CPU, printing
pending kernel log messages, other CPUs can simultaneously append new
messages to the kernel log buffer.

3) the time it takes console_unlock() to print kernel messages also
depends on the speed of the console -- which may not be fast at all.

4) console_unlock() is executed in the same context as printk(), so
it may be non-preemptible/atomic, which makes 1)-3) dangerous.

As a result, nobody knows how long a printk() call will take, so
it's not really safe to call printk() in a number of situations,
including atomic context, RCU critical sections, interrupt context,
etc.

To avoid lockups we, clearly, need to break out of console_unlock() printing
loop before watchdog detects the lockup condition. This patch introduces a
'/sys/module/printk/parameters/offloading_enabled' sysfs param, which enables
and disables such functionality by offloading printing duty to another task.
The printing offloading is happening from console_unlock() function and,
briefly, looks as follows:
if a process spends in console_unlock() more than half of watchdog's
threshold value it breaks out of printing loop and unlocks console_sem.

Since nothing guarantees that there will another process sleeping on the
console_sem or calling printk() on another CPU simultaneously, the patch
also introduces auxiliary kernel thread - printk_kthread, the main
purpose of which is to take over printing duty. The workflow is, thus,
turns into:
if a process spends in console_unlock() more than half of watchdog's
threshold value it wakes up printk_kthread, breaks out of printing loop and
unlocks the console_sem.

The wakeup part is also a bit tricky, since scheduler is in position to
decide that printk_kthread should run on the very same CPU with the process
that is currently doing printing. This means that offloading potentially
may never take place. That's why we try to play games with printk_kthread
affinity mask and basically want to wake it up on a foreign CPU, so
printing take over has more chances to succeed.

There are, however, cases when we can't (or should not) offload. For
example, we can't call into the scheduler from panic(), because this
may deadlock. Therefore printk() has a new 'emergency mode': in this
mode we never attempt to offload printing to printk_kthread. There are
places, where printk switches to printk_emergency mode automatically:
for instance, once a EMERG log level message appears in the log buffer;
in others - user must explicitly forbid offloading. For that purpose
we provide two new functions:

-- printk_emergency_begin()
Disables printk offloading. All printk() calls (except for deferred
printk) will attempt to lock the console_sem and, if successful,
flush kernel log messages.

-- printk_emergency_end()
Enables printk offloading.

Offloading is not possible yet, it will be enabled in a later patch.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
---
Documentation/admin-guide/kernel-parameters.txt | 7 +
include/linux/console.h | 3 +
kernel/printk/printk.c | 179 ++++++++++++++++++++++--
3 files changed, 175 insertions(+), 14 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 28467638488d..fa40d68db39d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3241,6 +3241,13 @@
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)

+ printk.offloading_enabled=
+ Enable/disable print out offloading to a dedicated
+ printk kthread. When enabled a task will wake up the
+ printk kthread every watchdog lockup threshold seconds.
+ Format: <bool> (1/Y/y=enable, 0/N/n=disable)
+ default: disable.
+
processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..07005db4c788 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -187,6 +187,9 @@ extern bool console_suspend_enabled;
extern void suspend_console(void);
extern void resume_console(void);

+extern void printk_emergency_begin(void);
+extern void printk_emergency_end(void);
+
int mda_console_init(void);
void prom_con_init(void);

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 81c19e51a4a4..f4e84f83bff5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
+#include <linux/kthread.h>

#include <linux/uaccess.h>
#include <asm/sections.h>
@@ -400,15 +401,16 @@ DEFINE_RAW_SPINLOCK(logbuf_lock);
printk_safe_exit_irqrestore(flags); \
} while (0)

-#ifdef CONFIG_PRINTK
/*
- * Delayed printk version, for scheduler-internal messages:
+ * Used both for deferred printk version (scheduler-internal messages)
+ * and printk_kthread control.
*/
#define PRINTK_PENDING_WAKEUP 0x01
#define PRINTK_PENDING_OUTPUT 0x02

static unsigned long printk_pending;

+#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
@@ -444,6 +446,118 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;

+static struct task_struct *printk_kthread;
+static cpumask_var_t printk_offload_cpus;
+/*
+ * We can't call into the scheduler (wake_up() printk kthread) during
+ * suspend/kexec/etc. This temporarily switches printk to old behaviour.
+ */
+static atomic_t printk_emergency __read_mostly;
+/*
+ * Enable/disable printk_kthread permanently. Unlike `oops_in_progress'
+ * it doesn't go back to 0.
+ */
+static bool offloading_enabled;
+
+module_param_named(offloading_enabled, offloading_enabled, bool, 0644);
+MODULE_PARM_DESC(offloading_enabled,
+ "enable/disable print out offloading to printk kthread");
+
+static inline bool printk_offloading_enabled(void)
+{
+ if (system_state != SYSTEM_RUNNING || oops_in_progress)
+ return false;
+
+ return printk_kthread && offloading_enabled &&
+ atomic_read(&printk_emergency) == 0;
+}
+
+/*
+ * This disables printing offloading and instead attempts
+ * to do the usual console_trylock()->console_unlock().
+ *
+ * Note, this does not wait for printk_kthread to stop (if it's
+ * already printing logbuf messages).
+ */
+void printk_emergency_begin(void)
+{
+ atomic_inc(&printk_emergency);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_begin);
+
+/* This re-enables printk_kthread offloading. */
+void printk_emergency_end(void)
+{
+ atomic_dec(&printk_emergency);
+}
+EXPORT_SYMBOL_GPL(printk_emergency_end);
+
+static inline int offloading_threshold(void)
+{
+ /* Default threshold value is 10 seconds */
+ return 10;
+}
+
+/*
+ * Under heavy printing load or with a slow serial console (or both)
+ * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
+ * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
+ * amount of time a process can print from console_unlock().
+ *
+ * This function must be called from 'printk_safe' context under
+ * console_sem lock.
+ */
+static inline bool should_handoff_printing(u64 printing_start_ts)
+{
+ static struct task_struct *printing_task;
+ static u64 printing_elapsed;
+ u64 now = local_clock();
+
+ if (!printk_offloading_enabled()) {
+ /* We are in emergency mode, disable printk_kthread */
+ if (current == printk_kthread)
+ return true;
+ return false;
+ }
+
+ /* A new task - reset the counters. */
+ if (printing_task != current) {
+ printing_task = current;
+ printing_elapsed = 0;
+ return false;
+ }
+
+ if (time_after_eq64(now, printing_start_ts))
+ printing_elapsed += now - printing_start_ts;
+
+ /* Once we offloaded to printk_ktread - keep printing */
+ if (current == printk_kthread)
+ return false;
+
+ /* Shrink down to seconds and check the offloading threshold */
+ if ((printing_elapsed >> 30LL) < offloading_threshold())
+ return false;
+
+ /*
+ * We try to set `printk_kthread' CPU affinity to any online CPU
+ * except for this_cpu. Because otherwise `printk_kthread' may be
+ * scheduled on the same CPU and offloading will not take place.
+ */
+ cpumask_copy(printk_offload_cpus, cpu_online_mask);
+ cpumask_clear_cpu(smp_processor_id(), printk_offload_cpus);
+
+ /*
+ * If this_cpu is the one and only online CPU, then try to wake up
+ * `printk_kthread' on it.
+ */
+ if (cpumask_empty(printk_offload_cpus))
+ cpumask_set_cpu(smp_processor_id(), printk_offload_cpus);
+
+ set_cpus_allowed_ptr(printk_kthread, printk_offload_cpus);
+ wake_up_process(printk_kthread);
+ return true;
+}
+
/* Return log buffer address */
char *log_buf_addr_get(void)
{
@@ -1752,6 +1866,15 @@ asmlinkage int vprintk_emit(int facility, int level,

printed_len = log_output(facility, level, lflags, dict, dictlen, text, text_len);

+ /*
+ * Emergency level indicates that the system is unstable and, thus,
+ * we better stop relying on wake_up(printk_kthread) and try to do
+ * a direct printing.
+ */
+ if (level == LOGLEVEL_EMERG)
+ offloading_enabled = false;
+
+ set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
logbuf_unlock_irqrestore(flags);

/* If called from the scheduler, we can not call up(). */
@@ -1869,6 +1992,13 @@ static size_t msg_print_text(const struct printk_log *msg,
bool syslog, char *buf, size_t size) { return 0; }
static bool suppress_message_printing(int level) { return false; }

+void printk_emergency_begin(void) {}
+EXPORT_SYMBOL_GPL(printk_emergency_begin);
+
+void printk_emergency_end(void) {}
+EXPORT_SYMBOL_GPL(printk_emergency_end);
+
+static bool should_handoff_printing(u64 printing_start_ts) { return false; }
#endif /* CONFIG_PRINTK */

#ifdef CONFIG_EARLY_PRINTK
@@ -2149,9 +2279,18 @@ void console_unlock(void)
static u64 seen_seq;
unsigned long flags;
bool wake_klogd = false;
- bool do_cond_resched, retry;
+ bool do_cond_resched, retry = false;
+ bool do_handoff = false;

if (console_suspended) {
+ /*
+ * Here and later, we need to clear the PENDING_OUTPUT bit
+ * in order to avoid an infinite loop in printk_kthread
+ * function when console_unlock() cannot flush messages
+ * because we suspended consoles. Someone else will print
+ * the messages from resume_console().
+ */
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
up_console_sem();
return;
}
@@ -2180,6 +2319,7 @@ void console_unlock(void)
* console.
*/
if (!can_use_console()) {
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
console_locked = 0;
up_console_sem();
return;
@@ -2189,6 +2329,7 @@ void console_unlock(void)
struct printk_log *msg;
size_t ext_len = 0;
size_t len;
+ u64 printing_start_ts = local_clock();

printk_safe_enter_irqsave(flags);
raw_spin_lock(&logbuf_lock);
@@ -2208,7 +2349,7 @@ void console_unlock(void)
len = 0;
}
skip:
- if (console_seq == log_next_seq)
+ if (do_handoff || console_seq == log_next_seq)
break;

msg = log_from_idx(console_idx);
@@ -2240,9 +2381,12 @@ void console_unlock(void)
stop_critical_timings(); /* don't trace print latency */
call_console_drivers(ext_text, ext_len, text, len);
start_critical_timings();
+
+ /* Must be called under printk_safe */
+ do_handoff = should_handoff_printing(printing_start_ts);
printk_safe_exit_irqrestore(flags);

- if (do_cond_resched)
+ if (!do_handoff && do_cond_resched)
cond_resched();
}
console_locked = 0;
@@ -2256,14 +2400,18 @@ void console_unlock(void)
up_console_sem();

/*
- * Someone could have filled up the buffer again, so re-check if there's
- * something to flush. In case we cannot trylock the console_sem again,
- * there's a new owner and the console_unlock() from them will do the
- * flush, no worries.
+ * Someone could have filled up the buffer again, so re-check
+ * if there's something to flush. In case when trylock fails,
+ * there's a new owner and the console_unlock() from them will
+ * do the flush, no worries.
*/
- raw_spin_lock(&logbuf_lock);
- retry = console_seq != log_next_seq;
- raw_spin_unlock(&logbuf_lock);
+ if (!do_handoff) {
+ raw_spin_lock(&logbuf_lock);
+ retry = console_seq != log_next_seq;
+ if (!retry)
+ clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
+ raw_spin_unlock(&logbuf_lock);
+ }
printk_safe_exit_irqrestore(flags);

if (retry && console_trylock())
@@ -2687,8 +2835,11 @@ late_initcall(printk_late_init);
#if defined CONFIG_PRINTK
static void wake_up_klogd_work_func(struct irq_work *irq_work)
{
- if (test_and_clear_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
- /* If trylock fails, someone else is doing the printing */
+ if (test_bit(PRINTK_PENDING_OUTPUT, &printk_pending)) {
+ /*
+ * If trylock fails, someone else is doing the printing.
+ * PRINTK_PENDING_OUTPUT bit is cleared by console_unlock().
+ */
if (console_trylock())
console_unlock();
}
--
2.15.1