[PATCH printk v2 9/9] printk: Avoid non-panic CPUs flooding ringbuffer

From: John Ogness
Date: Mon Nov 06 2023 - 16:08:00 EST


Commit 13fb0f74d702 ("printk: Avoid livelock with heavy printk
during panic") introduced a mechanism to silence non-panic CPUs
if too many messages are being dropped. Aside from trying to
work around the livelock bugs of legacy consoles, it was also
intended to avoid losing panic messages. However, if non-panic
CPUs are flooding the ringbuffer, then reacting to dropped
messages is too late.

To avoid losing panic CPU messages, the tracking needs to occur
when non-panic CPUs are storing messages. If non-panic CPUs have
filled approximately 1/4 of the ringbuffer, they need to be
silenced to ensure the ringbuffer has ample space available for
the panic CPU messages.

Rather than trying to come up with an accurate heuristic to
measure the size used by non-panic CPUs, simply restrict them
to 1/4 of the possible ringbuffer descriptors. In practice this
will end up being around 1/3 of the ringbuffer size, which still
leaves ample space for the panic CPU messages.

Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
---
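For illustration only (not part of the patch): the limit described in
the commit message can be sketched in user-space C. This is a minimal
sketch under stated assumptions: C11 atomics stand in for the kernel's
atomic_t, every demo_* name is hypothetical, and the count_bits value
in the note below is an arbitrary example rather than a value taken
from any kernel configuration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for panic_noise_count and suppress_panic_printk. */
static atomic_long demo_noise_count;
static atomic_bool demo_suppress;

/*
 * A descriptor ring with count_bits has 2^count_bits descriptors,
 * so shifting by (count_bits - 2) yields one quarter of them.
 */
static inline long demo_noise_limit(unsigned int count_bits)
{
	return 1L << (count_bits - 2);
}

/*
 * Models the check a non-panic CPU would hit while another CPU is in
 * panic. Returns true if the message may still be stored.
 */
static bool demo_nonpanic_may_store(unsigned int count_bits)
{
	long n;

	if (atomic_load_explicit(&demo_suppress, memory_order_relaxed))
		return false;

	/* fetch_add returns the old value; +1 gives the incremented count. */
	n = atomic_fetch_add_explicit(&demo_noise_count, 1,
				      memory_order_relaxed) + 1;
	if (n > demo_noise_limit(count_bits)) {
		/* Once the quarter is used up, stay silent from now on. */
		atomic_store_explicit(&demo_suppress, true,
				      memory_order_relaxed);
		return false;
	}

	return true;
}
```

With an assumed count_bits of 15, demo_noise_limit() is 8192, so the
first 8192 calls to demo_nonpanic_may_store() succeed and every later
call is refused.
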
kernel/printk/printk.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cb99c854a648..9ac7d50c2f18 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2315,6 +2315,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const struct dev_printk_info *dev_info,
 			    const char *fmt, va_list args)
 {
+	static atomic_t panic_noise_count = ATOMIC_INIT(0);
+
 	int printed_len;
 	bool in_sched = false;

@@ -2322,8 +2324,22 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (unlikely(suppress_printk))
 		return 0;
 
-	if (unlikely(suppress_panic_printk) && other_cpu_in_panic())
-		return 0;
+	if (other_cpu_in_panic()) {
+		if (unlikely(suppress_panic_printk))
+			return 0;
+
+		/*
+		 * The messages on the panic CPU are the most important. If
+		 * non-panic CPUs are generating many messages, the panic
+		 * messages could get lost. Limit the number of non-panic
+		 * messages to approximately 1/4 of the ringbuffer.
+		 */
+		if (atomic_inc_return_relaxed(&panic_noise_count) >
+		    (1 << (prb->desc_ring.count_bits - 2))) {
+			suppress_panic_printk = 1;
+			return 0;
+		}
+	}
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
@@ -2799,8 +2815,6 @@ void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
 bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 			     bool is_extended, bool may_suppress)
 {
-	static int panic_console_dropped;
-
 	struct printk_buffers *pbufs = pmsg->pbufs;
 	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
 	const size_t outbuf_sz = sizeof(pbufs->outbuf);
@@ -2828,17 +2842,6 @@ bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 	pmsg->seq = r.info->seq;
 	pmsg->dropped = r.info->seq - seq;
 
-	/*
-	 * Check for dropped messages in panic here so that printk
-	 * suppression can occur as early as possible if necessary.
-	 */
-	if (pmsg->dropped &&
-	    panic_in_progress() &&
-	    panic_console_dropped++ > 10) {
-		suppress_panic_printk = 1;
-		pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
-	}
-
 	/* Skip record that has level above the console loglevel. */
 	if (may_suppress && suppress_message_printing(r.info->level))
 		goto out;
--
2.39.2