Re: [PATCH] printk: Userspace format enumeration support

From: Chris Down
Date: Fri Feb 05 2021 - 22:11:42 EST


Hi Petr,

Thanks for looking over the patch. :-)

Petr Mladek writes:
Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As you'd expect, monitoring using printk is rather brittle for a number
of reasons -- most notably that the message might disappear entirely in
a new version of the kernel, or that the message may change in some way
that the regex or other classification methods start to silently fail.

Another is that printk() is not reliable on its own. Messages might
get lost. The size of the log buffer is limited. Deamon reading
/dev/kmsg need not be scheduled in time or often enough. Console
might be slow. The messages are filtered on the console by console_loglevel.

This is of course true. We don't use kmsg as the last line of defence for monitoring or remediation, of course, but it would be unwise to not have infrastructure capable of monitoring it. We often need to act quickly when production incidents happen, and often kmsg is the place where those "unexpected" issues are surfaced. It's often much more likely that there is some kmsg log which we can act on in those scenarios than anything else, and even if it's not ideal, in reality, it's typically reliable enough to at least mitigate the problem when dealing with a large fleet of machines :-)

# Format: <module>,<facility><level><format>\0
$ perl -p -e 's/\n/\\n/g;s/\0/\n/g' /proc/printk_formats | shuf -n 5
vmlinux,6Disabling APIC timer\n
intel_rapl_common,3intel_rapl_common: Cannot find matching power limit for constraint %d\n
dm_crypt,3device-mapper: crypt: %s: INTEGRITY AEAD ERROR, sector %llu\n
mac80211,6%s: AP bug: HT capability missing from AssocResp\n
vmlinux,3zpool: couldn't create zpool - out of memory\n

The facility and log level are not well separated from the format string.

Also this is yet another style how the format is displayed. We already have

+ console/syslog: formated by record_print_text()
+ /dev/kmsg: formatted by info_print_ext_header(), msg_print_ext_body().
+ /sys/kernel/debug/dynamic_debug/control
+ /sys/kernel/debug/tracing/printk_formats

We should get some inspiration from the existing interfaces.

Sure, I'm not super bound to the format, as long as we have something that can aid those maintaining these systems which monitor printk in identifying that a format was mutated or removed. The module is more or less optional -- it's just intended as a hint about where to look.

But we first should decide what information might be useful:

+ 'facility' should not be needed. All messages should be from
kernel.

That's fair enough, it can be omitted. I just didn't want to stray too far from the netconsole format, since we already mostly have it in this format there.

My intention is to _not_ deviate from existing interfaces, really, so I'll be happy with any suggested format that will achieve this patch's stated goals, since this kind of data is sorely needed :-)

+ <module> is already optinaly added by pr_fmt() to the printed strings
as: pr_fmt(): ...

pr_fmts are not consistently used across the kernel, and sometimes differ from the module itself. Many modules don't use it at all, and we also don't have it for pr_cont. Just picking some random examples:

% grep -av vmlinux /proc/printk_formats | shuf -n 10
mac80211,6%s: mesh STA %pM switches to channel requiring DFS (%d MHz, width:%d, CF1/2: %d/%d MHz), aborting
thinkpad_acpi,c N/Athinkpad_acpi,c %dthinkpad_acpi,5thinkpad_acpi: temperatures (Celsius):thinkpad_acpi,3thinkpad_acpi: Out of memory for LED data
i915,6drm/i915 developers can then reassign to the right component if it's not a kernel issue.
video,4[Firmware Bug]: _BCQ is used instead of _BQC
i915,3gvt: requesting SMI service
are MMIO SPTEs.
i915,3gvt: invalid tiling mode: %x
video,3ACPI: Create sysfs link
cec,6cec-%s: duplicate logical address type
soundwire_bus,3%s: %s: inconsistent state state %d

+static int proc_pf_show(struct seq_file *s, void *v)
+{
+ const struct printk_fmt_sec *ps = NULL;
+ const char **fptr = NULL;
+
+ mutex_lock(&printk_fmts_mutex);
+
+ list_for_each_entry(ps, &printk_fmts_list, list) {
+ const char *mod_name = ps_get_module_name(ps);
+
+ for (fptr = ps->start; fptr < ps->end; fptr++) {
+ seq_puts(s, mod_name);
+ seq_putc(s, ',');
+ seq_puts(s, *fptr);
+ seq_putc(s, '\0');
+ }

You probably should get inspiration from t_show() in trace_printk.c.
It handles newlines, ...

Or by ddebug_proc_show(). It uses seq_escape().

Anyway, there is something wrong at the moment. The output looks fine
with cat. But "less" says that it is a binary format and the output
is a bit messy:

Hmm, why should that be a problem? It's intentional that this pretty much just directly replicates the format string passed to printk, since it's easy to write a parser for it:

1. Go up to the comma, take the module
2. Take the facility and level
3. Take the rest up to a \0 as the format
4. Go to 1

I don't mind to have it escaped, but I'm not immediately seeing the benefit. We also don't escape `\0` in (for example) `/proc/pid/cmdline`, since it serves as a good natural delimiter.

Thanks for taking the time to review :-)

Chris