Re: [RFC][PATCHv4 4/6] printk: report lost messages in printk safe/nmi contexts

From: Petr Mladek
Date: Fri Nov 25 2016 - 06:07:46 EST


On Fri 2016-10-28 00:49:31, Sergey Senozhatsky wrote:
> Account lost messages in pritk-safe and printk-safe-nmi
> contexts and report those numbers during printk_safe_flush().
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> ---
> kernel/printk/internal.h | 17 ---------------
> kernel/printk/printk.c | 10 ---------
> kernel/printk/printk_safe.c | 50 +++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 48 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 1878b2d..4675b8d 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1795,7 +1795,6 @@ asmlinkage int vprintk_emit(int facility, int level,
> unsigned long flags;
> int this_cpu;
> int printed_len = 0;
> - int nmi_message_lost;
> bool in_sched = false;
> /* cpu currently holding logbuf_lock in this function */
> static unsigned int logbuf_cpu = UINT_MAX;
> @@ -1846,15 +1845,6 @@ asmlinkage int vprintk_emit(int facility, int level,
> strlen(recursion_msg));
> }
>
> - nmi_message_lost = get_nmi_message_lost();
> - if (unlikely(nmi_message_lost)) {
> - text_len = scnprintf(textbuf, sizeof(textbuf),
> - "BAD LUCK: lost %d message(s) from NMI context!",
> - nmi_message_lost);
> - printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
> - NULL, 0, textbuf, text_len);
> - }

I really like that we are moving this out of vprintk_emit() but...

> -
> /*
> * The printf needs to come first; we need the syslog
> * prefix which might be passed-in as a parameter.
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index af74d9c..721b091 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -52,10 +52,11 @@ struct printk_safe_seq_buf {
>
> static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
> static DEFINE_PER_CPU(int, printk_safe_context);
> +static atomic_t safe_message_lost;
>
> #ifdef CONFIG_PRINTK_NMI
> static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
> -atomic_t nmi_message_lost;
> +static atomic_t nmi_message_lost;
> #endif
>
> static int printk_safe_log_store(struct printk_safe_seq_buf *s,
> @@ -122,6 +123,41 @@ static void printk_safe_flush_seq_line(struct printk_safe_seq_buf *s,
> printk_safe_flush_line(buf, (end - start) + 1);
> }
>
> +static void report_message_lost(atomic_t *num_lost, char *fmt)
> +{
> + int lost = atomic_xchg(num_lost, 0);
> +
> + if (lost) {
> + char msg[56];

I would really like to avoid a hard coded buffer size. Such things
are likely to bite us in the future.

I thought about reshuffling a lot of logic, adding more wrappers,
... But the solution might be easy in the end, see below.

> +
> + scnprintf(msg, sizeof(msg), fmt, lost);
> +
> + printk_safe_flush_line(msg, strlen(msg));

This made my brain spin a lot. I wondered if it did what we wanted
and it was safe.

On one hand, it is supposed to work because use exactly this
function in __printk_safe_flush() where you call this from.

One question is if it does what we want in different contexts.
Let's look at it:

1. It calls printk_deferred() in NMI context. There is a risk
of a deadlock. But it is called only from
printk_safe_flush_on_panic() which is the last resort. Therefore
it does exactly what we want.

2. It calls printk()->printk_func()->vprintk_emit() in normal context.
It is what we want in normal context.

3. It calls printk()->printk_func()->v printk_safe() in printk_safe
context. This does not look correct. IMHO, this might happen
only printk_safe_flush_on_panic() and we want to use
printk_deferred() here as well.

Also note that printk_safe buffer might still be full at this
point. It means that the warning will not be stored. There also
might be a risk of an infinite loop. Both problems are
solved by printk_deferred().


The easiest solution would be to simply call printk_deferred()
here. Everything will be deferred after the async printk() patchset
anyway.

I would even use printk_deferred() in printk_safe_flush_line()
for each context. It is not optimal but it works very well
and it makes the code much more straightforward.


> + }
> +}
> +
> +#ifdef CONFIG_PRINTK_NMI
> +
> +static void report_nmi_message_lost(void)
> +{
> + report_message_lost(&nmi_message_lost,
> + "Lost %d message(s) from NMI context!");
> +}
> +
> +#else
> +
> +static void report_nmi_message_lost(void)
> +{
> +}
> +
> +#endif /* CONFIG_PRINTK_NMI */
> +
> +static void report_safe_message_lost(void)
> +{
> + report_message_lost(&safe_message_lost,
> + "Lost %d message(s) from printk-safe context!");
> +}
> +
> /*
> * Flush data from the associated per-CPU buffer. The function
> * can be called either via IRQ work or independently.
> @@ -147,6 +183,9 @@ static void __printk_safe_flush(struct irq_work *work)
>
> i = 0;
> more:
> + report_nmi_message_lost();
> + report_safe_message_lost();

Please, move this at the end of this function. There are few reasons
for this:

1. This causes an infinite loop if called in printk_safe() context
and vpritnk_safe() is used to print the warning.

2. This prints the warning about lost messages in the wrong order.
We should print the warning after we print the messages
that fit into the buffer.

Best Regards,
Petr