Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup

From: Eugeniu Rosca
Date: Wed Mar 18 2020 - 14:05:48 EST


Hi Sergey,

Many thanks for your feedback!

On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote:
> On (20/03/15 18:09), Eugeniu Rosca wrote:
>
> [..]
>
> > @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > }
> > }
> >
> > + console_verbose_start();
> > +
> > pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> > smp_processor_id(), duration,
> > current->comm, task_pid_nr(current));
> > @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > if (softlockup_panic)
> > panic("softlockup: hung tasks");
> > __this_cpu_write(soft_watchdog_warn, true);
> > +
> > + console_verbose_end();
> > } else
> > __this_cpu_write(soft_watchdog_warn, false);
>
>
> I'm afraid, as of now, this approach is not going to work the way it's
> supposed to work in 100% of cases. Because the only thing that printk()
> call sort of guarantees is that the message will be stored somewhere.
> Either in the main kernel log buffer, on in one of auxiliary per-CPU
> log buffers. It does not guarantee, generally speaking, that the message
> will be printed on the console immediately.

I take this passage as an acknowledgement of the problem being _real_,
in spite of the fix being not perfect.

One aspect I would like to emphasize is that (please, NAK this
statement if it's not accurate) the problem reported in this patch is
not specific to the existing printk mechanism, but also applies to the
upcoming kthread-based printk. If that's true, then IMHO this is a
compelling argument to join forces and try to find a working, safe and
future-proof solution.

>
> Consider the following example:
>
> CPU0 CPU1
> console_lock();
> schedule();
>
> watchdog()
> console_verbose_start();
> printk()
> log_store()
> if (!console_trylock())
> return;
> console_verbose_end();
>
> ...
> console_unlock()
> print logbuf messages to the consoles
> we missed the console_verbose_start/end
> on CPU1

This looks plausible. However, I wonder to which degree the same
scenario is a concern in the kthread-based approach?

My current standpoint is that as long as points [A-D] are met, it
should do no harm to accept a (partial) fix like seen in my series:

- [A] the patch tackles at least a subset of problematic use-cases
- [B] the fix is non-intrusive and easy to review
- [C] there is hope to reuse it in the new lockless buffer based printk
- [D] there are no regressions employing the major console knobs
(ignore_loglevel, quiet, loglevel, etc) as it happened in
a6ae928c25835c ("Revert "printk: make sure to print log on console."")

>From the above points, my only major concern is that current series
breaks the expectations of users who pass loglevel=0 on kernel
command line and expect the system to be totally silent. This has
already been expressed in the cover letter. I would especially
appreciate if the same view is shared (or invalidated) by others.

>
> IIRC, we had a similar approach in the past. See commit 375899cddcbb26
> ("printk: make sure to print log on console"). And we reverted it, see
> a6ae928c25835 ("Revert "printk: make sure to print log on console.").

Thanks for this reference. It looks to me that in spite of being
relatively compact, commit 375899cddcbb26 ("printk: make sure to print
log on console.") broke criteria [D] listed above. I intend to avoid
it by testing multiple console knob values on my arm64 system.

Looking forward to your feedback on the questions posted above. TIA!

--
Best Regards
Eugeniu Rosca