Re: [PATCH 2/2] printk: always report lost messages on serial console

From: Sergey Senozhatsky
Date: Tue Jan 03 2017 - 21:47:28 EST


On (01/03/17 17:53), Petr Mladek wrote:
> On Wed 2017-01-04 00:47:45, Sergey Senozhatsky wrote:
> > On (01/03/17 15:55), Petr Mladek wrote:
> > [..]
> > > This causes the opposite problem. We might print a message that was supposed
> > > to be suppressed.
> >
> > so what? yes, we print a message that otherwise would have been suppressed.
> > not a big deal. at all. we are under high printk load and the best thing
> > we can do is to report "we are losing the messages" straight ahead. the
> > next 'visible' message may be seconds/minutes/forever away. think of a
> > printk() flood of messages with suppressed loglevel coming from CPUA-CPUX,
> > big enough to drain all 'visible' loglevel messages from CPUZ. we are
> > back to problem "a".
> >
> > thus I want a simple bool flag and a simple rule: we see something - we say it.
>
> So, you prefer to print some random debug message instead of an
> emergency one?

yep. because console_unlock() is not the right place to address printk()
abuse, and console_unlock() cannot address it. the only way to fix it is
to reduce the amount of printk() calls (well, there is one more thing
probably and may be but not really *).


> It will always drop a message because you always process only one
> and many new appear in the meantime. While with my solution,
> you should see:
>
> ** 1324 printk messages dropped ** <alert: random message>
> ** 523 printk messages dropped ** <emerg: random message>
> ** 324 printk messages dropped ** <emerg: random message>
> ** 345 printk messages dropped ** <alert: random message>

once you started losing the messages because of printk() flood you will
lose them no matter what you do. console_unlock() cannot force printk() CPUs
to add less messages to the logbuf, so we magically can have enough time to
print the message on the consoles. any call to console drivers leads to lost
messages. there is no difference.

this is from the real serial logs I'm looking at right now. we attach
"bad news" to 'critical' messages only:

...
[ 32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
** 150 printk messages dropped ** [ 32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
** 75 printk messages dropped ** [ 32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
** 12 printk messages dropped ** [ 32.941940] ..
** 2 printk messages dropped ** [ 32.941951] ..
** 10 printk messages dropped ** [ 32.941992] ..
** 1 printk messages dropped ** [ 32.941999] ..
...

note how many critical messages I lost in consecutive console drivers calls
because console_unlock() was printing other critical messages.
no matter what we filter-out in console_unlock() we are still far-far-far
behind the CPUs that flood the printk buffer. and those CPUs will be happy
to drain any critical/visible message from the logbuf.




* may be can do something like this:

two logbuf buffers. one for `everything' -- both suppressed and visible
loglevels. this one is also used by dmesg. the other one if for messages
that won't be suppressed. we call_console_drivers() on that buffer.
so vprintk_emit() becomes

vprintk_emit()
{
spin_lock logbuf_lock

text = sprintf(...)

log_store(logbuf)
if (!suppress_message_printing(level))
log_store(printing_lofbuf)

spin_ulock logbuf_lock
}

and console_unlock() reads printing_logbuf only. but we still can lose
messages even from filtered printing_logbuf.

-ss