Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Fri Dec 15 2017 - 01:52:17 EST


On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
>
> um... hundreds of cases.
>
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).


and, Steven, one more thing. wondering what's your opinion.


suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. is the proposed hand off
scheme really useful in this case? CPUs will now

a) print their lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit() on console_owner with local IRQs disabled,
waiting for either the non-atomic printk CPU or another atomic CPU
to finish printing its line (call_console_drivers()) and to hand
off printing. so the current CPU, after busy-waiting for a foreign CPU's
call_console_drivers(), will go and do its own call_console_drivers().
which, time-wise, simply doubles (roughly) the amount of time that
CPU spends in printk()->console_unlock(). agreed?
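
to put rough numbers on that "doubles (roughly)" claim, here is a toy
cost model. it is only a sketch: the function names and the microsecond
parameters are made up for illustration (nothing here is kernel API),
and it assumes every line takes about the same time to print.

```c
#include <assert.h>

/*
 * Toy cost model, not kernel code: every name here is hypothetical.
 * All times are in microseconds.
 */

/*
 * Old behaviour for an atomic CPU: the console_sem owner prints on its
 * behalf, so printk() costs only log_store().
 */
unsigned long cost_store_only(unsigned long store_us)
{
	return store_us;
}

/*
 * Hand-off: log_store(), busy-wait for what is left of the foreign
 * call_console_drivers(), then do our own call_console_drivers().
 */
unsigned long cost_with_handoff(unsigned long store_us,
				unsigned long foreign_left_us,
				unsigned long line_us)
{
	/* we never wait longer than one foreign line takes to print */
	assert(foreign_left_us <= line_us);
	return store_us + foreign_left_us + line_us;
}
```

with an (invented) 1 usec log_store() and a 1000 usec line,
cost_with_handoff(1, 1000, 1000) is 2001 usec in the worst case, against
1001 usec when the console is free and 1 usec for cost_store_only(1).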

previously we could have a case when the non-atomic printk CPU would
grab the console_sem and print all atomic printk CPUs' messages first,
and then its own messages, so atomic printk CPUs would do just
log_store(). now we will have CPUs that call_console_drivers() and
spin on the console_sem owner, waiting for call_console_drivers() on a
foreign CPU [not all of them at once: at any moment it's one CPU doing
the print out and one CPU spinning on console_owner. but overall I think
all CPUs will experience that spin on console_sem waiting for
call_console_drivers() and then do their own call_console_drivers()].


even the two-CPU case is not so simple anymore. see below.

- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.


CPU0                            CPU1

printk()                        printk()
log_store()
                                log_store()
console_unlock()
set console_owner
                                sees console_owner
                                sets console_waiter
                                spin
call_console_drivers()
sees console_waiter
break

printk()
log_store()
                                console_unlock()
                                set console_owner
sees console_owner
sets console_waiter
spin
                                call_console_drivers()
                                sees console_waiter
                                break

                                printk()
                                log_store()
console_unlock()
set console_owner
                                sees console_owner
                                sets console_waiter
                                spin
call_console_drivers()
sees console_waiter
break

printk()
log_store()
                                console_unlock()
                                set console_owner
sees console_owner
sets console_waiter
spin

....


that "wait for call_console_drivers() on another CPU and then do
its own call_console_drivers()" pattern does look dangerous. the
benefit of hand-off is really fragile sometimes, isn't it?
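
the ping-pong above can be replayed with a small user-space model. this
is a sketch under simplifying assumptions, not the actual printk code:
struct cpu_stats and replay_handoff() are hypothetical names, every line
takes the same line_us to print, and the waiter is charged the full
duration of the foreign call_console_drivers().

```c
#include <assert.h>

/* Per-CPU accounting for the toy model (hypothetical, not kernel code). */
struct cpu_stats {
	unsigned long print_us;	/* time in its own call_console_drivers() */
	unsigned long spin_us;	/* time busy-waiting on the foreign CPU */
	unsigned int lines;	/* lines this CPU pushed to the console */
};

/*
 * Replay the two-CPU ping-pong: in every round the current owner prints
 * one line while the other CPU spins as console_waiter, then the roles
 * swap (the hand-off).
 */
void replay_handoff(struct cpu_stats cpu[2], unsigned int rounds,
		    unsigned long line_us)
{
	unsigned int owner = 0;	/* CPU0 takes console_owner first */
	unsigned int i;

	assert(line_us > 0);
	cpu[0] = cpu[1] = (struct cpu_stats){ 0, 0, 0 };

	for (i = 0; i < rounds; i++) {
		unsigned int waiter = !owner;

		cpu[owner].print_us += line_us;	/* call_console_drivers() */
		cpu[owner].lines++;
		cpu[waiter].spin_us += line_us;	/* spin on console_owner */

		owner = waiter;			/* hand off printing */
	}
}
```

in this model, after an even number of rounds each CPU has spent as much
time spinning as printing, which is the "wait for a foreign
call_console_drivers(), then do your own" cost from the diagram.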

-ss