Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Thu Dec 14 2017 - 21:10:41 EST


Hello,

On (12/14/17 10:11), Tejun Heo wrote:
> Hey, Steven.
>
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.
>
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that. Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?

I second _everything_ that Tejun has said.


Steven, your approach works ONLY when we have the following preconditions:

a) there is a CPU that is calling printk() from a 'safe' (non-atomic,
etc.) context

what guarantees that? what happens if there is NO non-atomic
CPU, or if that non-atomic CPU simply misses the window in which
console_owner is set? are we going to conclude

"if printk() doesn't work for you, it's because you are holding it wrong"?


what if that non-atomic CPU does not call printk(), but instead
does console_lock()/console_unlock()? why is there no handoff?

CPU0                          CPU1 ~ CPU10
                              in atomic contexts [!], ping-ponging
                              console_sem ownership to each other,
                              while what they really need to do is
                              simply up() and let CPU0 handle it.
                              printk
console_lock()
 schedule()
                              ...
                              printk
                              printk
                              ...
                              printk
                              printk

                              up()

// woken up
console_unlock()

why do we focus on fixing vprintk_printk()?
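The diagram can be reduced to a toy model. This is a minimal userspace sketch, not kernel code: it assumes that every atomic-context printk() arriving while console_sem is held receives console_sem through the owner/waiter handoff, and that CPU0, sleeping in down() inside console_lock(), never registers as a waiter. struct console_model, atomic_printk(), owner_up() and run_scenario() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical toy model of console_sem ownership; not kernel code. */
struct console_model {
	int sem_owner;         /* CPU holding console_sem, NO_CPU if free */
	bool cpu0_sleeping;    /* CPU0 blocked in console_lock() -> down() */
	int handoffs;          /* direct owner -> waiter transfers */
	int atomic_lines;      /* lines flushed by atomic-context CPUs */
};

/*
 * An atomic-context CPU calls printk(). If console_sem is free it takes
 * it (trylock succeeds); if another CPU holds it, the assumed handoff
 * passes console_sem straight to the caller without calling up().
 */
static void atomic_printk(struct console_model *m, int cpu)
{
	assert(cpu != 0);      /* CPU0 is asleep in this scenario */
	if (m->sem_owner == NO_CPU) {
		m->sem_owner = cpu;
	} else if (m->sem_owner != cpu) {
		m->sem_owner = cpu;
		m->handoffs++;
	}
	m->atomic_lines++;
}

/* The last owner finally calls up(): a sleeping CPU0 is woken and gets it. */
static void owner_up(struct console_model *m)
{
	m->sem_owner = m->cpu0_sleeping ? 0 : NO_CPU;
	m->cpu0_sleeping = false;
}

/* printk_cpus[0] takes console_sem first, then CPU0 calls console_lock(). */
static struct console_model run_scenario(const int *printk_cpus, int n)
{
	struct console_model m = { .sem_owner = NO_CPU };

	assert(n > 0);
	atomic_printk(&m, printk_cpus[0]);
	m.cpu0_sleeping = true;            /* console_lock() -> schedule() */
	for (int i = 1; i < n; i++)
		atomic_printk(&m, printk_cpus[i]);
	owner_up(&m);                      /* only now does CPU0 wake up */
	return m;
}
```

With printk_cpus = {1, 2, ..., 10}, all ten lines are flushed from atomic context with nine handoffs, and CPU0, the one CPU that could print from a preemptible context, only gets console_sem after the final up().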


b) the non-atomic CPU sees console_owner set (and console_owner is set
only for a very short period of time)

again: what if that non-atomic CPU does not see console_owner?
"don't use printk()"?

c) the task that is looping in console_unlock() sees the non-atomic
CPU while console_owner is set.


IOW, we need to have


the right CPU (a) at the very right moment (b && c) doing the very right thing.


* and the "very right moment" is tiny and additionally depends
on a foreign CPU [the one that is looping in console_unlock()].
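To make preconditions (a), (b) and (c) concrete, here is a minimal sequential sketch of the owner/waiter handshake. It assumes a single flag, here called console_waiter, that a printk() caller sets when it observes console_owner, and that the owner checks it once when it finishes a line. The enum, the struct and the three functions are hypothetical and only approximate the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* How the other CPU enters the console code (hypothetical enum). */
enum entry_path { VIA_PRINTK, VIA_CONSOLE_LOCK };

/* Hypothetical model of the handshake state; not kernel code. */
struct handoff_state {
	int console_owner;     /* CPU printing a line right now, or NO_CPU */
	bool console_waiter;   /* assumed flag set by a printk() caller */
};

/* Owner in console_unlock(): publish itself before printing one line. */
static void owner_begin_line(struct handoff_state *s, int cpu)
{
	assert(cpu != NO_CPU);
	s->console_owner = cpu;
}

/*
 * Another CPU shows up. Only the printk() path looks at console_owner
 * (precondition a), and it can only become the waiter if console_owner
 * is set at this very moment (precondition b).
 */
static bool other_cpu_arrives(struct handoff_state *s, enum entry_path path)
{
	if (path != VIA_PRINTK || s->console_owner == NO_CPU || s->console_waiter)
		return false;
	s->console_waiter = true;
	return true;
}

/*
 * Owner finished the line: it hands console_sem over only if it sees a
 * waiter now (precondition c); otherwise it keeps looping on its own.
 */
static bool owner_end_line(struct handoff_state *s)
{
	bool handoff = s->console_waiter;

	s->console_owner = NO_CPU;
	s->console_waiter = false;
	return handoff;
}
```

In this model only a printk() caller that arrives while a line is being printed produces a handoff; a console_lock() caller, or a printk() that lands between two lines, leaves the owner looping.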



a simple question: how is that going to work for everyone? or are
we only "fixing" a small fraction of the possible use cases?



Steven, I thought we had reached an agreement [**] that the solution we
should be working on is a combination of printk_kthread and console_sem
handoff, simply because it adds the missing "there is a non-atomic CPU
wishing to console_unlock()" thing.

lkml.kernel.org/r/20171108162813.GA983427@xxxxxxxxxxxxxxxxxxxxxxxxxxx

https://marc.info/?l=linux-kernel&m=151011840830776&w=2
https://marc.info/?l=linux-kernel&m=151015141407368&w=2
https://marc.info/?l=linux-kernel&m=151018900919386&w=2
https://marc.info/?l=linux-kernel&m=151019815721161&w=2
https://marc.info/?l=linux-kernel&m=151020275921953&w=2
** https://marc.info/?l=linux-kernel&m=151020404622181&w=2
** https://marc.info/?l=linux-kernel&m=151020565222469&w=2
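One way to read that combination is as a per-line decision inside console_unlock(): hand off to a waiter when one exists, and otherwise, once an atomic-context printer has flushed enough lines, wake printk_kthread so that a non-atomic party always exists to continue. This is a hypothetical sketch; the enum, next_action() and the `budget` threshold are illustrative assumptions, not code from either patch set.

```c
#include <assert.h>
#include <stdbool.h>

/* What the CPU looping in console_unlock() does next (hypothetical). */
enum unlock_action {
	UNLOCK_DONE,          /* nothing left to print: up() and leave */
	UNLOCK_HAND_OFF,      /* pass console_sem to the spinning waiter */
	UNLOCK_WAKE_KTHREAD,  /* stop, up(), let printk_kthread take over */
	UNLOCK_KEEP_PRINTING, /* print the next record ourselves */
};

static enum unlock_action next_action(bool records_pending, bool waiter_present,
				      bool atomic_ctx, unsigned int lines_printed,
				      unsigned int budget)
{
	if (!records_pending)
		return UNLOCK_DONE;
	if (waiter_present)
		return UNLOCK_HAND_OFF;
	/*
	 * No waiter showed up: instead of relying on one, an atomic
	 * printer that has used up its (assumed) budget offloads to the
	 * kthread, which is assumed to call console_lock()/console_unlock()
	 * from a preemptible context.
	 */
	if (atomic_ctx && lines_printed >= budget)
		return UNLOCK_WAKE_KTHREAD;
	return UNLOCK_KEEP_PRINTING;
}
```

The point of the kthread branch is that the handoff no longer depends on a non-atomic CPU happening to call printk() at the right moment.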


what am I missing?

-ss