Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

From: Sergey Senozhatsky
Date: Mon Dec 18 2017 - 04:37:47 EST


On (12/15/17 10:08), Petr Mladek wrote:
[..]
> > > Is the above scenario really dangerous? console_lock() owner is
> > > able to sleep. Therefore there is no risk of a softlockup.
> > >
> > > Sure, many messages will get stacked in the meantime and the console
> > > owner my get then passed to another owner in atomic context. But
> > > do you really see this in the real life?
> >
> > console_sem is locked by atomic printk CPU1~CPU10. non-atomic CPU is just
> > sleeping waiting for the console_sem. while atomic printk CPUs just hand
> > off console_sem ownership to each other without ever up()-ing the console_sem.
> > what's the point of hand off here? how is that going to work?
> >
> > what we need to do is to offload printing from atomic contexts to a
> > non-atomic one - which is CPU0. and that non-atomic CPU is sleeping
> > on the console_sem, ready to continue printing. but it never gets its
> > chance to do so, because CPU0 ~ CPU10 just passing console_sem ownership
> > around, resulting in the same "we print from atomic context" thing.
>
> Yes, I understand the scenario. The question is how much it is
> realistic.

so it's unlikely to have several CPUs on an SMP system printk-ing
from atomic contexts, while one of the available CPUs does console_lock()
or printk() from non-atomic context?


[..]
> > which is not a justification. we are not looking for a solution that
> > does not make the things worse. we are looking for a solution that
> > does improve the things.
>
> But it does improve things! The question is if it is enough or not
> in the real life.

console_unlock() is still unbound.


"spreading the printk load" between N CPUs is just 50% of the actual
problem.

console_unlock() has several sides being involved: one is doing the
print out, the other one is sleeping on console_sem waiting for the
first one to up() the console_sem. yes, CPUs that do printk will now
hand over console_sem to each other, but up() still does not happen
for unknown period of time. so all of the processes that sleep in
TASK_UNINTERRUPTIBLE on console_sem - user space process, tty, pty,
drm, frame buffers, PM, etc. etc. - still have unbound TASK_UNINTERRUPTIBLE
sleep. we've been talking about it for years. but there are more
issues.


> Do you see a scenario where it makes things statistically or
> even deterministically worse?

I really don't have that much time right now; but I did a very quick
test on one of my boards today.

NOTE: speaking for my myself only and about my observations only.
you are free call it unrealistic, impossible, silly, etc.

One more NOTE:
The board I'm playing with has immediate printk offloading enabled.
We figured out it's better to have it enabled rather than not, after
all. It makes a huge difference.

And another NOTE:
I did NOT alter my setup; I did NOT perform the stupid printk-flood
type of test (while (1) printk()). I used the "tools" we are using here
every day, which don't really abuse printk.



// UPD... OK... I actually ended up spending several hours on it, much more
// than I planned. because... I was quite puzzled. I double... no... triple
// checked the backport. It's exactly what v4 posted by Steven does - modulo
// printk_safe stuff [I don't have it backported].


Back to the console_owner patch.

1) it opens both soft and hard lockup vectors

I see *a lot* of cases when CPU that call printk in a loop does not
end up flushing its messages. And the problem seems to be - preemption.


CPU0 CPU1

for_each_process_thread(g, p)
printk()
console_unlock() printk
console_trylock() fails
sets console_owner
sees console_owner
sets console_waiter
call_console_drivers
clears console_owner
sees console_waiter
hand off spins with local_irq disabled
sees that it has acquired console_sem ownership

enables local_irq
printk
.. << preemption >>
printk
... unbound number of printk-s
printk
..
printk
back to TASK_RUNNING
goes to console_unlock()
printk
local_irq_save

???
*** if it will see console_waiter [being in any
context] it will hand off. otherwise, preemption
again and CPU0 can add even more messages to logbuf

local_irq_restore

<< preemption >>



I added several traces points, trying to understand why this patch
was so unimpressive on my board.

console_trylock() was successful
- trace_offloading("vprintk_emit()->trylock OK", " ", 0);

console_trylock() was unsuccessful and we went through `if (!waiter && owner && owner != current)'
under console_owner_lock spin_lock with local_irqs disabled
- trace_offloading("vprintk_emit()->trylock FAIL", " will spin? ", spin);

task in console_unlock() set console_owner, under console_owner_lock
- trace_offloading("set console_owner", " ", 0);

task in console_unlock() cleared console_owner, under console_owner_lock
- trace_offloading("clear console_owner", " waiter != NULL ", !!waiter);

task in console_unlock() saw waiter, break out of printing loop and hand
off printing
- trace_offloading("hand off", " ", 0);



a *very small* part of the logs. a quick introduction [I changed the
name of processes, so it'll be easier]:

i_do_printks-2997 is the process that does printk() loop. the logs starts
with it being in console_unlock(), just when it set the console_owner.
usertaskA-1167 comes in, fails to trylock, sees the console_owner, sets
itself as console_waiter. kworker/0:0-2990 comes in, fails to trylock
console_sem, fails to set itself as console_waiter, leaves vprintk_emit().
i_do_printks-2997 sees console_waiter, hand off printing to usertaskA-1167.
usertaskA-1167 is getting preempted. i_do_printks-2997 continue printk-s,
which are just log_store(), because console_sem is locked, its owner is
preempted.


...
i_do_printks-2997 [003] d..1 792.616378: offloading: set console_owner :0
usertaskA-1167 [001] d..1 792.617560: offloading: vprintk_emit()->trylock FAIL will spin? :1
kworker/0:0-2990 [000] d..1 792.618280: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.618387: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.618470: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.618478: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.618723: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.618902: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.619057: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.619064: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.620077: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.620323: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.620444: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.620927: offloading: vprintk_emit()->trylock FAIL will spin? :0
kworker/0:0-2990 [000] d..1 792.620958: offloading: vprintk_emit()->trylock FAIL will spin? :0
i_do_printks-2997 [003] d..1 792.629275: offloading: clear console_owner waiter != NULL :1
i_do_printks-2997 [003] d..1 792.629280: offloading: hand off :0
usertaskA-1167 [001] d..1 792.629281: offloading: new waiter acquired ownership:0
i_do_printks-2997 [003] d..1 792.630639: offloading: vprintk_emit()->trylock FAIL will spin? :0
i_do_printks-2997 [003] d..1 792.630663: offloading: vprintk_emit()->trylock FAIL will spin? :0
...

<< CUT. printks, printks, printks. boring stuff. just many
printks. usertaskA-1167 is preempted, holding the console_sem,
so i_do_printks-2997 just log_store >>

...
i_do_printks-2997 [003] d..1 792.645663: offloading: vprintk_emit()->trylock FAIL will spin? :0
i_do_printks-2997 [003] d..1 792.645670: offloading: vprintk_emit()->trylock FAIL will spin? :0
usertaskA-1167 [001] d..1 792.654176: offloading: set console_owner :0
systemd-udevd-672 [003] d..2 792.660132: offloading: vprintk_emit()->trylock FAIL will spin? :1
usertaskA-1167 [001] d..1 792.661759: offloading: clear console_owner waiter != NULL :1
usertaskA-1167 [001] d..1 792.661761: offloading: hand off :0
systemd-udevd-672 [003] d..2 792.661761: offloading: new waiter acquired ownership:0
systemd-udevd-672 [003] d..2 792.661772: offloading: set console_owner :0
systemd-udevd-672 [003] d..2 792.675610: offloading: clear console_owner waiter != NULL :0
systemd-udevd-672 [003] d..2 792.675628: offloading: set console_owner :0
systemd-udevd-672 [003] d..2 792.689281: offloading: clear console_owner waiter != NULL :0
...

<< CUT. systemd now actually prints the messages that were added by
i_do_printks-2997. usertaskA-1167 was preempted all that time and
printed just ONE message before it hand printing duty over to
systemd-udevd which had to print lots of pending logbuf messages. >>

<< then another user-space process was lucky enough to set itself
as console_waiter, it got console_sem ownership from systemd-udevd-672
and spent in console_unlock() enough time to got killed by user
watchdog >>


can do more tests a bit later.
I really see lots of problems even on trivial tests [i_do_printks-2997
is preemptible].


another test. which proves preemption and rescheduling. we start with a
ping-pong between i_do_printks-3616 and usertaskA-370. but soon usertaskA-370
acquires the lock and gets preempted.


i_do_printks-3616 [001] d..1 1923.711382: offloading: set console_owner :0
i_do_printks-3616 [001] d..1 1923.724524: offloading: clear console_owner waiter != NULL :0
i_do_printks-3616 [001] d..1 1923.724685: offloading: set console_owner :0
usertaskA-370 [000] d..1 1923.734280: offloading: vprintk_emit()->trylock FAIL will spin? :1
i_do_printks-3616 [001] d..1 1923.737847: offloading: clear console_owner waiter != NULL :1
usertaskA-370 [000] d..1 1923.737850: offloading: new waiter acquired ownership:0
i_do_printks-3616 [001] d..1 1923.737850: offloading: hand off :0
usertaskA-370 [000] d..1 1923.737862: offloading: set console_owner :0
i_do_printks-3616 [001] d..1 1923.737934: offloading: vprintk_emit()->trylock FAIL will spin? :1
usertaskA-370 [000] d..1 1923.751870: offloading: clear console_owner waiter != NULL :1
usertaskA-370 [000] d..1 1923.751875: offloading: hand off :0
i_do_printks-3616 [001] d..1 1923.751875: offloading: new waiter acquired ownership:0
i_do_printks-3616 [001] d..1 1923.751938: offloading: set console_owner :0
usertaskA-370 [003] d..1 1923.762570: offloading: vprintk_emit()->trylock FAIL will spin? :1
i_do_printks-3616 [001] d..1 1923.765193: offloading: clear console_owner waiter != NULL :1
i_do_printks-3616 [001] d..1 1923.765196: offloading: hand off :0
usertaskA-370 [003] d..1 1923.765196: offloading: new waiter acquired ownership:0
i_do_printks-3616 [002] d..1 1923.766057: offloading: vprintk_emit()->trylock FAIL will spin? :0
i_do_printks-3616 [002] d..1 1923.766073: offloading: vprintk_emit()->trylock FAIL will spin? :0
...

<< CUT. printks, printks, printks. usertaskA-370 is preempted, i_do_printks-3616
just does log_store(). Notice that the new console_sem owner acquired the ownership
on CPU3, while it started to print pending logbuf messages later, being first scheduled
on CPU2 and then rescheduled on CPU1. >>

...
i_do_printks-3616 [002] d..1 1923.767162: offloading: vprintk_emit()->trylock FAIL will spin? :0
i_do_printks-3616 [002] d..1 1923.767166: offloading: vprintk_emit()->trylock FAIL will spin? :0
usertaskA-370 [002] d..1 1923.769225: offloading: set console_owner :0
usertaskA-370 [002] d..1 1923.779213: offloading: clear console_owner waiter != NULL :0
usertaskA-370 [001] d..1 1923.782622: offloading: set console_owner :0
usertaskA-370 [001] d..1 1923.793238: offloading: clear console_owner waiter != NULL :0
usertaskA-370 [001] d..1 1923.794991: offloading: set console_owner :0
usertaskA-370 [001] d..1 1923.807259: offloading: clear console_owner waiter != NULL :0
usertaskA-370 [001] d..1 1923.807403: offloading: set console_owner :0
usertaskA-370 [001] d..1 1923.821280: offloading: clear console_owner waiter != NULL :0
...

<< usertaskA-370 prints all the pending messages. either until logbuf
becomes empty, or another task calls printk [possible from atomic]. >>


well, clearly, nothing prevents usertaskA-370 from sleeping really long
on a busy system. so the logbuf can contain a lot of messages to print
out.

and so on. I guess you got the picture by now.



now,
It is absolutely possible that we can have preempted console_sem owner,
(somewhere in vprintk_emit right after acquiring the console_sem ownership,
or somewhere in console_unlock(). before the printing loop, or after
re-enabling local IRQs withing the printing loop);
while other CPU adds messages to the logbuf (printk()-s) and then what
we need it just printk() from IRQ or atomic context to take over printing
from the current console_sem owner and to flush all the pending messages
from atomic context, heading towards the lockup.


2) an unexpected test...

the scenario which I mentioned last week.

a) spin lock lockup on several CPUs (not on all CPUs)
b) simultaneous dump_stack() from locked up CPUs
c) trigger_all_cpu_backtrace()
d) immediate WANR_ON from networking irq on one of the lockedup
CPU (cpus eventually were able to acquire-release the spin_lock
and enable local IRQs).

last week the board had immediate printk offloading enabled when the
lockup has happened, so all those printk-s from irqs/etc were as fast
as
lock(log_buf); sprintf(); memcpy(); unlock(log_buf);

there was printk_kthread to print it.

the issue all of a sudden has reproduced today. twice. this time
I had immediate printk disabled and console_owner hand off enabled.
unfortunately, the board didn't survive the lockup this time. _it looks_
like printk offloading was really crucial there.



> Why did you completely ignored the paragraph about step by step
> approach? Is there anything wrong about it?

frankly, I don't even understand what our plan is. I don't see how are
we going to verify the patch. over the last 3 years, how many emails
you have from facebook or samsung or linaro, or any other company
reporting the printk-related lockups? I don't have that many in my gmail
inbox, to be honest. and this is not because there were no printk lockups
observed. this is because people usually don't report those issues to
the upstream community. especially vendors that use "outdated" LTS
kernels, which are approx 1-2 year(s) behind the mainline. and I don't see
why it should be different this time. it will take years before vendors
pick the next LTS kernel, which will have that patch in it. but the really
big problem here is that we already know that the patch has some problems.
are we going to conclude that "no emails === no problems"? with all my
respect, it does seem like, in the grand scheme of things, we are going
to do the same thing, yet expecting different result. that's my worry.

may be I'm wrong.



> You are looking for a perfect solution.

gentlemen, once again, you really can name it whatever you like.
the bottom line is [speaking for myself only]:

- the patch does not fit my needs.
- it does not address the issues I'm dealing with.
- it has a significantly worse behaviour compared to old async printk.
- it keeps sleeping on console_sem tasks in TASK_UNINTERRUPTIBLE
for a long time.
- it timeouts user space apps.
- it re-introduces the lockup vector, by passing console_sem ownership
to atomic tasks.
- it doubles the amount of time CPU spins with local IRQs disabled in
the worst case.
- I probably need to run more tests [I didn't execute any OOM tests, etc.],
but, preliminary, I can't see Samsung using this patch.

sorry to tell you that.
maybe it's good enough for the rest of the world, but...

-ss