Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

From: Steven Rostedt
Date: Thu Jan 11 2018 - 11:29:17 EST


On Thu, 11 Jan 2018 19:38:45 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx> wrote:

>
> the non-atomic -> atomic context console_sem transfer. we previously
> would have kept the console_sem owner to its non-atomic owner. we now
> will make sure that if printk from atomic context happens then it will
> make it to console_unlock() loop.
> emphasis on O(logbuf) > watchdog_thresh.
>
>
> - if the patch's goal is to bound (not necessarily to watchdog's threshold)
> the amount of time we spend in console_unlock(), then the patch is kinda
> overcomplicated. but no further questions in this case.

It's goal is to keep printk from running amok on a single CPU like it
currently does. This prevents one printk from never ending. And it is
far from complex. It doesn't deal with "offloading". The "handover" is
only done to those that are doing printks. What do you do if all CPUs
are in "critical sections", how would a "handoff to safe" work? Will
the printks never get out? If the machine were to triple fault and
reboot, we lost all of it.

>
> - but if the patch's goal is to bound (to lockup threshold) the amount of
> time spent in console_unlock() in order to avoid lockups [uh, a reason],
> then the patch is rather oversimplified.

It's bound to print all the information that has been added to the
printk buffer. You want to bound it to some "time" and what about the
printks that haven't gotten out yet? Delay them to something else, and
if the machine were to crash in the transfer, we lost all that data.

My method, there's really no delay between a hand off. There's always
an active CPU doing printing. It matches the current method which works
well for getting information out. A delayed approach will break that
and that's what people like myself, Peter, Linus and others are worried
about.


>
>
> claiming that for any given A, B, C the following is always true
>
> A * B < C
>
> where
> A is the amount of data to print in the worst case
> B the time call_console_drivers() needs to print a single
> char to all registered and enabled consoles
> C the watchdog's threshold
>
> is not really a step forward.

It's no different than what we have, except that we currently have A
being infinite. My patch makes A no longer infinite, but a constant.
Yes that constant is mutable, but it's still a constant, and
controlled by the user. That to me is definitely a BIG step forward.

>
> and the "last console_sem owner prints all pending messages" rule
> is still there.
>
>
> > Or do you have a system that started to suffer from softlockups
> > with this patchset and did not do this before?
> [..]
> > Do you know about any system where this patch made the softlockup
> > deterministically or statistically more likely, please?
>
> I have explained many, many times why my boards die just like before.
> why would I bother collecting any numbers...

Great, and there's cases that die that my patch solves. Lets add my
patch now since it is orthogonal to an offloading approach and see how
it works, because it would solve issues that I have hit. If you can
show that this isn't good enough we can add another approach. We are
solving two different problems. My patch simply makes one printk() no
longer unbounded. It's a fixed time.

Honestly, I don't see why you are against this patch. It doesn't stop
your work. If this patch isn't enough (but it does fix some issues),
then we can look at adding other approaches. Really, it sounds like you
are afraid of this patch, that it might be good enough for most cases
which would make adding another approach even more difficult.

-- Steve