Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

From: Steven Rostedt
Date: Wed Jan 10 2018 - 13:05:26 EST


On Wed, 10 Jan 2018 06:05:47 -0800
Tejun Heo <tj@xxxxxxxxxx> wrote:

> On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote:
> > This is the last version of Steven's console owner/waiter logic.
> > Plus my proposal to hide it in 3 helper functions. It is supposed
> > to keep the code maintainable.
> >
> > The handshake really works. It happens about 10 times even during
> > boot of a simple system in qemu with a fast console here. It is
> > definitely able to avoid some softlockups. Let's see if it is
> > enough in practice.
> >
> > From my point of view, it is ready to go into linux-next so that
> > it can get some more test coverage.
> >
> > Steven's patch is the v4, see
> > https://lkml.kernel.org/r/20171108102723.602216b1@xxxxxxxxxxxxxxxxxx
>
> At least for now,
>
> Nacked-by: Tejun Heo <tj@xxxxxxxxxx>

And I NACK your NACK!

>
> Maybe this can be a part of solution but it's really worrying how the
> whole discussion around this subject is proceeding. You guys are
> trying to railroad actual problems. Please address actual technical
> problems.

WE ARE!

I presented the issue at Kernel Summit and everyone agreed with me that
the issue my patch solves is a real issue. You have yet to demonstrate
how this does not solve issues.

I presented the history of printk, where it used to serialize all
printks. This was a problem when you had n CPUs doing printks at the
same time, because the nth CPU had to wait for the n-1 CPUs ahead of it
to print before it could. This was obviously an issue.

The "solution" to that was to have the first printk do the printing,
and all other printks that come in while it is printing just load their
data into the log buffer and continue. The first printk would get stuck
printing for everyone else. This was fine when we had 4 CPUs, but now
that we have boxes with 100s of CPUs, this is definitely an issue. I
demonstrated that this caused printk() to be unbounded, and there were
real world scenarios that could easily cause a printk to never stop
printing.
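
A discrete-time toy model makes the unbounded case concrete. It assumes (hypothetically) that the console prints one message per time step, that arrivals[t] CPUs each append one message at step t, and that the first caller keeps printing until it finds the buffer empty while later callers just append and return. `first_printer_load` is an illustrative name, not a kernel function.

```python
def first_printer_load(arrivals: list[int]) -> int:
    """Messages printed by the CPU that grabbed the console first.

    Hypothetical toy model of "the first printk prints for everyone":
    arrivals[t] CPUs each append one message at step t, the console
    owner prints one message per step, and it only releases the
    console once the buffer is empty. Requires arrivals[0] >= 1.
    """
    if not arrivals or arrivals[0] < 1:
        raise ValueError("the first printk must arrive at step 0")
    pending = 0   # messages sitting in the log buffer
    printed = 0   # messages printed by the first caller
    t = 0
    while True:
        if t < len(arrivals):
            pending += arrivals[t]  # other CPUs append and return
        pending -= 1                # the owner prints one message
        printed += 1
        if pending == 0:
            return printed          # buffer drained: owner finally leaves
        t += 1


print(first_printer_load([2] * 10))     # 20
print(first_printer_load([2] * 1000))   # 2000
```

As long as messages arrive faster than the console can print them, the first caller prints every message in the burst, so its time in printk() grows with the length of the burst.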

My solution is to give printk() a maximum bounded time to print. This
is how we solve things in the Real Time world, and it makes perfect
sense in this context. The point being: the most a single printk() can
ever print is the entire buffer. That only happens if it is really
unlucky, which is very unlikely, because it requires a burst of printks
followed by no printks at all. So the bound is the time it takes to
print the entire buffer.
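
The bounded version can be sketched with the same kind of toy model, loosely following the console owner/waiter handoff this series adds. A caller that arrives while the console is busy becomes the one spinning waiter (if nobody is waiting yet), and the owner hands the console over right after its current message. The fixed-capacity buffer that drops the oldest message on overflow is an assumed stand-in for the log buffer, and `handoff_loads` and `buf_cap` are hypothetical names, not the patch's code.

```python
def handoff_loads(arrivals: list[int], buf_cap: int) -> list[int]:
    """Messages printed by each caller under an owner/waiter handoff.

    Hypothetical toy model: arrivals[t] callers each append one message
    at step t; the console owner prints one message per step and, if a
    waiter is spinning, hands the console to it after that message.
    The buffer holds at most buf_cap messages (oldest dropped when full).
    """
    if buf_cap < 1:
        raise ValueError("buf_cap must be at least 1")
    loads: list[int] = []   # loads[i] = messages printed by caller i
    pending = 0
    owner = None
    waiter = None
    t = 0
    while t < len(arrivals) or pending:
        for _ in range(arrivals[t] if t < len(arrivals) else 0):
            caller = len(loads)
            loads.append(0)
            pending = min(pending + 1, buf_cap)  # full buffer drops oldest
            if owner is None:
                owner = caller       # console is free: take it
            elif waiter is None:
                waiter = caller      # spin until the owner hands off
            # else: a waiter already exists; just leave the message
        if owner is not None:
            loads[owner] += 1        # owner prints one message
            pending -= 1
            if waiter is not None:
                owner, waiter = waiter, None  # hand off the console
            if pending == 0:
                owner = None         # nothing left to print: release
        t += 1
    return loads


print(max(handoff_loads([2] * 10, buf_cap=100)))   # 10
print(max(handoff_loads([2] * 1000, buf_cap=4)))   # 3
```

With a steady stream of printks, each owner prints one message before handing off; only the last owner, with nobody left to hand off to, drains the remainder, and in this model that can never exceed buf_cap. The worst case per caller is bounded by the buffer size, not by the length of the burst.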

My solution takes printk from its current unbounded state and makes it
bounded by a fixed limit, which means printk() is now an O(1) algorithm.

The solution is simple; everyone at KS agreed with it, and there should
be no controversy here.

You, on the other hand, are showing unrealistic scenarios and crying
that it's what you see in production, with no proof of it.

My printk solution is solid, with no risk of regressions for current
printk usage.

If anything, I'll pull these patches myself, and push them to Linus
directly. I'll Cc you and you can make your argument to NACK them, and
I'll make mine to take them.

-- Steve