Re: [PATCH] mm: don't warn about allocations which stall for too long

From: Petr Mladek
Date: Thu Nov 02 2017 - 07:46:57 EST


On Wed 2017-11-01 11:36:47, Steven Rostedt wrote:
> On Wed, 1 Nov 2017 14:38:45 +0100
> Petr Mladek <pmladek@xxxxxxxx> wrote:
> > My current main worry with Steven's approach is a risk of deadlocks
> > that Jan Kara saw when he played with similar solution.
>
> And if there exists such a deadlock, then the deadlock exists today.

The patch would effectively turn console_trylock() into
console_lock(), and this might introduce new problems.

The simplest example is:

console_lock()
  printk()
    console_trylock()   was SAFE.

console_lock()
  printk()
    console_lock()      causes DEADLOCK!
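To make the difference concrete, here is a minimal userspace analogy, assuming a POSIX default (non-recursive) mutex stands in for console_sem. The names fake_console_sem and fake_printk_flush() are hypothetical and this is not kernel code. A nested call gets a non-zero (EBUSY) result from pthread_mutex_trylock() and simply leaves the message to the current owner, whereas a blocking lock at the same spot would never return.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for console_sem: a default,
 * non-recursive mutex. */
static pthread_mutex_t fake_console_sem = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical model of the flush step in printk(). Returns true if this
 * call flushed the buffer itself, false if it left the work to the owner.
 *
 * With trylock semantics, a caller that already holds the lock just gets
 * EBUSY back. Replacing the trylock with pthread_mutex_lock() would block
 * forever (or be undefined) when the caller already holds the lock, which
 * is the nested console_lock() DEADLOCK case.
 */
static bool fake_printk_flush(void)
{
    if (pthread_mutex_trylock(&fake_console_sem) != 0)
        return false;           /* the owner prints it later: SAFE */

    /* ... write buffered messages to the consoles here ... */

    pthread_mutex_unlock(&fake_console_sem);
    return true;
}
```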

Sure, we could detect this and avoid waiting when
console_owner == current. But does this cover all
situations? What about the following?

CPU0                            CPU1

console_lock()                  func()
  console->write()                take_lockA()
    func()                          printk()
                                      busy wait for console_lock()

      take_lockA()
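A rough userspace sketch of the console_owner == current check mentioned above, with hypothetical names (fake_console_lock(), fake_should_busy_wait()) and a pthread standing in for a task. It catches the recursive case, but from CPU1's point of view the lock simply has another owner, so it still decides to busy wait while CPU0 is stuck on lockA. The sketch is simplified and not written to be race-free against concurrent re-locking.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical owner tracking, loosely modelled on console_owner. */
static pthread_mutex_t fake_console_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_t fake_console_owner;
static atomic_bool fake_console_owned = false;

static void fake_console_lock(void)
{
    pthread_mutex_lock(&fake_console_sem);
    fake_console_owner = pthread_self();
    atomic_store(&fake_console_owned, true);   /* publish after owner is set */
}

static void fake_console_unlock(void)
{
    atomic_store(&fake_console_owned, false);
    pthread_mutex_unlock(&fake_console_sem);
}

/*
 * Would a printk() caller busy-wait for the lock? The owner check only
 * answers "no" when the caller itself is the owner. It cannot see that
 * the owner (CPU0) is blocked on lockA held by the caller (CPU1), so in
 * that scenario it still answers "yes" and both sides wait forever.
 */
static bool fake_should_busy_wait(void)
{
    if (!atomic_load(&fake_console_owned))
        return false;                   /* lock is free, just take it */
    return !pthread_equal(fake_console_owner, pthread_self());
}
```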

In other words, it used to be safe to call printk() from
console->write() functions because printk() used console_trylock().
Your patch is going to change this. It is even worse because
you probably will not use console_lock() directly, and therefore
the dependency might be hidden from lockdep.
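The lockdep concern can be illustrated with a toy lock-order checker. This is a hypothetical sketch, not how lockdep is implemented: it records an edge whenever one lock is taken while another is held and reports a possible deadlock when two locks are reachable from each other. If the busy wait in printk() is not treated as an acquisition of console_sem, the lockA -> console_sem edge is never recorded and the cycle from the CPU0/CPU1 scenario goes unnoticed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for the scenario above. */
enum { LOCK_CONSOLE_SEM, LOCK_A, NR_LOCKS };

/* order[a][b] == true means "b was acquired while a was held". */
static bool order[NR_LOCKS][NR_LOCKS];

static void note_acquire(int held, int next)
{
    order[held][next] = true;
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int n = 0; n < NR_LOCKS; n++)
        if (order[from][n] && !seen[n] && reachable(n, to, seen))
            return true;
    return false;
}

/* A cycle through 'a' and 'b' means a possible ABBA deadlock. */
static bool would_report(int a, int b)
{
    bool seen1[NR_LOCKS] = { false };
    bool seen2[NR_LOCKS] = { false };
    return reachable(a, b, seen1) && reachable(b, a, seen2);
}

static void reset_order(void)
{
    memset(order, 0, sizeof(order));
}
```

The checker can only report what it has been told about; a busy-wait loop written with plain atomics never calls note_acquire(), which is exactly the blind spot.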

BTW: I am still not sure how to make the busy waiter preferred
over console_lock() callers. I mean that the busy waiter has
to get console_sem even if some tasks are already sleeping in
its wait queue.
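One way to state the requirement is as a release policy. The following is only a hypothetical model of the decision (fake_console_sem and fake_console_release() are made-up names and no real locking happens): on release, a busy-waiting printk() caller gets the lock by direct handoff before any console_lock() caller sleeping on the semaphore.

```c
#include <stdbool.h>

/* Who gets the lock next when the current owner releases it. */
enum next_owner { NEXT_NONE, NEXT_SPINNER, NEXT_SLEEPER };

/* Hypothetical bookkeeping; models the policy only, not the locking. */
struct fake_console_sem {
    bool spinner_waiting;   /* a printk() caller is busy-waiting */
    int  nr_sleepers;       /* console_lock() callers sleeping on the semaphore */
};

static enum next_owner fake_console_release(struct fake_console_sem *s)
{
    if (s->spinner_waiting) {
        /* Direct handoff: the semaphore is never released in between,
         * so a sleeper cannot grab it before the spinner. */
        s->spinner_waiting = false;
        return NEXT_SPINNER;
    }
    if (s->nr_sleepers > 0) {
        s->nr_sleepers--;       /* ordinary release: wake one sleeper */
        return NEXT_SLEEPER;
    }
    return NEXT_NONE;           /* nobody is waiting */
}
```

The model sidesteps the actual difficulty: console_sem is a plain semaphore, so a real handoff would have to bypass up() without leaving a window in which a sleeper can take it.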


> > But let's wait for the patch. It might look and work nicely
> > in the end.
>
> Oh, I need to write a patch? Bah, I guess I should. Where are all those
> developers dying to do kernel programming whom I can pass this off to?

Yes, where are the days when my primary task was learning kernel
hacking? This would have been great training material.

I still have to invest time in fixing printk. But I personally
think that lazy offloading to kthreads is a more promising
way to go. It is pretty straightforward. The only problem is
guaranteeing the takeover. But there must be a reasonable
way to detect that the system's heart is still beating
and that we are not the only working CPU.
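A minimal sketch of lazy offloading with a takeover check, assuming the printing kthread records a timestamp each time it makes progress and that printk() offloads only while that timestamp is recent. The struct, the helpers and the 100 ms timeout are all hypothetical choices for illustration; times are passed in explicitly so the policy can be tested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of the printing kthread. */
struct fake_printk_kthread {
    uint64_t last_progress_ns;  /* last time the kthread flushed something */
    bool     running;
};

/* Assumed threshold, chosen only for illustration: 100 ms. */
#define FAKE_HEARTBEAT_TIMEOUT_NS (100ULL * 1000 * 1000)

/* Called by the kthread whenever it makes progress. */
static void fake_kthread_heartbeat(struct fake_printk_kthread *k, uint64_t now_ns)
{
    k->last_progress_ns = now_ns;
}

/*
 * Called from printk(): returns true if the caller may leave the printing
 * to the kthread, false if it should print directly. Assumes a monotonic
 * clock, so now_ns >= last_progress_ns.
 */
static bool fake_can_offload(const struct fake_printk_kthread *k, uint64_t now_ns)
{
    if (!k->running)
        return false;
    return now_ns - k->last_progress_ns < FAKE_HEARTBEAT_TIMEOUT_NS;
}
```

A fresh heartbeat only shows that the kthread has been scheduled recently, so it is a heuristic for "the system is still alive"; the timeout trades delayed messages against falling back to direct printing too eagerly.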

Best Regards,
Petr