Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

From: Qian Cai
Date: Mon Oct 07 2019 - 11:33:31 EST


On Mon, 2019-10-07 at 17:12 +0200, Michal Hocko wrote:
> On Mon 07-10-19 10:59:10, Qian Cai wrote:
> [...]
> > It is almost impossible to eliminate all the indirect call chains from
> > console_sem/console_owner_lock to zone->lock because it is quite common that
> > something later needs to allocate some memory dynamically, so as long as it
> > directly calls printk() with zone->lock held, it will be in trouble.
>
> Do you have any example where the console driver really _has_ to
> allocate? Because I have a hard time believing this is going to work at
> all, as the atomic context doesn't allow any memory reclaim and
> such an allocation would fail too easily, so the driver cannot
> really rely on it.

I don't know how to explain this more clearly, but let me repeat it one last
time. The console driver does not need to allocate directly. Consider this
example:

CPU0:              CPU1:        CPU2:         CPU3:
console_sem->lock                             zone->lock
                   pi->lock
pi->lock                        rq_lock
                   rq->lock
                                zone->lock
                                              console_sem->lock

Here it only needs someone to hold the rq_lock and allocate some memory. The
same is true for port_lock. Since the deadlock could involve many CPUs and a
longer lock chain, it is impossible to predict which allocation made while
holding a lock could end up completing the same problematic lock chain.

>
> So again, crippling the MM code just because of lockdep false positives
> or a broken console driver sounds like a wrong way to approach the
> problem.
>
> > [  297.425964] -> #1 (&port_lock_key){-.-.}:
> > [  297.425967]        __lock_acquire+0x5b3/0xb40
> > [  297.425967]        lock_acquire+0x126/0x280
> > [  297.425968]        _raw_spin_lock_irqsave+0x3a/0x50
> > [  297.425969]        serial8250_console_write+0x3e4/0x450
> > [  297.425970]        univ8250_console_write+0x4b/0x60
> > [  297.425970]        console_unlock+0x501/0x750
> > [  297.425971]        vprintk_emit+0x10d/0x340
> > [  297.425972]        vprintk_default+0x1f/0x30
> > [  297.425972]        vprintk_func+0x44/0xd4
> > [  297.425973]        printk+0x9f/0xc5
> > [  297.425974]        register_console+0x39c/0x520
> > [  297.425975]        univ8250_console_init+0x23/0x2d
> > [  297.425975]        console_init+0x338/0x4cd
> > [  297.425976]        start_kernel+0x534/0x724
> > [  297.425977]        x86_64_start_reservations+0x24/0x26
> > [  297.425977]        x86_64_start_kernel+0xf4/0xfb
> > [  297.425978]        secondary_startup_64+0xb6/0xc0
>
> This is early init code again, so the lockdep report sounds like a false
> positive to me.

This is just the tip of the iceberg. It shows the lock dependency,

console_owner --> port_lock_key

which could easily happen anywhere with a simple printk().
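
As a rough illustration of why one recorded dependency matters, the Python
sketch below keeps "A was held while B was taken" edges, loosely in the spirit
of what lockdep does (this is not lockdep's actual algorithm or data
structures). LockOrderRecorder and its methods are hypothetical names. The
console_owner --> port_lock_key edge comes from the trace above; the
port_lock_key --> zone->lock edge is an assumption standing in for any path
that allocates memory while holding port_lock.

```python
from collections import defaultdict


class LockOrderRecorder:
    """Records 'outer lock held while inner lock taken' edges (toy model)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def record(self, acquisitions):
        """acquisitions: lock class names in the order one context takes them."""
        held = []
        for lock in acquisitions:
            for outer in held:
                self.edges[outer].add(lock)
            held.append(lock)

    def would_deadlock(self, held, wanted):
        """True if taking `wanted` while holding `held` closes a cycle,
        i.e. `held` is already reachable from `wanted`."""
        stack, seen = [wanted], set()
        while stack:
            node = stack.pop()
            if node == held:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False


if __name__ == "__main__":
    rec = LockOrderRecorder()
    # From the trace: console_unlock() -> serial8250_console_write().
    rec.record(["console_owner", "port_lock_key"])
    # ASSUMPTION: some path allocates memory while holding port_lock.
    rec.record(["port_lock_key", "zone->lock"])
    # A printk() with zone->lock held then wants console_owner.
    print(rec.would_deadlock("zone->lock", "console_owner"))
```

With only the first edge recorded, the same query reports no cycle in this
model; the second, assumed edge is what closes it.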