Re: [PATCH] mm/page_isolation: fix a deadlock with printk()

From: Andrew Morton
Date: Sat Oct 05 2019 - 20:44:36 EST


On Sat, 5 Oct 2019 20:10:47 -0400 Qian Cai <cai@xxxxxx> wrote:

>
>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&(&zone->lock)->rlock){-.-.}:
> >> lock_acquire+0x21a/0x468
> >> _raw_spin_lock+0x54/0x68
> >> get_page_from_freelist+0x8b6/0x2d28
> >> __alloc_pages_nodemask+0x246/0x658
> >> __get_free_pages+0x34/0x78
> >> sclp_init+0x106/0x690
> >> sclp_register+0x2e/0x248
> >> sclp_rw_init+0x4a/0x70
> >> sclp_console_init+0x4a/0x1b8
> >> console_init+0x2c8/0x410
> >> start_kernel+0x530/0x6a0
> >> startup_continue+0x70/0xd0
> >
> > This appears to be the core of our problem?
>
> No, that is just one of those many places could form the lock chain.
>
> console_lock -> other locks -> zone_lock
>
> Another example is,
>
> https://lore.kernel.org/lkml/1568823006.5576.178.camel@xxxxxx/

There is no "console_lock". Please be much more specific.

> It is easier to avoid,
>
> zone_lock -> console_lock
>
> rather than fixing the opposite.

"ease" isn't the main objective. A more important question is "what
makes sense". We should be able to call printk() from anywhere, any
time under any conditions. That can't be done 100% but it is the
objective. printk() should be robust and not being able to call
printk() while holding zone->lock isn't robust!

btw, this:

: It is unsafe to call printk() while zone->lock was held, i.e.,
:
: zone->lock --> console_sem

doesn't make a lot of sense. console_sem is a sleeping lock so
attempting to acquire it (with down()!) under spinlock is a huge bug.
Again, please be careful with the descriptions.