Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality

From: Sergey Senozhatsky
Date: Thu Apr 30 2015 - 02:34:47 EST


Hello Minchan,

On (04/30/15 14:47), Minchan Kim wrote:
[..]
>
> Isn't it related to bd_mutex?

I think it is:

[ 216.713922] Possible unsafe locking scenario:
[ 216.713923]        CPU0                    CPU1
[ 216.713924]        ----                    ----
[ 216.713925]   lock(&bdev->bd_mutex);
[ 216.713927]                                lock(s_active#162);
[ 216.713929]                                lock(&bdev->bd_mutex);
[ 216.713930]   lock(s_active#162);
[ 216.713932]
*** DEADLOCK ***

> I think the problem of deadlock is that you are trying to remove sysfs file
> in sysfs handler.
>
> #> echo 1 > /sys/xxx/zram_remove
>
> kernfs_fop_write - hold s_active
> -> zram_remove_store
> -> zram_remove
> -> sysfs_remove_group - hold s_active *again*
>
> Right?
>

are those the same s_active locks?


We hold (s_active#163) and (&bdev->bd_mutex) and want to acquire (s_active#162):

[ 216.713934] 5 locks held by bash/342:
[ 216.713935] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811508a1>] vfs_write+0xaf/0x145
[ 216.713938] #1: (&of->mutex){+.+.+.}, at: [<ffffffff811af1d3>] kernfs_fop_write+0x9c/0x14c
[ 216.713942] #2: (s_active#163){.+.+.+}, at: [<ffffffff811af1dc>] kernfs_fop_write+0xa5/0x14c
[ 216.713946] #3: (zram_index_mutex){+.+.+.}, at: [<ffffffffa022276f>] zram_remove_store+0x45/0xba [zram]
[ 216.713950] #4: (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]


full log:

[ 216.713826] ======================================================
[ 216.713827] [ INFO: possible circular locking dependency detected ]
[ 216.713829] 4.1.0-rc1-next-20150430-dbg-00010-ga86accf-dirty #121 Tainted: G O
[ 216.713831] -------------------------------------------------------
[ 216.713832] bash/342 is trying to acquire lock:
[ 216.713833] (s_active#162){++++.+}, at: [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
[ 216.713840]
but task is already holding lock:
[ 216.713842] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
[ 216.713846]
which lock already depends on the new lock.

[ 216.713848]
the existing dependency chain (in reverse order) is:
[ 216.713849]
-> #1 (&bdev->bd_mutex){+.+.+.}:
[ 216.713852] [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
[ 216.713856] [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
[ 216.713858] [<ffffffff81528fc6>] mutex_lock_nested+0x5e/0x35f
[ 216.713860] [<ffffffff81184148>] revalidate_disk+0x4b/0x7c
[ 216.713863] [<ffffffffa02224d0>] disksize_store+0x1b1/0x1f4 [zram]
[ 216.713866] [<ffffffff813f8994>] dev_attr_store+0x19/0x23
[ 216.713870] [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
[ 216.713872] [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
[ 216.713874] [<ffffffff811502c2>] __vfs_write+0x26/0xbe
[ 216.713877] [<ffffffff811508b2>] vfs_write+0xc0/0x145
[ 216.713879] [<ffffffff81150fd0>] SyS_write+0x51/0x8f
[ 216.713881] [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
[ 216.713884]
-> #0 (s_active#162){++++.+}:
[ 216.713886] [<ffffffff8107b69e>] check_prevs_add+0x19e/0x747
[ 216.713889] [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
[ 216.713891] [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
[ 216.713892] [<ffffffff811adac4>] __kernfs_remove+0x1b6/0x2cd
[ 216.713895] [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
[ 216.713897] [<ffffffff811b0872>] remove_files+0x42/0x67
[ 216.713899] [<ffffffff811b0b39>] sysfs_remove_group+0x69/0x88
[ 216.713901] [<ffffffffa02226a0>] zram_remove+0x66/0xf0 [zram]
[ 216.713904] [<ffffffffa02227bf>] zram_remove_store+0x95/0xba [zram]
[ 216.713906] [<ffffffff813fe053>] class_attr_store+0x1c/0x26
[ 216.713909] [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
[ 216.713911] [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
[ 216.713913] [<ffffffff811502c2>] __vfs_write+0x26/0xbe
[ 216.713915] [<ffffffff811508b2>] vfs_write+0xc0/0x145
[ 216.713917] [<ffffffff81150fd0>] SyS_write+0x51/0x8f
[ 216.713918] [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
[ 216.713920]
other info that might help us debug this:

[ 216.713922] Possible unsafe locking scenario:

[ 216.713923]        CPU0                    CPU1
[ 216.713924]        ----                    ----
[ 216.713925]   lock(&bdev->bd_mutex);
[ 216.713927]                                lock(s_active#162);
[ 216.713929]                                lock(&bdev->bd_mutex);
[ 216.713930]   lock(s_active#162);
[ 216.713932]
*** DEADLOCK ***

[ 216.713934] 5 locks held by bash/342:
[ 216.713935] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811508a1>] vfs_write+0xaf/0x145
[ 216.713938] #1: (&of->mutex){+.+.+.}, at: [<ffffffff811af1d3>] kernfs_fop_write+0x9c/0x14c
[ 216.713942] #2: (s_active#163){.+.+.+}, at: [<ffffffff811af1dc>] kernfs_fop_write+0xa5/0x14c
[ 216.713946] #3: (zram_index_mutex){+.+.+.}, at: [<ffffffffa022276f>] zram_remove_store+0x45/0xba [zram]
[ 216.713950] #4: (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
[ 216.713954]
stack backtrace:
[ 216.713957] CPU: 1 PID: 342 Comm: bash Tainted: G O 4.1.0-rc1-next-20150430-dbg-00010-ga86accf-dirty #121
[ 216.713958] Hardware name: SAMSUNG ELECTRONICS CO.,LTD Samsung DeskTop System/Samsung DeskTop System, BIOS 05CC 04/09/2010
[ 216.713960] ffffffff82400210 ffff8800ba367a28 ffffffff815265b1 ffffffff810785f2
[ 216.713962] ffffffff8242f970 ffff8800ba367a78 ffffffff8107aac7 ffffffff817bd85e
[ 216.713965] ffff8800bdeca1a0 ffff8800bdeca9c0 ffff8800bdeca998 ffff8800bdeca9c0
[ 216.713967] Call Trace:
[ 216.713971] [<ffffffff815265b1>] dump_stack+0x4c/0x6e
[ 216.713973] [<ffffffff810785f2>] ? up+0x39/0x3e
[ 216.713975] [<ffffffff8107aac7>] print_circular_bug+0x2b1/0x2c2
[ 216.713976] [<ffffffff8107b69e>] check_prevs_add+0x19e/0x747
[ 216.713979] [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
[ 216.713981] [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
[ 216.713983] [<ffffffff811ae88d>] ? kernfs_remove_by_name_ns+0x70/0x8c
[ 216.713985] [<ffffffff811adac4>] __kernfs_remove+0x1b6/0x2cd
[ 216.713987] [<ffffffff811ae88d>] ? kernfs_remove_by_name_ns+0x70/0x8c
[ 216.713989] [<ffffffff811adca8>] ? kernfs_find_ns+0xcd/0x10e
[ 216.713990] [<ffffffff81529294>] ? mutex_lock_nested+0x32c/0x35f
[ 216.713992] [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
[ 216.713994] [<ffffffff811b0872>] remove_files+0x42/0x67
[ 216.713996] [<ffffffff811b0b39>] sysfs_remove_group+0x69/0x88
[ 216.713999] [<ffffffffa02226a0>] zram_remove+0x66/0xf0 [zram]
[ 216.714001] [<ffffffffa02227bf>] zram_remove_store+0x95/0xba [zram]
[ 216.714003] [<ffffffff813fe053>] class_attr_store+0x1c/0x26
[ 216.714005] [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
[ 216.714007] [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
[ 216.714009] [<ffffffff811502c2>] __vfs_write+0x26/0xbe
[ 216.714011] [<ffffffff8116b29b>] ? __close_fd+0x25/0xdd
[ 216.714013] [<ffffffff81079a27>] ? __lock_is_held+0x3c/0x57
[ 216.714015] [<ffffffff811508b2>] vfs_write+0xc0/0x145
[ 216.714017] [<ffffffff81150fd0>] SyS_write+0x51/0x8f
[ 216.714019] [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
[ 216.714063] zram: Removed device: zram0


-ss