Re: false positive lockdep splat with loop device

From: Ilya Dryomov
Date: Tue Oct 10 2017 - 05:16:15 EST


On Thu, Oct 5, 2017 at 6:33 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> Does the patch below fix the warning for you?
>
> --
> From 28aae7104425433d39e6142adcd5b88dc5b0ad5f Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@xxxxxx>
> Date: Thu, 5 Oct 2017 18:31:02 +0200
> Subject: block: use DECLARE_COMPLETION_ONSTACK in submit_bio_wait
>
> This way we get our lockdep annotations right in case multiple layers
> in the stack use submit_bio_wait.
>
> It also happens to simplify the code by getting rid of the submit_bio_ret
> structure.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
> block/bio.c | 19 +++++--------------
> 1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 8338304ea256..4e18e959fc0a 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -917,17 +917,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  }
>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
>
> -struct submit_bio_ret {
> -	struct completion event;
> -	int error;
> -};
> -
>  static void submit_bio_wait_endio(struct bio *bio)
>  {
> -	struct submit_bio_ret *ret = bio->bi_private;
> -
> -	ret->error = blk_status_to_errno(bio->bi_status);
> -	complete(&ret->event);
> +	complete(bio->bi_private);
>  }
>
>  /**
> @@ -943,16 +935,15 @@ static void submit_bio_wait_endio(struct bio *bio)
>   */
>  int submit_bio_wait(struct bio *bio)
>  {
> -	struct submit_bio_ret ret;
> +	DECLARE_COMPLETION_ONSTACK(done);
>
> -	init_completion(&ret.event);
> -	bio->bi_private = &ret;
> +	bio->bi_private = &done;
>  	bio->bi_end_io = submit_bio_wait_endio;
>  	bio->bi_opf |= REQ_SYNC;
>  	submit_bio(bio);
> -	wait_for_completion_io(&ret.event);
> +	wait_for_completion_io(&done);
>
> -	return ret.error;
> +	return blk_status_to_errno(bio->bi_status);
>  }
>  EXPORT_SYMBOL(submit_bio_wait);

No, it doesn't -- the splat is a little bit more complicated, but
fundamentally the same thing.
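
If I'm reading the crossrelease annotations right, that's expected: the
on-stack completion doesn't change the granularity of the lockdep class.
With crossrelease enabled, init_completion() sets up one static class key
per init site -- from memory, include/linux/completion.h does roughly this
(a sketch, not verbatim):

  #define init_completion(x)						\
  do {									\
  	/* one static key, i.e. one lockdep class, per init site */	\
  	static struct lock_class_key __key;				\
  	lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map,	\
  			"(complete)" #x, &__key, 0);			\
  	__init_completion(x);						\
  } while (0)

DECLARE_COMPLETION_ONSTACK() expands to the same init_completion(), and
submit_bio_wait() has a single init site either way, so every
submit_bio_wait() waiter on every block device still shares one class.
All that changed is the name: "(complete)&ret.event" became
"(complete)&done" (the "#3" below is, I believe, just lockdep numbering
identically named classes). The wait issued on top of the loop device
and the wait issued against the backing device therefore still alias.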

Thanks,

Ilya
run fstests shared/298 at 2017-10-09 16:54:45

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc4-ceph-g0721a67fa48f #1 Tainted: G W
------------------------------------------------------
loop0/23166 is trying to acquire lock:
(&ei->i_data_sem){++++}, at: [<ffffffffa6340226>] ext4_punch_hole+0x3b6/0x570

but now in release context of a crosslock acquired at the following:
((complete)&ret.event){+.+.}, at: [<ffffffffa645c7ab>] submit_bio_wait+0x8b/0xb0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 ((complete)&ret.event){+.+.}:
__lock_acquire+0x111f/0x1150
lock_acquire+0xed/0x1e0
wait_for_completion_io+0x53/0x1b0
submit_bio_wait+0x8b/0xb0
blkdev_issue_zeroout+0xcf/0x1b0
ext4_issue_zeroout+0x44/0x60
ext4_ext_zeroout+0x2f/0x40
ext4_ext_convert_to_initialized+0x6ee/0x7a0
ext4_ext_map_blocks+0x151d/0x1620
ext4_map_blocks+0x14f/0x5b0
ext4_writepages+0xa4a/0x11e0
do_writepages+0x1c/0x80
__filemap_fdatawrite_range+0xaa/0xe0
file_write_and_wait_range+0x39/0xa0
ext4_sync_file+0xae/0x580
vfs_fsync_range+0x4b/0xb0
do_fsync+0x3d/0x70
SyS_fsync+0x10/0x20
entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (&ei->i_data_sem){++++}:
down_write+0x40/0x70
ext4_punch_hole+0x3b6/0x570
ext4_fallocate+0x19d/0xb20
loop_queue_work+0xd7/0x9b0
kthread_worker_fn+0xbf/0x200

other info that might help us debug this:

Possible unsafe locking scenario by crosslock:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem);
  lock((complete)&ret.event);
                               lock(&ei->i_data_sem);
                               unlock((complete)&ret.event);

*** DEADLOCK ***

1 lock held by loop0/23166:
#0: (&x->wait#17){..-.}, at: [<ffffffffa60d678d>] complete+0x1d/0x60

stack backtrace:
CPU: 0 PID: 23166 Comm: loop0 Tainted: G W 4.14.0-rc4-ceph-g0721a67fa48f #1
Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
Call Trace:
dump_stack+0x85/0xc7
print_circular_bug+0x1eb/0x2e0
? print_bfs_bug+0x40/0x40
check_prev_add+0x396/0x7e0
? jbd2_journal_stop+0xd0/0x4d0
lock_commit_crosslock+0x3db/0x5e0
? lock_commit_crosslock+0x3db/0x5e0
complete+0x29/0x60
submit_bio_wait_endio+0x25/0x30
bio_endio+0x9f/0x1e0
blk_update_request+0xc4/0x3c0
blk_mq_end_request+0x1e/0x70
lo_complete_rq+0x30/0x80
__blk_mq_complete_request+0xa3/0x160
blk_mq_complete_request+0x16/0x20
loop_queue_work+0x5f/0x9b0
? find_held_lock+0x39/0xb0
? kthread_worker_fn+0x9c/0x200
? trace_hardirqs_on_caller+0x11f/0x190
kthread_worker_fn+0xbf/0x200
loop_kthread_worker_fn+0x1e/0x20
kthread+0x152/0x190
? loop_get_status64+0x90/0x90
? kthread_stop+0x2a0/0x2a0
ret_from_fork+0x2a/0x40
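
(Second run of the same test below. This time the cycle closes through
jbd2_handle and ext4's lazyinit thread rather than i_data_sem, but the
middle link is the same: loop services the discard/zeroout by punching
a hole in the backing file, i.e. by re-entering ext4. Paraphrasing
drivers/block/loop.c from memory -- not the exact code, error handling
dropped:

  static int lo_discard(struct loop_device *lo, struct request *rq,
			loff_t pos)
  {
  	struct file *file = lo->lo_backing_file;
  	int mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;

  	/* re-enters the backing fs: ext4_fallocate() in the splats */
  	return file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq));
  }

The waiter and the completer thus belong to different filesystems on
different devices, so the reported deadlock shouldn't actually be able
to happen -- hence the "false positive" in the subject: lockdep has a
single class for both sides.)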
run fstests shared/298 at 2017-10-09 20:38:50

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc4-ceph-gc9ed9bcb0121 #1 Not tainted
------------------------------------------------------
loop0/15423 is trying to acquire lock:
(jbd2_handle){++++}, at: [<ffffffffbd37b694>] start_this_handle+0x104/0x450

but now in release context of a crosslock acquired at the following:
((complete)&done#3){+.+.}, at: [<ffffffffbd45c79b>] submit_bio_wait+0x8b/0xc0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((complete)&done#3){+.+.}:
__lock_acquire+0x111f/0x1150
lock_acquire+0xed/0x1e0
wait_for_completion_io+0x53/0x1b0
submit_bio_wait+0x8b/0xc0
blkdev_issue_zeroout+0xcf/0x1b0
ext4_init_inode_table+0x153/0x339
ext4_lazyinit_thread+0x2bc/0x3d0
kthread+0x152/0x190
ret_from_fork+0x2a/0x40

-> #1 (&meta_group_info[i]->alloc_sem){++++}:
__lock_acquire+0x111f/0x1150
lock_acquire+0xed/0x1e0
down_read+0x43/0x70
__ext4_new_inode+0xc4a/0x1470
ext4_create+0x103/0x190
lookup_open+0x78a/0x880
path_openat+0x3c0/0xad0
do_filp_open+0x8a/0xf0
do_sys_open+0x11b/0x1f0
SyS_open+0x1e/0x20
entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (jbd2_handle){++++}:
start_this_handle+0x167/0x450
jbd2__journal_start+0xdb/0x2b0
__ext4_journal_start_sb+0x89/0x1e0
ext4_punch_hole+0x229/0x570
ext4_fallocate+0x19d/0xb20

other info that might help us debug this:

Chain exists of:
jbd2_handle --> &meta_group_info[i]->alloc_sem --> (complete)&done#3

Possible unsafe locking scenario by crosslock:

       CPU0                    CPU1
       ----                    ----
  lock(&meta_group_info[i]->alloc_sem);
  lock((complete)&done#3);
                               lock(jbd2_handle);
                               unlock((complete)&done#3);

*** DEADLOCK ***

1 lock held by loop0/15423:
#0: (&x->wait#17){..-.}, at: [<ffffffffbd0d678d>] complete+0x1d/0x60

stack backtrace:
CPU: 1 PID: 15423 Comm: loop0 Not tainted 4.14.0-rc4-ceph-gc9ed9bcb0121 #1
Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
Call Trace:
dump_stack+0x85/0xc7
print_circular_bug+0x1eb/0x2e0
? print_bfs_bug+0x40/0x40
check_prev_add+0x396/0x7e0
? __lock_acquire+0x705/0x1150
lock_commit_crosslock+0x3db/0x5e0
? lock_commit_crosslock+0x3db/0x5e0
complete+0x29/0x60
submit_bio_wait_endio+0x12/0x20
bio_endio+0x9f/0x1e0
blk_update_request+0xc4/0x3c0
blk_mq_end_request+0x1e/0x70
lo_complete_rq+0x30/0x80
__blk_mq_complete_request+0xa3/0x160
blk_mq_complete_request+0x16/0x20
loop_queue_work+0x5f/0x9b0
? find_held_lock+0x39/0xb0
? kthread_worker_fn+0x9c/0x200
? trace_hardirqs_on_caller+0x11f/0x190
kthread_worker_fn+0xbf/0x200
loop_kthread_worker_fn+0x1e/0x20
kthread+0x152/0x190
? loop_get_status64+0x90/0x90
? kthread_stop+0x2a0/0x2a0
ret_from_fork+0x2a/0x40