Re: [PATCH] libnvdimm: rework region badblocks clearing

From: Kani, Toshimitsu
Date: Mon May 01 2017 - 11:43:57 EST


On Sun, 2017-04-30 at 05:39 -0700, Dan Williams wrote:
> Toshi noticed that the new support for a region-level badblocks
> missed the case where errors are cleared due to BTT I/O.
>
> An initial attempt to fix this ran into a "sleeping while atomic"
> warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
> satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
> However, that lock is not needed since we are not acting on any data
> that is subject to change due to a change of state of the bus /
> region. The badblocks instance has its own internal lock to handle
> mutations of the error list.
>
> So, to make it clear that we are just acting on region devices and
> don't need the lock, rename __nvdimm_bus_badblocks_clear() to
> nvdimm_clear_badblocks_regions(). Eliminate the lock and consolidate
> all routines in drivers/nvdimm/bus.c. Also, make some cleanups to
> remove unnecessary casts, make the calling convention of
> nvdimm_clear_badblocks_regions() clearer by replacing struct resource
> with the minimal struct clear_badblocks_context, and use the
> DEVICE_ATTR macro.

Hi Dan,

I was testing the change with CONFIG_DEBUG_ATOMIC_SLEEP set this time,
and hit the following BUG with BTT. This is a separate issue (not
introduced by this patch), but it shows that we have an issue with the
DSM call path as well.

[ 1279.712933] nfit ACPI0012:00: acpi_nfit_ctl:bus cmd: 1: func: 1 input length: 16
[ 1279.721111] nvdimm in  00000000: 60000000 00000002 00001000 00000000  ...`............
[ 1279.729799] BUG: sleeping function called from invalid context at mm/slab.h:432
[ 1279.738005] in_atomic(): 1, irqs_disabled(): 0, pid: 13353, name: dd
[ 1279.745187] INFO: lockdep is turned off.
:
[ 1279.767908] Call Trace:
[ 1279.771116]  dump_stack+0x86/0xc3
[ 1279.775201]  ___might_sleep+0x17d/0x250
[ 1279.779808]  __might_sleep+0x4a/0x80
[ 1279.784214]  __kmalloc+0x1c0/0x2e0
[ 1279.788388]  acpi_os_allocate_zeroed+0x2d/0x2f
[ 1279.793604]  acpi_evaluate_object+0x59/0x3b1
[ 1279.798640]  acpi_evaluate_dsm+0xbd/0x10c
[ 1279.803458]  acpi_nfit_ctl+0x1ef/0x7c0 [nfit]
[ 1279.808584]  ? nsio_rw_bytes+0x152/0x280
[ 1279.813258]  nvdimm_clear_poison+0x77/0x140
[ 1279.818193]  nsio_rw_bytes+0x18f/0x280
[ 1279.822684]  btt_write_pg+0x1d4/0x3d0 [nd_btt]
[ 1279.827869]  btt_make_request+0x119/0x2d0 [nd_btt]
[ 1279.833398]  ? generic_make_request+0xef/0x3b0
[ 1279.838575]  generic_make_request+0x122/0x3b0
[ 1279.843661]  ? iov_iter_get_pages+0xbd/0x380
[ 1279.848666]  submit_bio+0x73/0x150
[ 1279.852801]  ? bio_iov_iter_get_pages+0xd7/0x120
[ 1279.858166]  ? __blkdev_direct_IO_simple+0x17b/0x340
[ 1279.863877]  __blkdev_direct_IO_simple+0x177/0x340
[ 1279.869453]  ? bdput+0x20/0x20
[ 1279.873231]  blkdev_direct_IO+0x3b1/0x3c0
[ 1279.877963]  ? current_time+0x18/0x70
[ 1279.882344]  generic_file_direct_write+0xba/0x180
[ 1279.887765]  __generic_file_write_iter+0xc0/0x1c0
[ 1279.893185]  ? __clear_user+0x23/0x70
[ 1279.897550]  blkdev_write_iter+0x8b/0x100
[ 1279.902258]  ? __might_sleep+0x4a/0x80
[ 1279.906699]  __vfs_write+0xe8/0x160
[ 1279.910876]  vfs_write+0xcb/0x1f0
[ 1279.914867]  SyS_write+0x58/0xc0
[ 1279.918773]  do_syscall_64+0x6c/0x1f0
[ 1279.923120]  entry_SYSCALL64_slow_path+0x25/0x25
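
To make the failure mode concrete, here is a minimal userspace sketch (not kernel code) of the kind of check that fires in the trace above: an allocation that is allowed to sleep is reached while in_atomic() is 1. Every model_* name is a hypothetical stand-in invented for illustration, and the sketch does not attempt to show what makes the BTT write path atomic, since the trace itself does not say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int model_preempt_count;	/* > 0 models in_atomic() == 1 */
static int model_sleep_bugs;	/* violations detected so far */

static void model_atomic_enter(void)
{
	model_preempt_count++;
}

static void model_atomic_exit(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models a might_sleep()-style check: flag a call made while atomic. */
static void model_might_sleep(void)
{
	if (model_preempt_count > 0)
		model_sleep_bugs++;
}

/* Models an allocation that may sleep, so it runs the check first. */
static void *model_alloc_may_sleep(size_t size)
{
	model_might_sleep();
	return calloc(1, size);
}

/*
 * Models a write path that reaches a sleeping allocation, optionally
 * from atomic context. Returns the number of violations this call added.
 */
static int model_write_path(bool atomic)
{
	int before = model_sleep_bugs;
	void *buf;

	if (atomic)
		model_atomic_enter();
	buf = model_alloc_may_sleep(16);
	if (atomic)
		model_atomic_exit();

	free(buf);
	return model_sleep_bugs - before;
}
```

In this model, model_write_path(false) reports no violation and model_write_path(true) reports one, which mirrors the "sleeping function called from invalid context" report: the allocation is the same in both cases, only the calling context differs.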

Thanks,
-Toshi