Re: Traceback with CONFIG_REGMAP_KUNIT=y+CONFIG_DEBUG_ATOMIC_SLEEP=y

From: Guenter Roeck
Date: Thu Jul 20 2023 - 12:42:13 EST


On 7/20/23 09:25, Guenter Roeck wrote:
> On 7/20/23 08:07, Mark Brown wrote:
>> On Thu, Jul 20, 2023 at 08:03:13AM -0700, Guenter Roeck wrote:
>>> On 7/20/23 07:31, Mark Brown wrote:
>>>
>>>> They're both independently fine, but I wouldn't expect anything that's
>>>> running in atomic context to be actually using dynamic allocations.
>>>
>>> Which one do you prefer ? As I mentioned in my second patch, there are
>>> two drivers which use fast_io together with REGCACHE_RBTREE and thus
>>> are likely affected by this problem. Dan's solution would cover that,
>>> while my current RFC patch would likely cause those drivers to fail.
>>> Plus, of course, they could get stuck if they actually end up trying to
>>> sleep while allocating memory.
>>
>> Like I say I don't think it's an either/or - we can do both
>> independently, they both make sense standalone and don't conflict with
>> each other.
>
> I guess I am missing something. I have not tried it, but wouldn't my patches
> be unnecessary if Dan's patch is applied ?


Actually, Dan's patch isn't complete. With it applied:

[ 4.816104] # Subtest: regmap
[ 4.816175] 1..22
[ 4.816266] KTAP version 1
[ 4.816343] # Subtest: basic_read_write
[ 4.821773] ok 1 none
[ 4.827032] ok 2 flat
[ 4.832404] ok 3 rbtree
[ 4.834664] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
[ 4.834935] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 167, name: kunit_try_catch
[ 4.835059] preempt_count: 1, expected: 0
[ 4.835198] 1 lock held by kunit_try_catch/167:
[ 4.835297] #0: 838e9c10 (regmap_kunit:86:(config)->lock){....}-{2:2}, at: regmap_lock_spinlock+0x14/0x1c
[ 4.835980] irq event stamp: 146
[ 4.836057] hardirqs last enabled at (145): [<8078bfa8>] crng_make_state+0x1a0/0x294
[ 4.836176] hardirqs last disabled at (146): [<80c5f62c>] _raw_spin_lock_irqsave+0x7c/0x80
[ 4.836297] softirqs last enabled at (0): [<80110cc4>] copy_process+0x810/0x216c
[ 4.836413] softirqs last disabled at (0): [<00000000>] 0x0
[ 4.836628] CPU: 0 PID: 167 Comm: kunit_try_catch Tainted: G N 6.5.0-rc1-00028-gc4be22597a36-dirty #6
[ 4.836809] Hardware name: Generic DT based system
[ 4.837002] unwind_backtrace from show_stack+0x18/0x1c
[ 4.837134] show_stack from dump_stack_lvl+0x38/0x5c
[ 4.837229] dump_stack_lvl from __might_resched+0x188/0x2d0
[ 4.837325] __might_resched from __kmem_cache_alloc_node+0x1f4/0x258
[ 4.837426] __kmem_cache_alloc_node from __kmalloc+0x48/0x170
[ 4.837521] __kmalloc from regcache_maple_write+0x194/0x248
[ 4.837617] regcache_maple_write from _regmap_write+0x88/0x140
[ 4.837711] _regmap_write from regmap_write+0x44/0x68
[ 4.837797] regmap_write from basic_read_write+0x8c/0x27c
[ 4.837889] basic_read_write from kunit_generic_run_threadfn_adapter+0x1c/0x28
[ 4.837996] kunit_generic_run_threadfn_adapter from kthread+0xf8/0x120
[ 4.838099] kthread from ret_from_fork+0x14/0x3c
[ 4.838214] Exception stack(0x881a5fb0 to 0x881a5ff8)
[ 4.838346] 5fa0: 00000000 00000000 00000000 00000000
[ 4.838465] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 4.838576] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 4.841868] ok 4 maple
[ 4.841923] # basic_read_write: pass:4 fail:0 skip:0 total:4
[ 4.842022] ok 1 basic_read_write

It would have to be extended to also address the same problem in the maple tree
code. Also, the change would probably not be needed in regcache_rbtree_init().

After adding the GFP_KERNEL -> map->alloc_flags changes to the maple tree
code while skipping the init functions, I no longer see the traceback.
This is without my patches.
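
For illustration only, here is a userspace sketch of why switching the
allocation from GFP_KERNEL to map->alloc_flags silences the check. It is
not the kernel code: struct regmap_model, regmap_model_init() and
write_would_splat() are hypothetical names, the flag values are arbitrary
stand-ins, and it assumes map->alloc_flags ends up as a non-sleeping flag
for maps that lock with a spinlock (as in the fast_io case traced above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's gfp_t flags; values are arbitrary. */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0x1u /* allocation may sleep */
#define GFP_ATOMIC 0x2u /* allocation must not sleep */

/* Hypothetical model of the two regmap properties relevant here. */
struct regmap_model {
	bool holds_spinlock; /* true for fast_io maps (spinlock-based locking) */
	gfp_t alloc_flags;   /* flags the cache code should allocate with */
};

/* Assumption: init picks the flags once, based on the locking type. */
static void regmap_model_init(struct regmap_model *map, bool fast_io)
{
	assert(map != NULL);
	map->holds_spinlock = fast_io;
	map->alloc_flags = fast_io ? GFP_ATOMIC : GFP_KERNEL;
}

/*
 * Models the CONFIG_DEBUG_ATOMIC_SLEEP check: a sleeping allocation
 * requested while the map's spinlock is held would trigger the splat.
 */
static bool write_would_splat(const struct regmap_model *map, gfp_t flags)
{
	return map->holds_spinlock && (flags & GFP_KERNEL) != 0;
}
```

In this model, a cache write on a fast_io map that passes a hard-coded
GFP_KERNEL is flagged, while one that passes map->alloc_flags is not.
A mutex-locked map is unaffected either way.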

Thanks,
Guenter