Re: [RFC] nvmet: Always remove processed AER elements from list

From: Johannes Thumshirn
Date: Thu Oct 31 2019 - 03:06:13 EST


On 2019-10-30 20:58, Chaitanya Kulkarni wrote:
On 10/30/2019 08:24 AM, Daniel Wagner wrote:
Hi,

I've got the following oops:

PID: 79413 TASK: ffff92f03a814ec0 CPU: 19 COMMAND: "kworker/19:2"
#0 [ffffa5308b8c3c58] machine_kexec at ffffffff8e05dd02
#1 [ffffa5308b8c3ca8] __crash_kexec at ffffffff8e12102a
#2 [ffffa5308b8c3d68] crash_kexec at ffffffff8e122019
#3 [ffffa5308b8c3d80] oops_end at ffffffff8e02e091
#4 [ffffa5308b8c3da0] general_protection at ffffffff8e8015c5
[exception RIP: nvmet_async_event_work+94]
RIP: ffffffffc0d9a80e RSP: ffffa5308b8c3e58 RFLAGS: 00010202
RAX: dead000000000100 RBX: ffff92dcbc7464b0 RCX: 0000000000000002
RDX: 0000000000040002 RSI: 38ffff92dc9814cf RDI: ffff92f217722f20
RBP: ffff92dcbc746418 R8: 0000000000000000 R9: 0000000000000000
R10: 000000000000035b R11: ffff92efb8dd2091 R12: ffff92dcbc7464a0
R13: ffff92dbe03a5f29 R14: 0000000000000000 R15: 0ffff92f92f26864
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffffa5308b8c3e78] process_one_work at ffffffff8e0a3b0c
#6 [ffffa5308b8c3eb8] worker_thread at ffffffff8e0a41e7
#7 [ffffa5308b8c3f10] kthread at ffffffff8e0a93af
#8 [ffffa5308b8c3f50] ret_from_fork at ffffffff8e800235

This maps to nvmet_async_event_result(). So it looks like this function
accesses a stale aen pointer:

static u32 nvmet_async_event_result(struct nvmet_async_event *aen)
{
	return aen->event_type | (aen->event_info << 8) | (aen->log_page << 16);
}
Can you please explain the test setup? Is this coming from the tests
present in blktests? If so, can you please provide the test number?

No, unfortunately this is coming from a customer bug report. We _think_
we're having a race between AEN processing and nvmet_sq_destroy(), but
we're not 100% sure. Hence this RFC.
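
As a side note on reading the register dump: RAX holds dead000000000100, which matches the kernel's LIST_POISON1 value on x86_64 with the default poison delta. list_del() writes that value into the ->next pointer of a removed entry, so the dump is at least consistent with the work item touching an element that had already been taken off the list. It does not prove the suspected race. The following is a minimal user-space sketch of that poisoning behaviour, not the nvmet code itself. struct aen_model, aen_model_result() and entry_is_poisoned() are hypothetical names made up for illustration, the list helpers are simplified re-implementations of the kernel ones, and a 64-bit build is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified user-space model of the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Poison values as seen on x86_64 with the default poison delta
 * (assumption: 64-bit pointers). LIST_POISON2 is 0x200 + delta in
 * kernels of that era; its exact value does not matter here.
 */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000200ULL)

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	new->next = head;
	new->prev = prev;
	prev->next = new;
	head->prev = new;
}

/* Unlink the entry, then poison its pointers like the kernel does. */
static inline void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

static inline bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical stand-in for struct nvmet_async_event. */
struct aen_model {
	struct list_head entry;
	uint8_t event_type;
	uint8_t event_info;
	uint8_t log_page;
};

/* Same bit packing as nvmet_async_event_result() quoted above. */
static inline uint32_t aen_model_result(const struct aen_model *aen)
{
	return (uint32_t)aen->event_type |
	       ((uint32_t)aen->event_info << 8) |
	       ((uint32_t)aen->log_page << 16);
}

/* True if the entry looks like it was removed with list_del(). */
static inline bool entry_is_poisoned(const struct list_head *entry)
{
	return entry->next == LIST_POISON1;
}
```

Adding an aen_model to a list and then calling list_del() on it leaves entry.next equal to 0xdead000000000100. In the kernel, a second walker that follows such a pointer ends up dereferencing a non-canonical address, which would show up as a general protection fault like the one in the trace.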