Re: xenbus hang after userspace ctrl-c of xenstore-rm

From: Jürgen Groß
Date: Tue Oct 01 2019 - 07:33:40 EST


On 01.10.19 11:57, James Dingwall wrote:
Hi,

I have been investigating a problem where xenstore becomes unresponsive
during domain shutdowns. My test script seems to trigger the problem,
although I cannot say definitively that it is the same issue. It is
possible to replicate the issue in dom0 or a domU. If the test script is
run in dom0, it seems possible to affect xenstore access in domUs, but I
have not observed any negative impact in dom0 or other guests when
running it in a domU.

The environment is a default Ubuntu 5.0.0-29-generic kernel, xen
4.11.3-pre (built from current head of staging-4.11), xenstore is
running in a stubdom. I did try a kernel with
d10e0cc113c9e1b64b5c6e3db37b5c839794f3df "xenbus: Avoid deadlock during
suspend due to open transactions", but that didn't help; this stack trace
is with that patch applied.

[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus D 0 37 2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603] __schedule+0x2c0/0x870
[ 2551.528606] ? _cond_resched+0x19/0x40
[ 2551.528632] schedule+0x2c/0x70
[ 2551.528637] xs_talkv+0x1ec/0x2b0
[ 2551.528642] ? wait_woken+0x80/0x80
[ 2551.528645] xs_single+0x53/0x80
[ 2551.528648] xenbus_transaction_end+0x3b/0x70
[ 2551.528651] xenbus_file_free+0x5a/0x160
[ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657] xenbus_thread+0x7de/0x880
[ 2551.528660] ? wait_woken+0x80/0x80
[ 2551.528665] kthread+0x121/0x140
[ 2551.528667] ? xb_read+0x1d0/0x1d0
[ 2551.528670] ? kthread_park+0x90/0x90
[ 2551.528673] ret_from_fork+0x35/0x40

Yes, this is a self-deadlock when cleaning up a user's file context.
Thanks for the nice debug data. :-)

I need to do the cleanup via a workqueue instead of calling it directly.
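
To illustrate the idea, here is a rough userspace model of the problem
and of the deferral. It is not the patch and not kernel code: every name
in it (struct work, defer_work, file_cleanup, run_model, ...) is made up
for illustration, and the "reader" only stands in for the xenbus thread
seen in the trace. The assumption being modelled is that the cleanup has
to wait for a reply which only the reader can deliver, so it can never
finish when the reader itself is the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name below is hypothetical, not a kernel API. */

struct work {
    void (*fn)(struct work *w);
    struct work *next;
};

static struct work *queue_head, *queue_tail;
static bool in_reader;     /* true while the modelled reader thread is busy */
static int cleanups_done;  /* cleanups that got their reply and finished */
static int would_deadlock; /* cleanups that would have slept forever */

/* Append a work item; stands in for deferring to a workqueue. */
static void defer_work(struct work *w)
{
    assert(w != NULL && w->fn != NULL);
    w->next = NULL;
    if (queue_tail)
        queue_tail->next = w;
    else
        queue_head = w;
    queue_tail = w;
}

/* Drain the queue outside reader context, as a worker thread would. */
static void run_deferred_work(void)
{
    while (queue_head) {
        struct work *w = queue_head;

        queue_head = w->next;
        if (!queue_head)
            queue_tail = NULL;
        w->fn(w);
    }
}

/*
 * Cleanup must wait for a reply that only the reader can deliver, so it
 * cannot finish while the reader itself is the caller.
 */
static void file_cleanup(struct work *w)
{
    (void)w;
    if (in_reader) {
        would_deadlock++; /* the real thread would block here for good */
        return;
    }
    cleanups_done++;
}

/* Reader drops the last reference to a file context while handling a reply. */
static void reader_drop_last_ref(struct work *w, bool defer)
{
    in_reader = true;
    if (defer)
        defer_work(w);   /* hand the cleanup to another context */
    else
        w->fn(w);        /* direct call: the self-deadlock case */
    in_reader = false;
}

/* Returns the number of completed cleanups for one run of the model. */
int run_model(bool defer)
{
    struct work w = { .fn = file_cleanup, .next = NULL };

    queue_head = queue_tail = NULL;
    cleanups_done = would_deadlock = 0;
    reader_drop_last_ref(&w, defer);
    run_deferred_work();
    return cleanups_done;
}
```

Calling run_model(false) takes the direct path and records a would-be
deadlock; run_model(true) queues the cleanup, and it completes once the
reader has returned.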

Cooking up a patch now...


Juergen