workqueue deadlock

From: Bjorn Helgaas
Date: Wed Dec 06 2006 - 19:26:39 EST


I'm seeing a workqueue-related deadlock. This is on an ia64
box running SLES10, but it looks like the same problem should
be possible in current upstream on any architecture.

Here are the two tasks involved:

events/4:
schedule
__down
__lock_cpu_hotplug
lock_cpu_hotplug
flush_workqueue
kblockd_flush
blk_sync_queue
cfq_shutdown_timer_wq
cfq_exit_queue
elevator_exit
blk_cleanup_queue
scsi_free_queue
scsi_device_dev_release_usercontext
run_workqueue

loadkeys:
schedule
flush_cpu_workqueue
flush_workqueue
flush_scheduled_work
release_dev
tty_release

loadkeys is holding the cpu_hotplug lock (acquired in flush_workqueue())
and waiting in flush_cpu_workqueue() until the cpu_workqueue drains.

But events/4 is responsible for draining it, and it is blocked waiting
to acquire the cpu_hotplug lock.

In current upstream, the cpu_hotplug lock has been replaced with
workqueue_mutex, but it looks to me like the same deadlock is still
possible.

Is there some rule that workqueue functions shouldn't try to
flush a workqueue? Or should flush_workqueue() be smarter by
releasing the workqueue_mutex once in a while?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/