Re: INFO: possible circular locking dependency at cleanup_workqueue_thread

From: Oleg Nesterov
Date: Tue May 19 2009 - 08:06:47 EST


On 05/19, Johannes Berg wrote:
>
> On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
>
> > > Maybe it shouldn't do that from the CPU_POST_DEAD
> > > notifier?
> >
> > Well, in any case we should understand why we have the problem, before
> > changing the code. And CPU_POST_DEAD is not special, why should we treat
> > it specially and skip lock_map_acquire(wq->lockdep_map) ?
>
> I'm not familiar enough with the code -- but what are we really trying
> to do in CPU_POST_DEAD? It seems to me that at that time things must
> already be off the CPU, so ...?

Yes, this CPU is dead, so we should call cleanup_workqueue_thread() to
kill cwq->thread.

> On the other hand that calls
> flush_cpu_workqueue() so it seems it would actually wait for the work to
> be executed on some other CPU, within the CPU_POST_DEAD notification?

Yes. We can't just kill cwq->thread: there can be pending work_structs,
so we have to flush first.

Why can't we move these works to another CPU? We can, but this doesn't
really help, because in any case we must at least wait for
cwq->current_work to complete.

Why do we use CPU_POST_DEAD, and not (say) CPU_DEAD, to flush/kill?
Because work->func() can sleep in get_online_cpus(), so we can't flush
until we drop cpu_hotplug.lock.
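
To illustrate the ordering, here is a rough userspace model, not kernel
code. Everything in it is a hypothetical stand-in: a pthread mutex plays
the role of cpu_hotplug.lock, worker_thread() plays a work->func() that
calls get_online_cpus(), and pthread_join() plays the flush/kill done by
cleanup_workqueue_thread(). The real get_online_cpus() is not a plain
mutex, so take this only as a sketch of why the flush has to wait until
the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for cpu_hotplug.lock; not the kernel's definition. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of work functions that ran to completion; protected by hotplug_lock. */
static int works_done;

/* Models a work->func() that calls get_online_cpus() and may sleep on the lock. */
static void *worker_thread(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&hotplug_lock);	/* "get_online_cpus()" */
	works_done++;
	pthread_mutex_unlock(&hotplug_lock);	/* "put_online_cpus()" */
	return NULL;
}

/*
 * Models taking a CPU down. Returns the number of works that completed
 * before the worker was reaped, or -1 if the worker could not be started.
 */
int simulate_cpu_down(void)
{
	pthread_t thread;

	works_done = 0;

	/* The hotplug operation begins and holds the lock throughout. */
	pthread_mutex_lock(&hotplug_lock);
	if (pthread_create(&thread, NULL, worker_thread, NULL) != 0) {
		pthread_mutex_unlock(&hotplug_lock);
		return -1;
	}

	/*
	 * "CPU_DEAD" stage: the lock is still held. Calling pthread_join()
	 * here would never return, because the worker is (or soon will be)
	 * blocked waiting for hotplug_lock.
	 */
	assert(works_done == 0);	/* the work cannot have run yet */

	pthread_mutex_unlock(&hotplug_lock);	/* the lock is dropped */

	/* "CPU_POST_DEAD" stage: now it is safe to flush and reap the thread. */
	pthread_join(thread, NULL);
	return works_done;
}
```

Calling simulate_cpu_down() from a main() built with -pthread should
report one completed work. Moving the pthread_join() above the unlock
gives the hang described above.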

Oleg.
