Re: FW: avoiding run_workqueue() recursion

From: Oleg Nesterov
Date: Wed Jul 15 2009 - 14:43:19 EST


Hi Anirban,

On 07/14, Anirban Sinha wrote:
>
> >I had a question about one of your previous commits:
> >
> >: commit 2355b70fd59cb5be7de2052a9edeee7afb7ff099
> >: Author: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
> >: Date: Thu Apr 2 16:58:24 2009 -0700
> >:
> >: workqueue: avoid recursion in run_workqueue()
> >
> >http://git.kernel.org/linus/2355b70fd59cb5be7de2052a9edeee7afb7ff099
> >
> >
> >I saw a few discussions on the mailing list around this. I also did see
> >your "I still don't know why I merged ..." comment on this. I have the
> >following observations. I am new in the kernel hacking world, so please
> >bear with me.
> >
> >(a) I do agree that flushing the work queues from within
> >run_workqueue() is buggy in itself.
> >
> >(b) I do also agree that a recursive call to run_workqueue() is bad due
> >to the reasons cited in the commit log (even though I had a good laugh
> >when I saw the "morton gets to eat his hat" stuff :)).
> >
> >(c) I am a little puzzled by the change the patch made. If we let the
> >call sleep on completion when keventd is itself running the
> >flush_workqueue(), are we not introducing a deadlock? If the thread
> >that is itself responsible for walking the workqueue and dispatching
> >the work functions goes to sleep, who will wake it up?

Yes, this will deadlock. Note the WARN_ON().
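
The deadlock can be illustrated with a small userspace model. This is only
a sketch under stated assumptions, not the kernel code: the struct, its
fields and model_flush_can_finish() are hypothetical names, and thread
identity is reduced to an integer id. It assumes a flush queues a barrier
and then sleeps until the single worker thread executes that barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct model_wq {
	int worker_id;  /* id of the only thread that runs queued work */
	int queued;     /* work items waiting in the list */
	int running;    /* nonzero while a work function is executing */
};

/*
 * Models a flush that queues a barrier and sleeps until the worker
 * executes it. Returns 1 if the flush can finish, 0 if the caller
 * would sleep forever.
 */
static int model_flush_can_finish(const struct model_wq *wq, int caller_id)
{
	assert(wq != NULL);

	/* Nothing queued and nothing running: no barrier, no sleep. */
	if (wq->queued == 0 && !wq->running)
		return 1;

	/*
	 * The barrier runs only when the worker gets back to its loop.
	 * If the caller is the worker, it sleeps inside a work function
	 * and never gets there, so nobody completes the barrier.
	 */
	return caller_id != wq->worker_id;
}
```

In this model a caller whose id equals worker_id is by definition inside a
work function, so running is set and the function returns 0. That is the
condition the WARN_ON() reports.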

> >In my honest opinion, I think we should simply return when (cwq->thread
> >== current) is true. I think in that condition, it should be just a
> >nop.

If we just return silently, we do not flush but hide the problem?
And this can lead to other problems which are very hard to
trigger/debug.
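
One hypothetical instance of such a hidden problem, as a userspace sketch
rather than kernel code: a caller flushes and then frees data that queued
work still references. The names toy_wq, toy_flush_silent() and
toy_safe_to_free() are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel API. */
struct toy_wq {
	int worker_id;  /* id of the thread that executes queued work */
	int queued;     /* work items not yet executed */
};

/*
 * Flush variant that silently does nothing when called by the worker.
 * It returns nothing, so the caller cannot tell a real flush from a
 * skipped one.
 */
static void toy_flush_silent(struct toy_wq *wq, int caller_id)
{
	assert(wq != NULL);

	if (caller_id == wq->worker_id)
		return;         /* proposed nop: nothing was flushed */

	wq->queued = 0;         /* models waiting until all work has run */
}

/* Returns 1 if data referenced by queued work may be freed safely. */
static int toy_safe_to_free(const struct toy_wq *wq)
{
	return wq->queued == 0;
}
```

After toy_flush_silent() is called with the worker's own id, queued is
unchanged, so a caller that assumes the flush happened and frees its data
leaves pending work pointing at freed memory. Nothing reports the skipped
flush, which is why such a bug would be hard to trigger and debug.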

Oleg.
