Re: [PATCH] workqueue: don't allow recursive run_workqueue()

From: Lai Jiangshan
Date: Thu Jan 22 2009 - 01:04:20 EST


Peter Zijlstra wrote:
> On Wed, 2009-01-21 at 17:42 +0800, Lai Jiangshan wrote:
>> 1) lockdep will complain when run_workqueue() recurses
>> 2) works are not run in order when run_workqueue() recurses
>>
>> 3) BUG!
>> We currently use recursion into run_workqueue() to hide the deadlock
>> when keventd tries to flush its own queue.
>>
>> This is a bug. When flush_workqueue() (nested in a work callback)
>> returns, the workqueue is not really flushed, so the statements that
>> follow in that work callback may do something bad.
>>
>> So we should not allow a workqueue to try to flush its own queue.
>
> The patch looks good, but I'm utterly failing to comprehend this
> changelog. What exactly can go wrong (other than the obvious too deep
> nest and the fact that lockdep will complain)?

void do_some_cleanup(void)
{
	find_all_queued_work_struct_and_mark_it_old();
	flush_workqueue(workqueue);
	/* we can destroy the old work_structs because we have flushed them */
	destroy_old_work_structs();
}

If work->func() calls do_some_cleanup() on its own workqueue, it is very
probably a bug: with the old recursive behaviour, flush_workqueue() returns
before the queue is really flushed, so destroy_old_work_structs() can free
work items that are still pending or running.
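
To make the failure mode concrete, here is a minimal sketch (the work item,
function names and module boilerplate are hypothetical, not taken from the
patch) of a work callback queued on the shared keventd queue that flushes
that same queue:

#include <linux/module.h>
#include <linux/workqueue.h>

static void cleanup_work_func(struct work_struct *work)
{
	/*
	 * This runs in the keventd worker thread.  Before this patch,
	 * flush_scheduled_work() here recursed into run_workqueue() and
	 * returned while the queue was not really flushed; after the
	 * patch it hits WARN_ON(cwq->thread == current) and blocks on
	 * the barrier, so the deadlock is no longer hidden.
	 */
	flush_scheduled_work();

	/*
	 * Anything here that assumes "everything queued before me has
	 * finished" (like destroy_old_work_structs() above) was never
	 * actually safe.
	 */
}

static DECLARE_WORK(cleanup_work, cleanup_work_func);

static int __init recursion_example_init(void)
{
	schedule_work(&cleanup_work);	/* queued on keventd_wq */
	return 0;
}
module_init(recursion_example_init);
MODULE_LICENSE("GPL");

Note that the flushing work itself obviously has not completed when
flush_scheduled_work() returns, so the "already flushed" guarantee the
caller relies on never held with the recursive path.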

>
>> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
>> ---
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 2f44583..1129cde 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -48,8 +48,6 @@ struct cpu_workqueue_struct {
>>
>> struct workqueue_struct *wq;
>> struct task_struct *thread;
>> -
>> - int run_depth; /* Detect run_workqueue() recursion depth */
>> } ____cacheline_aligned;
>>
>> /*
>> @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
>> static void run_workqueue(struct cpu_workqueue_struct *cwq)
>> {
>> spin_lock_irq(&cwq->lock);
>> - cwq->run_depth++;
>> - if (cwq->run_depth > 3) {
>> - /* morton gets to eat his hat */
>> - printk("%s: recursion depth exceeded: %d\n",
>> - __func__, cwq->run_depth);
>> - dump_stack();
>> - }
>> while (!list_empty(&cwq->worklist)) {
>> struct work_struct *work = list_entry(cwq->worklist.next,
>> struct work_struct, entry);
>> @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
>> spin_lock_irq(&cwq->lock);
>> cwq->current_work = NULL;
>> }
>> - cwq->run_depth--;
>> spin_unlock_irq(&cwq->lock);
>> }
>>
>> @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
>>
>> static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
>> {
>> - int active;
>> + int active = 0;
>> + struct wq_barrier barr;
>>
>> - if (cwq->thread == current) {
>> - /*
>> - * Probably keventd trying to flush its own queue. So simply run
>> - * it by hand rather than deadlocking.
>> - */
>> - run_workqueue(cwq);
>> - active = 1;
>> - } else {
>> - struct wq_barrier barr;
>> + WARN_ON(cwq->thread == current);
>>
>> - active = 0;
>> - spin_lock_irq(&cwq->lock);
>> - if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
>> - insert_wq_barrier(cwq, &barr, &cwq->worklist);
>> - active = 1;
>> - }
>> - spin_unlock_irq(&cwq->lock);
>> -
>> - if (active)
>> - wait_for_completion(&barr.done);
>> + spin_lock_irq(&cwq->lock);
>> + if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
>> + insert_wq_barrier(cwq, &barr, &cwq->worklist);
>> + active = 1;
>> }
>> + spin_unlock_irq(&cwq->lock);
>> +
>> + if (active)
>> + wait_for_completion(&barr.done);
>>
>> return active;
>> }
>>
>>

