Re: Is adding requeue_delayed_work() a good idea

From: Roland Dreier
Date: Fri Aug 21 2009 - 17:53:15 EST



> We need some simple changes in timer.c. __mod_timer() already has
> pending_only, but requeue_delayed_work() needs another flag to prevent
> migrating to another CPU. Again, this is simple, let's suppose we have
> requeue_timer(timer) which works like mod_timer(pending_only => true)
> but never changes timer->base.

Yes... in my case I don't particularly care about which CPU the timer or
work runs on, so I ignored that.

> The main question is: what should requeue_delayed_work(dwork) do when
> dwork->timer is not pending but dwork->work is queued or running?
> Should it cancel dwork->work in this case?

In my particular case it doesn't really matter. In the queued case it
could leave it to run whenever it gets to the head of the workqueue. In
the already-running case I think the timer should be reset. The
main point is that if I do requeue_delayed_work() I want to make sure
the work runs all the way through from the beginning at some point in
the future. The pattern I have in mind is something like:

unsigned long flags;

spin_lock_irqsave(&mydata_lock, flags);
new_timeout = add_item_to_timeout_list();
requeue_delayed_work(wq, &process_timeout_list_work, new_timeout);
spin_unlock_irqrestore(&mydata_lock, flags);

so if the process_timeout_list_work runs early or twice it doesn't
matter; I just want to make sure that the work runs from the beginning
and sees the new item I added to the list at some point after the
requeue.
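
To make the intended semantics concrete, here is a rough userspace toy
model of the cases discussed above. It is only a sketch and not kernel
code: struct model_dwork, model_requeue() and the REQUEUE_* values are
hypothetical names made up for illustration, and the "idle" case
(nothing pending, queued or running) is an assumption, since it is not
spelled out above. The point it tries to show is that every branch
returns immediately and none of them waits for the work to finish.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model only; none of these names are kernel API. */
enum requeue_result {
    REQUEUE_TIMER_MOVED,  /* timer was pending: expiry updated in place */
    REQUEUE_LEFT_QUEUED,  /* work already queued: it will start from the top */
    REQUEUE_TIMER_ARMED,  /* work running or idle: timer armed for a new run */
};

struct model_dwork {
    bool timer_pending;    /* models timer_pending(&dwork->timer) */
    bool work_queued;      /* work sits on the workqueue, not yet started */
    bool work_running;     /* work function is currently executing */
    unsigned long expires; /* absolute expiry time, in arbitrary ticks */
};

/*
 * Never blocks: each branch only flips state and returns, in the spirit
 * of mod_timer(). The caller may therefore hold a spinlock.
 */
enum requeue_result model_requeue(struct model_dwork *dw,
                                  unsigned long now, unsigned long delay)
{
    if (dw->timer_pending) {
        /* Timer not fired yet: just move the expiry. */
        dw->expires = now + delay;
        return REQUEUE_TIMER_MOVED;
    }

    if (dw->work_queued) {
        /* A full run from the beginning is already guaranteed. */
        return REQUEUE_LEFT_QUEUED;
    }

    /*
     * Work is running (it may have missed the new item) or idle
     * (assumption: treat idle like a fresh delayed queueing).
     * Arm the timer so that another complete run happens later.
     */
    dw->timer_pending = true;
    dw->expires = now + delay;
    return REQUEUE_TIMER_ARMED;
}

/* Walks through the three cases; returns 0 if the model behaves as described. */
int model_demo(void)
{
    struct model_dwork dw = { false, false, false, 0 };

    /* Running case: the timer is re-armed without waiting for the work. */
    dw.work_running = true;
    assert(model_requeue(&dw, 100, 10) == REQUEUE_TIMER_ARMED);
    assert(dw.timer_pending && dw.expires == 110);

    /* Pending-timer case: the expiry is simply moved. */
    assert(model_requeue(&dw, 105, 20) == REQUEUE_TIMER_MOVED);
    assert(dw.expires == 125);

    /* Queued case: nothing to do, the queued run sees the new item. */
    dw.timer_pending = false;
    dw.work_running = false;
    dw.work_queued = true;
    assert(model_requeue(&dw, 130, 5) == REQUEUE_LEFT_QUEUED);
    assert(!dw.timer_pending);

    return 0;
}
```

In this model the work may run early or more than once, which matches
the requirement above: only "at least one full run after the requeue"
is guaranteed, nothing stronger.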

> OK, suppose that we s/cancel_delayed_work/requeue_delayed_work/,
> then we seem to have the same deadlock
>
> A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
> B: holding mad_agent_priv->lock, waiting for requeue_delayed_work()
> which found !timer_pending() && queued work
> C: interrupt during work->func() that takes cm_id_priv->lock

Yes, I agree that if requeue_delayed_work() ever waits then we run into
the same deadlock as before. It only works if requeue_delayed_work() is
the rough equivalent of mod_timer(), which never waits.

> Perhaps, requeue_delayed_work() should cancel the pending work, but do
> not wait_on_work(). This is not trivial, we have to avoid livelocks if
> cancel_work_no_sync() races with queue_work()/etc. Perhaps,
> requeue_delayed_work() could return the error if it can't update the
> timer and can't cancel the work without spinning ?

I guess returning an error is possible ... although I wonder what the
caller would do to handle the error?

Perhaps the semantics are too fuzzy and not general enough, so that the
best answer is my special-case open-coded change for my specific case.
I don't know whether other places would even want a
requeue_delayed_work() ... I only raise this point because when I find
myself reimplementing the structure of work_struct + timer because the
delayed_work API is lacking, it seems prudent to consider extending the
delayed_work API instead.

Thanks,
Roland