Re: workqueue question.

From: Ben Greear
Date: Thu Jun 30 2011 - 13:18:59 EST


On 06/30/2011 03:00 AM, Tejun Heo wrote:
Hello,

On Wed, Jun 29, 2011 at 09:02:29AM -0700, Ben Greear wrote:
On 06/29/2011 01:43 AM, Tejun Heo wrote:
It appears that the code just wants to (re)add itself to the
work queue with a different callback method:

static void rpc_final_put_task(struct rpc_task *task,
                               struct workqueue_struct *q)
{
        if (q != NULL) {
                INIT_WORK(&task->u.tk_work, rpc_async_release);
                queue_work(q, &task->u.tk_work);
        } else
                rpc_free_task(task);
}

Ummm... so, at the time of INIT_WORK(), the tk_work could be already
pending or running?

This method is indirectly called by the worker-thread. The
trace below shows it taking the else branch, but I'm not
sure it always does so.

__slab_free+0x57/0x150
kfree+0x107/0x13a
rpcb_map_release+0x3f/0x44 [sunrpc]
rpc_release_calldata+0x12/0x14 [sunrpc]
rpc_free_task+0x59/0x61 [sunrpc]
rpc_final_put_task+0x82/0x8a [sunrpc]
__rpc_execute+0x23c/0x24b [sunrpc]
rpc_async_schedule+0x10/0x12 [sunrpc]
process_one_work+0x230/0x41d
worker_thread+0x133/0x217
kthread+0x7d/0x85
kernel_thread_helper+0x4/0x10

My debugging leads me to believe that rpc_async_release
is (very rarely) called on a task object that has already been logically
freed.

What do you mean "logically freed"? Do you mean the rpc_task struct
is freed twice?

Yes, it seems so... though it's really just poked back into a mempool
instead of kfreed.
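
For illustration, here is roughly what "poked back into a mempool" means,
with made-up names (demo_task_slab, demo_task_pool) rather than the actual
sunrpc identifiers: the object is returned to a pre-allocated pool instead
of going through kfree(), so from the slab allocator's point of view it is
still live.

#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_task {
        struct work_struct work;
        /* ... */
};

static struct kmem_cache *demo_task_slab;
static mempool_t *demo_task_pool;

static int demo_pool_init(void)
{
        demo_task_slab = kmem_cache_create("demo_task",
                                           sizeof(struct demo_task), 0,
                                           SLAB_HWCACHE_ALIGN, NULL);
        if (!demo_task_slab)
                return -ENOMEM;

        demo_task_pool = mempool_create(8, mempool_alloc_slab,
                                        mempool_free_slab, demo_task_slab);
        return demo_task_pool ? 0 : -ENOMEM;
}

static struct demo_task *demo_task_get(void)
{
        return mempool_alloc(demo_task_pool, GFP_KERNEL);
}

static void demo_task_put(struct demo_task *task)
{
        /* "logically freed": returned to the pool, no kfree() involved */
        mempool_free(task, demo_task_pool);
}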


Is there a better way to queue this up that might have less chance
of some strange race?

Why not just use a separate work item?

No idea; this is from existing net/sunrpc/* code. If you
think that is a more proper way to do this logic, I can try that.
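
For what it's worth, a rough sketch of the "separate work item" idea --
not an actual patch, and tk_final_work and the *_demo names are
hypothetical: the task carries a second work_struct that is initialized
once at setup time and used only for the final release, so nothing ever
has to re-INIT_WORK() an item that might still be pending or running.

#include <linux/slab.h>
#include <linux/workqueue.h>

struct rpc_task_demo {
        struct work_struct tk_work;       /* existing async scheduling work */
        struct work_struct tk_final_work; /* hypothetical: final put only */
        /* ... rest of struct rpc_task ... */
};

static void rpc_free_task_demo(struct rpc_task_demo *task)
{
        /* stand-in for rpc_free_task(): release calldata, free the task */
        kfree(task);
}

static void rpc_async_release_demo(struct work_struct *work)
{
        struct rpc_task_demo *task =
                container_of(work, struct rpc_task_demo, tk_final_work);

        rpc_free_task_demo(task);
}

static void rpc_final_put_task_demo(struct rpc_task_demo *task,
                                    struct workqueue_struct *q)
{
        if (q != NULL)
                queue_work(q, &task->tk_final_work);
        else
                rpc_free_task_demo(task);
}

INIT_WORK(&task->tk_final_work, rpc_async_release_demo) would run once in
the task setup path, never from inside a callback that might race with a
pending item.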

Also, is it valid to free the memory containing foo
in a workqueue callback?

Yeap.
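
(For reference, the pattern being confirmed here looks like the sketch
below -- the names are made up; the point is only that the callback may
free the structure embedding its own work_struct, provided nothing
touches the work item afterwards.)

#include <linux/slab.h>
#include <linux/workqueue.h>

struct foo {
        struct work_struct work;
        /* ... payload ... */
};

static void foo_release_workfn(struct work_struct *work)
{
        struct foo *f = container_of(work, struct foo, work);

        /* release whatever f owns ... */
        kfree(f);       /* freeing the containing object here is valid */
}

/* queueing side: INIT_WORK(&f->work, foo_release_workfn);
 *                queue_work(wq, &f->work);
 * after this, only foo_release_workfn() may touch f. */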

Is there a method that can be called from a workqueue callback
to verify that the item has not been re-added to the work-queue?

Can you be a bit more specific? Are you saying that queue_work() and
INIT_WORK() may race?

No, I don't think that is racing. Basically, when I'm about
to logically free (put back into the mempool) the task struct, I
would like to add a sanity check to make sure it's not currently
scheduled on a work queue. If it were, that would explain the
backtraces I was seeing from the SLUB memory debugging logic and
I'd be closer to understanding the problem.

I tried doing a cancel, but that caused recursive locking issues.

I'd like to call this right before freeing the object and BUG_ON()
if the object is actually still on a work-queue.

That may be useful as a debugging feature but is inherently racy.
Nothing guarantees the work item won't be queued after BUG_ON() but
before actual freeing. The guarantee that the work item is no longer
in use should come from the wq user. There are a good number of use
cases where a work item frees itself or the containing data structure,
and they all work fine.

At this point I have no reason to believe the work-queues are buggy,
but due to the state machines, callbacks, and method pointers, it is
quite difficult to follow the method flow in the rpc code. So an
extra sanity check might be quite useful. I'll try to code something
up for the work-queue logic when I get a chance.
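
A minimal sketch of that kind of check, using the existing work_pending()
helper; the function name is hypothetical, the pool name is the one
net/sunrpc/sched.c appears to use, and -- per Tejun's point above -- this
can only ever be a best-effort debugging aid, since a racing queue_work()
can still land between the test and the free:

#include <linux/mempool.h>
#include <linux/sunrpc/sched.h>
#include <linux/workqueue.h>

/* would live in net/sunrpc/sched.c, next to rpc_task_mempool */
static void rpc_task_checked_free(struct rpc_task *task)
{
        /* WARN rather than BUG so a false positive from the race only logs */
        WARN_ON(work_pending(&task->u.tk_work));

        mempool_free(task, rpc_task_mempool);
}

If the running state matters as well, work_busy() reports both
WORK_BUSY_PENDING and WORK_BUSY_RUNNING, though it is just as racy as
work_pending() for this purpose.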

Thanks,
Ben


Thanks.



--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
