Re: RT and Cascade interrupts

From: Trond Myklebust
Date: Sun May 29 2005 - 02:43:20 EST


lau den 28.05.2005 Klokka 23:12 (-0400) skreiv john cooper:

> 1. Can you correlate anything of rpc_run_timer() running in
> preemptive context which could explain the above behavior?
>
> 2. Do you agree that (!RPC_TASK_HAS_TIMER && timer->base)
> is an inconsistent state at the time of
> rpc_release_task():rpc_delete_timer() ?

I have yet to see an example of that happening.

The current code should be working according to the following rules:

timers are only set if the rpc_task is being put on a wait queue
and tk_timeout is non-zero.

RPC_TASK_HAS_TIMER is always set immediately before the timer is
activated.

It is cleared either by the handler _after_ it has run (and
after timer->base has been cleared by __rpc_run_timer()), or by
__rpc_execute()/rpc_release_task() immediately prior to the call
to del_singleshot_timer_sync().

task->tk_callback should never be putting a task to sleep, so
there is no timer being set after the call to rpc_delete_timer()
and prior to the call to task->tk_action.

The above rules should suffice to guarantee strict ordering w.r.t.
starting a timer, clearing the same timer, and restarting a new timer,
so the code should not need to be reentrant. I've appended a patch that
should check for strict compliance of the above rules. Could you try it
out and see if it triggers any Oopses?

Cheers,
Trond

sched.c | 8 ++++++++
1 files changed, 8 insertions(+)

Index: linux-2.6.12-rc4/net/sunrpc/sched.c
===================================================================
--- linux-2.6.12-rc4.orig/net/sunrpc/sched.c
+++ linux-2.6.12-rc4/net/sunrpc/sched.c
@@ -135,6 +135,8 @@ __rpc_add_timer(struct rpc_task *task, r
static void
rpc_delete_timer(struct rpc_task *task)
{
+ BUG_ON(test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate) == 0 &&
+ timer_pending(&task->tk_timer));
if (RPC_IS_QUEUED(task))
return;
if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
@@ -337,6 +339,8 @@ static void __rpc_sleep_on(struct rpc_wa
void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
rpc_action action, rpc_action timer)
{
+ BUG_ON(test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate) != 0 ||
+ timer_pending(&task->tk_timer));
/*
* Protect the queue operations.
*/
@@ -594,6 +598,8 @@ static int __rpc_execute(struct rpc_task
unlock_kernel();
}

+ BUG_ON(test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate) != 0 ||
+ timer_pending(&task->tk_timer));
/*
* Perform the next FSM step.
* tk_action may be NULL when the task has been killed
@@ -925,6 +931,8 @@ fail:

void rpc_run_child(struct rpc_task *task, struct rpc_task *child, rpc_action func)
{
+ BUG_ON(test_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate) != 0 ||
+ timer_pending(&task->tk_timer));
spin_lock_bh(&childq.lock);
/* N.B. Is it possible for the child to have already finished? */
__rpc_sleep_on(&childq, task, func, NULL);