Re: RT and Cascade interrupts

From: john cooper
Date: Tue May 24 2005 - 11:50:32 EST


john cooper wrote:
I'm seeing the BUG assert in kernel/timers.c:cascade()
kick in (tmp->base is somehow 0) during a test which
creates a few tasks of priority higher than ksoftirqd.
This race doesn't happen if ksoftirqd's priority is
elevated (eg: chrt -f -p 75 2) so the -RT patch might
be opening up a window here.

There is a window in rpc_run_timer() which allows
it to lose track of timer ownership when ksoftirqd
(and thus itself) are preempted. This doesn't
immediately cause a problem but does corrupt
the timer cascade list when the timer struct is
recycled/requeued. This shows up some time later
as the list is processed. The failure mode is cascade()
attempting to percolate a timer with poisoned
next/prev *s and a NULL base causing the assertion
BUG(tmp->base != base) to kick in.

The RPC code is attempting to replicate state of
timer ownership for a given rpc_task via RPC_TASK_HAS_TIMER
in rpc_task.tk_runstate. Besides not working
correctly in the case of preemptable context it is
a replication of state of a timer pending in the
cascade structure (ie: timer->base). The fix
changes the RPC code to use timer->base when
deciding whether an outstanding timer registration
exists during rpc_task tear down.

Note: this failure occurred in the 40-04 version of
the patch though it applies to more current versions.
It was seen when executing stress tests on a number
of PPC targets running on an NFS mounted root though
was not observed on a x86 target under similar
conditions.

-john


--
john.cooper@xxxxxxxxxxx
./include/linux/sunrpc/sched.h
./net/sunrpc/sched.c
=================================================================
--- ./include/linux/sunrpc/sched.h.ORG 2005-05-24 10:29:24.000000000 -0400
+++ ./include/linux/sunrpc/sched.h 2005-05-24 10:47:56.000000000 -0400
@@ -142,7 +142,6 @@ typedef void (*rpc_action)(struct rpc_
#define RPC_TASK_RUNNING 0
#define RPC_TASK_QUEUED 1
#define RPC_TASK_WAKEUP 2
-#define RPC_TASK_HAS_TIMER 3

#define RPC_IS_RUNNING(t) (test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
#define rpc_set_running(t) (set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate))
=================================================================
--- ./net/sunrpc/sched.c.ORG 2005-05-24 10:29:52.000000000 -0400
+++ ./net/sunrpc/sched.c 2005-05-24 11:02:44.000000000 -0400
@@ -103,9 +103,6 @@ static void rpc_run_timer(struct rpc_tas
dprintk("RPC: %4d running timer\n", task->tk_pid);
callback(task);
}
- smp_mb__before_clear_bit();
- clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
- smp_mb__after_clear_bit();
}

/*
@@ -124,7 +121,6 @@ __rpc_add_timer(struct rpc_task *task, r
task->tk_timeout_fn = timer;
else
task->tk_timeout_fn = __rpc_default_timer;
- set_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate);
mod_timer(&task->tk_timer, jiffies + task->tk_timeout);
}

@@ -135,7 +131,7 @@ __rpc_add_timer(struct rpc_task *task, r
static inline void
rpc_delete_timer(struct rpc_task *task)
{
- if (test_and_clear_bit(RPC_TASK_HAS_TIMER, &task->tk_runstate)) {
+ if (task->tk_timer.base) {
del_singleshot_timer_sync(&task->tk_timer);
dprintk("RPC: %4d deleting timer\n", task->tk_pid);
}