Re: RT and Cascade interrupts

From: Ingo Molnar
Date: Fri May 27 2005 - 02:41:40 EST



* john cooper <john.cooper@xxxxxxxxxxx> wrote:

> john cooper wrote:
> >I'm seeing the BUG assert in kernel/timer.c:cascade()
> >kick in (tmp->base is somehow 0) during a test which
> >creates a few tasks of priority higher than ksoftirqd.
> >This race doesn't happen if ksoftirqd's priority is
> >elevated (e.g. chrt -f -p 75 2), so the -RT patch might
> >be opening up a window here.
>
> There is a window in rpc_run_timer() which allows
> it to lose track of timer ownership when ksoftirqd
> (and thus rpc_run_timer() itself) is preempted. This
> doesn't immediately cause a problem, but it does
> corrupt the timer cascade list when the timer struct
> is recycled/requeued. This shows up some time later
> as the list is processed. The failure mode is cascade()
> attempting to percolate a timer with poisoned
> next/prev pointers and a NULL base, causing the
> assertion BUG_ON(tmp->base != base) to kick in.
>
> The RPC code is attempting to replicate the state of
> timer ownership for a given rpc_task via RPC_TASK_HAS_TIMER
> in rpc_task.tk_runstate. Besides not working
> correctly in a preemptible context, this flag
> duplicates state already held by a timer pending in
> the cascade structure (i.e. timer->base). The fix
> changes the RPC code to use timer->base when
> deciding whether an outstanding timer registration
> exists during rpc_task teardown.
>
> Note: this failure occurred in the 40-04 version of
> the patch, though the fix applies to more current
> versions. It was seen when executing stress tests on
> a number of PPC targets running on an NFS-mounted
> root, though it was not observed on an x86 target
> under similar conditions.

should this fix go upstream too?

Ingo
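
For readers following the thread, here is a rough user-space sketch of
the idea in the quoted description: a shadow flag that mirrors "timer is
queued" can disagree with the queue itself, whereas testing timer->base
cannot. This is not the actual patch. Every model_* name and both
teardown_* helpers are hypothetical stand-ins, and the sketch sets up the
disagreement directly because the quoted text does not spell out the
exact interleaving in rpc_run_timer().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
struct model_timer_base { int unused; };

struct model_timer {
    struct model_timer_base *base;  /* non-NULL only while queued */
};

struct model_task {
    struct model_timer timer;
    bool has_timer_flag;            /* shadow copy, modeled on RPC_TASK_HAS_TIMER */
};

/* Queue the timer: the base pointer is the single source of truth. */
static void model_add_timer(struct model_timer_base *b, struct model_timer *t)
{
    t->base = b;
}

/* Dequeue the timer. */
static void model_del_timer(struct model_timer *t)
{
    t->base = NULL;
}

/* The timer is pending exactly when it has a base. */
static bool model_timer_pending(const struct model_timer *t)
{
    return t->base != NULL;
}

/*
 * Teardown that trusts the shadow flag. If the flag is stale, a queued
 * timer is left behind. Returns true when the task is safe to recycle.
 */
static bool teardown_by_flag(struct model_task *task)
{
    if (task->has_timer_flag) {
        model_del_timer(&task->timer);
        task->has_timer_flag = false;
    }
    return !model_timer_pending(&task->timer);
}

/*
 * Teardown that asks the timer itself, as the quoted fix describes.
 * Returns true when the task is safe to recycle.
 */
static bool teardown_by_base(struct model_task *task)
{
    if (model_timer_pending(&task->timer))
        model_del_timer(&task->timer);
    task->has_timer_flag = false;
    return !model_timer_pending(&task->timer);
}
```

With a timer queued while the flag is stale (false), teardown_by_flag()
leaves the timer on the list and reports the task unsafe to recycle,
while teardown_by_base() removes it. Recycling the struct in the first
case is the kind of list corruption the quoted message describes.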